Extract a subtree from a document tree

This page shows how to use the subtree class to extract a subtree from an existing document tree using JSONPath. The constructor of the subtree class takes two arguments:

  • an existing document tree instance of type document_tree, and

  • a JSONPath expression

which together identify a subtree within the document tree. Once the subtree is extracted, you can call its dump() function to serialize its content as a JSON string.

First, let’s include the headers this example needs:

#include <orcus/json_document_tree.hpp>
#include <orcus/config.hpp>

#include <iostream>
#include <string_view>

Both the document_tree and subtree classes are provided by the json_document_tree.hpp header, while the config.hpp header is needed to access the orcus::json_config struct type.

The following is the input JSON string we will be using in this example:

constexpr std::string_view input_json = R"(
{
  "id": "12345",
  "name": "John Doe",
  "email": "johndoe@example.com",
  "roles": ["admin", "editor"],
  "isActive": true,
  "profile": {
    "age": 34,
    "gender": "male",
    "address": {
      "street": "123 Elm Street",
      "city": "Springfield",
      "state": "IL",
      "zipCode": "62704"
    },
    "phoneNumbers": [
      {
        "type": "home",
        "number": "555-1234"
      },
      {
        "type": "work",
        "number": "555-5678"
      }
    ]
  },
  "preferences": {
    "notifications": {
      "email": true,
      "sms": false,
      "push": true
    },
    "theme": "dark",
    "language": "en-US"
  },
  "lastLogin": "2024-11-25T13:45:30Z",
  "purchaseHistory": [
    {
      "orderId": "A1001",
      "date": "2024-01-15T10:00:00Z",
      "total": 249.99,
      "items": [
        {
          "productId": "P123",
          "name": "Wireless Mouse",
          "quantity": 1,
          "price": 49.99
        },
        {
          "productId": "P124",
          "name": "Mechanical Keyboard",
          "quantity": 1,
          "price": 200.00
        }
      ]
    },
    {
      "orderId": "A1002",
      "date": "2024-06-10T14:20:00Z",
      "total": 119.99,
      "items": [
        {
          "productId": "P125",
          "name": "Noise Cancelling Headphones",
          "quantity": 1,
          "price": 119.99
        }
      ]
    }
  ]
}
)";

It is defined as a raw string literal to make the value more human-readable.

Next, let’s load this JSON string into an in-memory tree:

orcus::json::document_tree doc;
doc.load(input_json, orcus::json_config{});

The load() function takes the input string defined above as its first argument. It also requires a json_config instance as its second argument to specify some configuration parameters, but since we are not doing anything out of the ordinary, a default-constructed one will suffice.

With the source JSON document loaded into memory, let’s use the orcus::json::subtree class to extract the subtree rooted at the path $.profile.address of the original document:

orcus::json::subtree sub(doc, "$.profile.address");
std::cout << sub.dump(2) << std::endl;

Executing this code will generate the following output:

{
  "street": "123 Elm Street",
  "city": "Springfield",
  "state": "IL",
  "zipCode": "62704"
}

One thing to note is that a subtree instance only references the original document stored in the document_tree instance, so its validity is tied to the lifetime of that instance.

Note

You must ensure that the subtree instance does not outlive the original document tree instance. Accessing the subtree instance after the original document tree instance has been destroyed causes undefined behavior.
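One way to respect this lifetime rule is to keep the referencing object local to a function and return only an owning string. The following is a minimal sketch of that pattern using hypothetical stand-in types (document and view are invented for illustration and are not orcus classes); it assumes nothing about how orcus implements subtree internally.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the owning document; not an orcus type.
struct document
{
    std::string content;
};

// Hypothetical stand-in for a non-owning view over a document.
class view
{
    const document* m_doc; // non-owning pointer to the original

public:
    explicit view(const document& doc) : m_doc(&doc) {}

    // Reject temporaries, which would be destroyed before the view is used.
    view(document&&) = delete;

    std::string dump() const
    {
        assert(m_doc != nullptr);
        return m_doc->content; // returns a copy the caller owns
    }
};

// The view is created and destroyed inside this function while the caller's
// document is still alive; only the owning std::string leaves the function.
std::string dump_view(const document& doc)
{
    view v(doc);
    return v.dump();
}
```

With this shape, the caller keeps the document alive for the duration of the call, and the returned string remains valid after both the view and the document have been destroyed.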

Let’s look at another example. This time, we will extract the subtree rooted at $.purchaseHistory[1].items[0]:

orcus::json::subtree sub(doc, "$.purchaseHistory[1].items[0]");
std::cout << sub.dump(2) << std::endl;

This path includes object keys as well as array positions. Executing this code will generate the following output:

{
  "productId": "P125",
  "name": "Noise Cancelling Headphones",
  "quantity": 1,
  "price": 119.99
}
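To make the mix of object keys and array positions in such a path more concrete, the sketch below splits a simple JSONPath expression into individual steps. It is only an illustration written against the C++ standard library: path_step and split_path are hypothetical names, the code is not how orcus parses paths, and it handles only the root symbol, dotted keys, numeric positions and the [*] wildcard.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical step representation; not an orcus type.
struct path_step
{
    enum class kind { key, index, wildcard };
    kind type;
    std::string key;       // meaningful only when type == kind::key
    std::size_t index = 0; // meaningful only when type == kind::index
};

// Splits a simple JSONPath expression into steps. Only "$", ".key", "[N]"
// and "[*]" are recognized; any other input yields an empty vector.
std::vector<path_step> split_path(std::string_view path)
{
    std::vector<path_step> steps;
    if (path.empty() || path[0] != '$')
        return {};

    std::size_t pos = 1;
    while (pos < path.size())
    {
        if (path[pos] == '.')
        {
            // Object key: read up to the next '.' or '[' (or end of input).
            std::size_t end = path.find_first_of(".[", pos + 1);
            if (end == std::string_view::npos)
                end = path.size();
            assert(end > pos);
            if (end == pos + 1)
                return {}; // empty key
            steps.push_back({path_step::kind::key,
                std::string(path.substr(pos + 1, end - pos - 1)), 0});
            pos = end;
        }
        else if (path[pos] == '[')
        {
            // Array position or wildcard: read up to the closing ']'.
            std::size_t end = path.find(']', pos + 1);
            if (end == std::string_view::npos)
                return {};
            std::string_view inner = path.substr(pos + 1, end - pos - 1);
            if (inner == "*")
                steps.push_back({path_step::kind::wildcard, {}, 0});
            else
            {
                if (inner.empty())
                    return {};
                std::size_t value = 0;
                for (char c : inner)
                {
                    if (c < '0' || c > '9')
                        return {}; // not a plain non-negative integer
                    value = value * 10 + static_cast<std::size_t>(c - '0');
                }
                steps.push_back({path_step::kind::index, {}, value});
            }
            pos = end + 1;
        }
        else
            return {};
    }
    return steps;
}
```

Applied to $.purchaseHistory[1].items[0], this sketch yields four steps: the key purchaseHistory, the position 1, the key items, and the position 0.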

It’s important to note that subtree currently supports only a small subset of the JSONPath specification, and does not fully support expressions involving slicing or filtering. It does, however, support wildcards, as the following example demonstrates:

orcus::json::subtree sub(doc, "$.purchaseHistory[*].items");
std::cout << sub.dump(2) << std::endl;

Executing this code will generate the following output:

[
  [
    {
      "productId": "P123",
      "name": "Wireless Mouse",
      "quantity": 1,
      "price": 49.99
    },
    {
      "productId": "P124",
      "name": "Mechanical Keyboard",
      "quantity": 1,
      "price": 200
    }
  ],
  [
    {
      "productId": "P125",
      "name": "Noise Cancelling Headphones",
      "quantity": 1,
      "price": 119.99
    }
  ]
]

This extracts the items subtrees from both elements of the purchaseHistory array and places them, in order of occurrence, into a newly created array.
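The collection behavior of the wildcard can be pictured with a small model built only on the C++ standard library. In this sketch, object_type and collect_by_wildcard are hypothetical names, each array element is reduced to a map from key to an already-serialized value, and skipping elements that lack the key is an assumption of the sketch rather than documented orcus behavior.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a JSON object: key -> already-serialized value.
using object_type = std::map<std::string, std::string>;

// Mimics the shape of "$.<array>[*].<key>": visit every element of the array
// in order and append the value stored under the key to a new array.
// Elements without the key are skipped (an assumption of this sketch).
std::vector<std::string> collect_by_wildcard(
    const std::vector<object_type>& elements, const std::string& key)
{
    std::vector<std::string> collected;
    for (const object_type& elem : elements)
    {
        auto it = elem.find(key);
        if (it != elem.end())
            collected.push_back(it->second);
    }

    // Each element contributes at most one entry to the result.
    assert(collected.size() <= elements.size());
    return collected;
}
```

The result preserves the order of the source array, which mirrors how the two items arrays appear in the output above.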