
Query Nested Data in Postgres using Node.js
When building servers that provide JSON data, either using REST or GraphQL, we often want to query some time, along with its “children”. If we’re not careful we can end up with a large number of inefficient queries. Fortunately, Postgres has some functions that allow returning complex data structures directly as “JSON”. This works great with node.js because they just become native JavaScript objects.
The Problem
Consider the database schema:

We can create this database schema, along with some sample data by running:
CREATE TABLE users (
user_id BIGSERIAL NOT NULL PRIMARY KEY,
display_name TEXT NOT NULL,
manager_id BIGINT NULL REFERENCES users(user_id)
);CREATE TABLE teams (
team_id BIGSERIAL NOT NULL PRIMARY KEY,
display_name TEXT NOT NULL
);CREATE TABLE user_teams (
user_id BIGINT NOT NULL REFERENCES users,
team_id BIGINT NOT NULL REFERENCES teams,
PRIMARY KEY(user_id, team_id)
);INSERT INTO users
(user_id, display_name, manager_id)
VALUES
(1, 'Forbes', NULL), (2, 'John', NULL), (3, 'Joe', 1);INSERT INTO teams
(team_id, display_name)
VALUES
(1, 'Awesome Team'), (2, 'Team of One');INSERT INTO user_teams
(user_id, team_id)
VALUES
(1, 1), (2, 1), (1, 2);
Now imagine we want to run a query like:
Get a list of all users, and for each user, get their teams and their managers
We want output that looks something like:
[
{
"id": 1,
"display_name": "Forbes",
"manager": null,
"teams": [
{
"id": 1,
"display_name": "Awesome Team"
},
{
"id": 2,
"display_name": "Team of One"
}
]
},
{
"id": 2,
"display_name": "John",
"manager": null,
"teams": [
{
"id": 1,
"display_name": "Awesome Team"
}
]
},
{
"id": 3,
"display_name": "Joe",
"manager": {
"id": 1,
"display_name": "Forbes"
},
"teams": []
}
]The Simplest Solution?
We have a few options. We could just run all the queries recursively using node.js to combine all the data using @databases/pg:
const results = await Promise.all(
(await db.query(
sql`
SELECT u.user_id AS id, u.display_name, u.manager_id
FROM users u
`,
)).map(async ({manager_id, ...user}) => ({
...user,
manager:
(await db.query(
sql`
SELECT u.user_id AS id, u.display_name
FROM users u
WHERE u.user_id=${manager_id}
`,
))[0] || null,
teams: await db.query(
sql`
SELECT t.team_id AS id, t.display_name
FROM user_teams ut
JOIN teams t USING (team_id)
WHERE ut.user_id = ${user.id}
`,
),
})),
);
console.log(results);This works fine, as long as the data set is small. The only problem is we’re running 2n + 1 queries (where n is the number of users). We also can’t run the inner queries until we’ve fetched all the users. This is putting a lot of unnecessary load on both our node.js server and our Postgres server.
Select ANY
If you’re familiar with JavaScript, and have some familiarity with SQL, you might think to fix this by making just 3 queries:
const users = await db.query(
sql`
SELECT u.user_id AS id, u.display_name, u.manager_id
FROM users u
`,
);const userTeams = await db.query(
sql`
SELECT t.team_id AS id, t.display_name, ut.user_id AS user_id
FROM user_teams ut
JOIN teams t USING (team_id)
WHERE ut.user_id = ANY(${users.map(u => u.id)})
`,
);const managers = await db.query(
sql`
SELECT u.user_id AS id, u.display_name
FROM users u
WHERE u.user_id = ANY(${users.map(u => u.manager_id)})
`,
);
This is much more efficient, but we’re still left with a lot of work to do in the JavaScript land to combine these results into the desired nested structure. We’ve fetched all the data reasonably efficiently, but we’ve still got most of the work to do.
JSON to the Rescue!
Wouldn’t it be great if Postgres could just directly return the nested data in exactly the format we want? The good news is, it can!
First, we need to understand a little bit about the data types
- The
Recorddata type, represents a row of SQL data. At the top level, all SQL queries return a list ofRecords. Unfortunately, if we try and return nested rows, these come back as awkward strings that are hard to decode into anything useful. - The
Arraydata type, represents a list of things, e.g. JSON objects. - The
JSONdata type, represents an object or array.
We’re going to use the following functions to deal with all these types:
row_to_jsontakes aRecordand returnsJSON.array_aggis an “aggregation function”. If you use this in aSELECTquery, you get one record back containing anArrayof values, instead of many records each containing one value.array_to_jsontakes anArray(e.g. the result ofarray_agg) and converts it intoJSON.coalescetakes two values, and if the first value isNULLit takes the second value (casting it to the type of the first value).
Handling the manager field
The first is row_to_json. You can use a subquery that returns only one value as a field in SQL. All we need to do here is convert the row to JSON so that our one value can be a complex object:
const results = await db.query(
sql`
SELECT
u.user_id AS id,
u.display_name,
(
SELECT row_to_json(x) FROM
(
SELECT m.user_id AS id, m.display_name
FROM users m WHERE m.user_id = u.manager_id
) x
) AS manager
FROM users u
`,
);The extra nesting there is so that we can give a clear name x to our record, and pass that into row_to_json.
We can tidy this up with a helper function:
function nestQuerySingle(query) {
return sql`
(SELECT row_to_json(x) FROM (${query}) x)
`;
}Then our query becomes:
const results = await db.query(
sql`
SELECT
u.user_id AS id,
u.display_name,
${nestQuerySingle(
sql`
SELECT m.user_id AS id, m.display_name
FROM users m WHERE m.user_id = u.manager_id
`
)} AS manager
FROM users u
`,
);N.B. This would throw an error in Postgres if the manager query ever returned multiple records
Handling the teams
Now we have our Records converted into JSON, but we need to fix another problem to handle the teams. The issue is that you cannot have multiple values as part of a single field. This is where array_agg comes in, along with array_to_json and coalesce.
const results = await db.query(
sql`
SELECT
u.user_id AS id,
u.display_name,
${nestQuerySingle(
sql`
SELECT m.user_id AS id, m.display_name
FROM users m WHERE m.user_id = u.manager_id
`
)} AS manager,
coalesce(
(
SELECT array_to_json(array_agg(row_to_json(x)))
FROM (
SELECT t.team_id AS id, t.display_name
FROM user_teams ut
JOIN teams t USING (team_id)
WHERE ut.user_id = u.user_id
) x
),
'[]'
) AS teams
FROM users u
`,
);We use a sub query to get the teams, then we use array_to_json(array_agg(row_to_json(x))) to convert the many records into a single JSON value. Unfortunately, Postgres will return NULL here if there are no records, not [], so we need to use coalesce to specify a default. The default of '[]' as a string here gets automatically cast to JSON to match the type of the result of array_to_json.
One more helper function:
function nestQuery(query) {
return sql`
coalesce(
(
SELECT array_to_json(array_agg(row_to_json(x)))
FROM (${query}) x
),
'[]'
)
`;
}and we have the final query:
const results = await db.query(
sql`
SELECT
u.user_id AS id,
u.display_name,
${nestQuerySingle(
sql`
SELECT m.user_id AS id, m.display_name
FROM users m WHERE m.user_id = u.manager_id
`
)} AS manager,
${nestQuery(
sql`
SELECT t.team_id AS id, t.display_name
FROM user_teams ut
JOIN teams t USING (team_id)
WHERE ut.user_id = u.user_id
`
)} AS teams
FROM users u
`,
);This returns the exact same data as our “simple” solution using queries nested in node.js, except that now the nesting occurs in SQL. This will be much more performant, not to mention easier to maintain.
Conclusion
You may want a few helper functions, but Postgres with @databases is a really good fit for querying nested data structures. If you liked this article, you should definitely click that “follow” button. Soon, I’ll be writing an article about how to perform recursive queries. e.g. what if I wanted to get everyone above me in the hierarchy of managers (i.e. my manager, and their manager, and their manager, … etc.).

