Syntax Tree and Alternative to LINQ in Interaction with SQL Databases

Dmitry Tikhonov
ITNEXT
Published in
7 min readOct 22, 2020

--

It was a legacy enterprise project and I was asked to improve its “advanced” filtering capabilities.

Before they had something like this:

But wanted something like this:

Let’s skip the UI part and focus on the data access layer. Like 99% of similar projects this one used a sql database (MS SQL but it is does not matter) and this project belonged to the part of the similar projects that does not struggle with Entity Framework but puts all the logic in stored procedures. The procedure which performed the “advanced” search looked like that:

CREATE PROCEDURE dbo.SomeContextUserFind
(@ContextId int, @Filter nvarchar(max)) AS
BEGIN
DECLARE @sql nvarchar(max) =
N'SELECT U.UserId, U.FirstName, U.LastName
FROM [User] U
INNER JOIN [SomeContext] [C]
ON ....
WHERE [C].ContextId = @p1 AND ' + @Filter;
EXEC sp_executesql

@sql,
N'@p1 int',
@p1=@ContextId
END

and the code that generated the filter string looked something like that:

Of course it is not the greatest code you have ever seen, but unfortunately legacy projects (or even not yet legacy) often are full of similar things. Anyway, it is what it is and it should be somehow improved.

The first thought that came to my mind is just adding more fields into the “FilterItem” and make the building logic more complex, but I quickly realized that it a road to nowhere — it would be extremely difficult to maintain such code, and I would never achieve the desired functionality.

At that point, I remembered the “abstract syntax tree”, which is obviously the best choice in this case, and now I am going to explain what it is and how it helps.

Abstract Syntax Tree

First, let’s look at the filter strings we’re going to create, for example this one:

[FirstName]='John' AND ([LastName]='Smith' OR [LastName]='Doe')

And here we can notice some structure:

This structure is a tree and that means that we can create several simple classes which will describe it:

Using the classes, we can create an object which would represent the original filter:

In fact, this structure, called the “Abstract Syntax Tree”, can be used to represent more complex queries, but they will all have a single “root” that can be stored in a single object.

“Abstract Syntax Tree” is a result of parsing a phrase of some formal language (language of Boolean expressions in our case) with defined syntax rules. Such rules have a specific notation. For example the rules of our simple language (subset of boolean expressions) can be written as:

<eqPredicate> ::= <column> = <str>
<or> ::= <eqPredicate>|or|<and> OR <eqPredicate>|or|<and>
<and> ::= <eqPredicate>|(<or>)|<and> AND <eqPredicate>|(<or>)|<and>

Note: “Abstract” means that the syntax do not describe all the language details, for instance, grouping parentheses, extra spaces etc.

Parsing is a separate huge topic in and of itself, and it’s not that important right now since we already have syntax trees ready, so let’s focus on what we can do with them.

SQL Code Generation

Obviously, the most important task for us is converting the syntax tree back to text and we have several ways to do that.

The first way is to use pattern matching, which is pretty straightforward:

As a result, the builder will contain the following line:

[FirstName]='John' AND ([LastName]='Smith' OR [LastName]='Joe')

It seems this is what we need!

Visitor

Although I have a passion for functional programming, in this case the object-oriented approach may provide a more efficient solution — I’m talking about the “Visitor” pattern. The idea of this pattern is that we do not try to determine a type of an object, but give it a list of all possible actions (in the form of an interface) and the object itself chooses the action that is most appropriate for its type. Let’s define the list:

and any object (of our structure) can accept the selection:

Now we can extract the generation of sql code into a separate class:

And use it as follows:

As a result, we will get the desired string:

[FirstName]='John' AND ([LastName]='Smith' OR [LastName]='Joe')

Using the “Visitor” pattern has several advantages over the pattern matching. So, for example, the selection of concrete types is always exhaustive since adding a new type to the structure always leads to a change in the IExprVisitor interface and, as a result, the need to extend all its implementations (otherwise there will be compilation errors).

Traversal and parentheses

There are several aspects to pay attention to in this algorithm.

First, how does it work at all?

In fact, here we are doing a depth-first traversal of the syntax tree and the sql code is a trace of this traversal:

Second, what happens to the parentheses?

The point is that Boolean expressions have a certain order of evaluation. ‘AND’ operation is evaluated first, and ‘OR’ operation is evaluated second. The parentheses are needed to change this sequence, since any expression in parentheses has a higher priority in the evaluation. But in the syntax tree, the sequence of the evaluation is given by the structure itself (from branches to the root), so a separate type for parentheses is not required.

Extension of the syntax

In reality, of course, we also need other predicates, for example ‘Not Equal’ and to be able to use it, we just need to add a new class:

And since we have a new type, the compiler tells us that we need to implement the SQL code generation for it:

We’ve created just a very simple set of Boolean predicates and MS SQL supports many more, but as you see you can easily add all the required language structures.

By the way, the SQL Server documentation has all the SQL syntax rules that are very useful:

Operator Overloading

Obviously, creating syntax trees by calling the class constructors is not convenient at all. However, the C# operator overloading can help us with this.

Let’s do the following:

Now we can create a syntax tree in a very simple and clear way:

And the result again will be:

[FirstName]='John' AND ([LastName]='Smith' OR [LastName]='Doe')

Note: C# does not allow overloading of the && and || operators and actually it makes sense, since these operators stop further evaluation if the result is already known, but we need all the parts to build the syntax tree (it will be evaluated by the SQL database).

What is Next

We seem to have solved the problem with boolean filters, but what is about sorting and paging? It is also sometimes necessary to attach a sql view (or derived table) to filter (or sort) on a calculated field.

Not a problem! Let’s just implement all the SQL SELECT syntax:

Of course, you do not need to do it by yourself since there are several libraries (for example mine) where this is already implemented, so just consider this approach as an alternative way to interact with SQL databases.

Alternative to LINQ

What we’ve done with the operator overloading is somewhat similar to LINQ expressions, and in fact there are some actual similarities. The C# compiler generates syntax trees and then libraries such as Entity Framework or “LINQ to SQL” convert those trees to real SQL code. The main problem with this approach is that the compiler generates syntax tree of C# language, but we need SQL! Reflecting imperative C# into declarative SQL is not an easy task, and the results are often unpredictable.

I prefer a different approach — instead of using C # syntax as a basis, it can use real SQL syntax. And instead of the compiler, it can use operator overloading, extension methods and fluent builders.

With this approach,on the one hand, we get almost the same flexibility as if we were using stored procedures. On the other hand, we have strong typing, intellisense and business logic does not move to the database. And … no need to google how to do a left join using LINQ )

Another advantage is that the data update statements (INSERT, UPDATE, DELETE and even MERGE) can also be implemented in the same way, and there is no need to load thousands of records from the database to update a just single column.

This is an example of what can be done using real SQL syntax as a basis (taken from here):

Conclusion

Syntax tree is a very powerful data structure that you will likely come across sooner or later. Probably these will be LINQ expressions, or maybe you will need to create a Roslyn analyzer, or maybe you want to create your own syntax yourself, like I did a few years ago in order to refactor one legacy project. In any case, it is important to understand this structure and be able to work with it.

Link to SqExpress — a project that partially includes the code from this article.

--

--