Quino: an introduction to query-mapping in the ORM

Published by marco on

The following article was originally published on the Encodo blogs and is cross-published here.

One of the most-used components of Quino is the ORM. An ORM is an Object-Relational Mapper, which accepts queries and returns data.

Applications formulate queries in Quino using application metadata
The ORM maps this query to the query language of the target database
The ORM transforms the results returned by the database to objects (the classes for which were also generated from application metadata).

This all sounds a bit abstract, so let’s start with a concrete example. Let’s say that we have millions of records in an employee database. We’d like to get some information about that data using our ORM. With millions of records, we have to be a bit careful about how that data is retrieved, but let’s continue with concrete examples.

Attempt #1: Get your data and refine it locally

The following example returns the correct information, but does not satisfy performance or scalability requirements.[1]

var people = Session.GetList<Person[2]>().Where(p => p.Company.Name == "IBM");

Assert.That(people.Count(), Is.GreaterThanEqual(140000));

What’s wrong with the statement above? Since the call to Where occurs after the call to GetList<Person>(), the restriction cannot possibly have been passed on to the ORM.

The first line of code doesn’t actually execute anything. It’s in the call to Count() that the ORM and LINQ are called into action. Here’s what happens, though:

For each row in the Person table, create a Person object
For each person object, create a corresponding Company object
Count all people where the Name of the person’s company is equal to “IBM”.

The code above benefits from almost no optimization, instantiating a tremendous number of objects in order to yield a scalar result. The only side-effect that can be considered an optimization is that most of the related Company objects will be retrieved from cache rather than from the database. So that’s a plus.

Still, the garbage collector is going to be running pretty hot and the database is going to see far more queries than necessary.[3]

Attempt #2: Refine results on the database

Let’s try again, using Quino’s fluent querying API.[4] The Quino ORM can map much of this API to SQL. Anything that is mapped to the database is not performed locally and is, by definition, more efficient.[5]

var people = Session.GetList<Person>();
people.Query.Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM");[6]

Assert.That(people.Count, Is.GreaterThanEqual(140000));

First, we get a list of people from the Session. As of the first line, we haven’t actually gotten any data into memory yet—we’ve only created a container for results of a certain type (Person in this case).

The default query for the list we created is to retrieve everything without restriction, as we saw in the first example. In this example, though, we restrict the Query to only the people that work for a company called “IBM”. At this point, we still haven’t called the database.

The final line is the first point at which data is requested, so that’s where the database is called. We ask the list for the number of entries that match it and it returns an impressive number of employees.

At this point, things look pretty good. In older versions of Quino, this code would already have been sufficiently optimized. It results in a single call to the database that returns a single scalar value with everything calculated on the database. Perfect.

Attempt #3: Avoid creating objects at all

However, since v1.6.0 of Quino[7], the call to the property IDataList.Count has automatically populated the list with all matching objects as well. We made this change because the following code pattern was pretty common:

var list = Session.GetList<Person>();
// Adjust query here
if (list.Count > 0)
{
  // do something with all of the objects here
}

That kind of code resulted in not one, but two calls to the database, which was killing performance, especially in high-latency environments.

That means, however, that the previous example is still going to pull 14,000 objects into memory, all just to count them and add them to a list that we’re going to ignore. The garbage collector isn’t a white-hot glowing mess anymore, but it’s still throwing you a look of disapproval.

Since we know that we don’t want the objects in this case, we can get the old behavior back by making the following adjustment.

var people = Session.GetList<Person>();
people.Query.Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM");

Assert.That(Session.GetCount(people.Query), Is.GreaterThanEqual(140000));

It would be even clearer to just forget about creating a list at all and work only with the query instead.

var query = Session.GetQuery<Person>();
query.Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM");

Assert.That(Session.GetCount(query), Is.GreaterThanEqual(140000));

Now that’s a maximally efficient request for a number of people in Quino 1.10 as well.

Tune in next time for a look at what happens when a query can only be partially mapped to the database.

[1] I suppose it depends on what those requirements are, but if you think your application’s performance requirements are so loose that it’s OK to create millions of objects in memory just in order to count them, then you’re probably not in the target audience for this article.↩

[2] Instead of using a code-first, Quino uses a “metadata-first” approach, so the Person class used here is generated from the application metadata rather than written by the developer, as in other frameworks.↩

[3]

There are different strategies for retrieving associated data. Quino does not yet support retrieving anything other than root objects. That is, the associated Company object is not retrieved in the same query as the Person object.

In the example in question, the first indication that the ORM has that a Company is required is when the lambda retrieves them individually. Even if the original query had somehow indicated that the Company objects were also desired (e.g. using something like Include(Person.Relations.Company) as you would in EF), the most optimal mapping strategy is still not clear.

Should the mapper join the company table and retrieve that highly redundant data with each person? Or should it execute a single query for all companies and prime a cache with those? The right answer depends on the latency and bandwidth between the ORM and the database as well as myriad other conditions. When dealing with a lot of data, it’s not hard to find examples where the default behavior of even a clever ORM isn’t maximally efficient—or even very efficient at all.

As we already noted, though, the example in question does everything in memory. If we reasonably assume that the people belong to a relatively small number of companies—say qc—then the millions of calls to retrieve companies associated with people will result in a lot of cache hits and generate “only” qc + 1 queries.

↩

[4] Quino does not have LINQ to SQL support. I’m not even going to write “yet” at the end of that sentence because it’s not at all clear that we’re ever going to have it. Popular demand might convince us otherwise, but for now we’re quite happy with our API (and soon-to-be-revealed query language QQL).↩

[5] That’s an assumption I’m going to make for which counterexamples certainly exist, but none of which apply to the simple examples we’ll address in this article.↩

[6] The Person.Relations and Person.Fields static fields are generated with the Person class. These correspond to the application metadata and change when the metadata changes. Developers are encouraged to use these generated constants so that even metadata-based queries can be validated by the compiler.↩

[7] That was in almost three years ago, in June of 2011.↩