Quino: an introduction to query-mapping in the ORM
Published by marco on
The following article was originally published on the Encodo blogs and is cross-published here.
One of the most-used components of Quino is the ORM. An ORM is an Object-Relational Mapper, which accepts queries and returns data.
- Applications formulate queries in Quino using application metadata
- The ORM maps this query to the query language of the target database
- The ORM transforms the results returned by the database to objects (the classes for which were also generated from application metadata).
This all sounds a bit abstract, so let’s start with a concrete example. Let’s say that we have millions of records in an employee database. We’d like to get some information about that data using our ORM. With millions of records, we have to be a bit careful about how that data is retrieved, but let’s continue with concrete examples.
Attempt #1: Get your data and refine it locally
The following example returns the correct information, but does not satisfy performance or scalability requirements.[1]
var people = Session.GetList<Person[2]>().Where(p => p.Company.Name == "IBM");
Assert.That(people.Count(), Is.GreaterThanEqual(140000));
What’s wrong with the statement above? Since the call to Where
occurs after the call to GetList<Person>()
, the restriction cannot possibly have been passed on to the ORM.
The first line of code doesn’t actually execute anything. It’s in the call to Count()
that the ORM and LINQ are called into action. Here’s what happens, though:
- For each row in the
Person
table, create aPerson
object - For each person object, create a corresponding
Company
object - Count all people where the
Name
of the person’s company is equal to “IBM”.
The code above benefits from almost no optimization, instantiating a tremendous number of objects in order to yield a scalar result. The only side-effect that can be considered an optimization is that most of the related Company
objects will be retrieved from cache rather than from the database. So that’s a plus.
Still, the garbage collector is going to be running pretty hot and the database is going to see far more queries than necessary.[3]
Attempt #2: Refine results on the database
Let’s try again, using Quino’s fluent querying API.[4] The Quino ORM can map much of this API to SQL. Anything that is mapped to the database is not performed locally and is, by definition, more efficient.[5]
var people = Session.GetList<Person>();
people.Query.Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM");[6]
Assert.That(people.Count, Is.GreaterThanEqual(140000));
First, we get a list of people from the Session
. As of the first line, we haven’t actually gotten any data into memory yet—we’ve only created a container for results of a certain type (Person
in this case).
The default query for the list we created is to retrieve everything without restriction, as we saw in the first example. In this example, though, we restrict the Query
to only the people that work for a company called “IBM”. At this point, we still haven’t called the database.
The final line is the first point at which data is requested, so that’s where the database is called. We ask the list for the number of entries that match it and it returns an impressive number of employees.
At this point, things look pretty good. In older versions of Quino, this code would already have been sufficiently optimized. It results in a single call to the database that returns a single scalar value with everything calculated on the database. Perfect.
Attempt #3: Avoid creating objects at all
However, since v1.6.0 of Quino[7], the call to the property IDataList.Count
has automatically populated the list with all matching objects as well. We made this change because the following code pattern was pretty common:
var list = Session.GetList<Person>();
// Adjust query here
if (list.Count > 0)
{
// do something with all of the objects here
}
That kind of code resulted in not one, but two calls to the database, which was killing performance, especially in high-latency environments.
That means, however, that the previous example is still going to pull 14,000 objects into memory, all just to count them and add them to a list that we’re going to ignore. The garbage collector isn’t a white-hot glowing mess anymore, but it’s still throwing you a look of disapproval.
Since we know that we don’t want the objects in this case, we can get the old behavior back by making the following adjustment.
var people = Session.GetList<Person>();
people.Query.Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM");
Assert.That(Session.GetCount(people.Query), Is.GreaterThanEqual(140000));
It would be even clearer to just forget about creating a list at all and work only with the query instead.
var query = Session.GetQuery<Person>();
query.Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM");
Assert.That(Session.GetCount(query), Is.GreaterThanEqual(140000));
Now that’s a maximally efficient request for a number of people in Quino 1.10 as well.
Tune in next time for a look at what happens when a query can only be partially mapped to the database.
Person
class used here is generated from the application metadata rather than written by the developer, as in other frameworks.↩There are different strategies for retrieving associated data. Quino does not yet support retrieving anything other than root objects. That is, the associated Company
object is not retrieved in the same query as the Person
object.
In the example in question, the first indication that the ORM has that a Company
is required is when the lambda retrieves them individually. Even if the original query had somehow indicated that the Company
objects were also desired (e.g. using something like Include(Person.Relations.Company)
as you would in EF), the most optimal mapping strategy is still not clear.
Should the mapper join the company table and retrieve that highly redundant data with each person? Or should it execute a single query for all companies and prime a cache with those? The right answer depends on the latency and bandwidth between the ORM and the database as well as myriad other conditions. When dealing with a lot of data, it’s not hard to find examples where the default behavior of even a clever ORM isn’t maximally efficient—or even very efficient at all.
As we already noted, though, the example in question does everything in memory. If we reasonably assume that the people belong to a relatively small number of companies—say qc—then the millions of calls to retrieve companies associated with people will result in a lot of cache hits and generate “only” qc + 1 queries.
Person.Relations
and Person.Fields
static fields are generated with the Person
class. These correspond to the application metadata and change when the metadata changes. Developers are encouraged to use these generated constants so that even metadata-based queries can be validated by the compiler.↩