Quino: partially-mapped queries
Published by marco on
In “Quino: an overview of query-mapping in the data driver”, we took a look at some of the basics of querying data with Quino while maintaining acceptable performance and memory usage.
Now we’ll take a look at what happens with partially-mapped queries. Before explaining what those are, we need a more concrete example to work with. Here’s the most-optimized query we ended up with in the previous article:
    var query = Session.GetQuery<Person>();
    query.Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM");
    Assert.That(Session.GetCount(query), Is.GreaterThanOrEqualTo(140000));
With so many entries, we’ll want to trim down the list a bit more before we actually create objects. Let’s choose only people whose last names start with the letter “M”.
    var query = Session.GetQuery<Person>();
    query
      .Where(Person.Fields.LastName, ExpressionOperator.StartsWith, "M") // [1]
      .Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM");
    Assert.That(Session.GetCount(query), Is.InRange(100, 120));
This is the kind of stuff that works just fine in other ORMs, like Entity Framework. Where Quino goes just a little farther is in being more forgiving when a query can be only partially mapped to the server. If you’ve used EF for anything beyond trivial queries, you’ve surely run into an exception that tells you that some portion of your query could not be mapped.[2]
Instead of throwing an exception, Quino sends what it can to the database and uses LINQ to post-process the data sent back by the database to complete the query.
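As a rough illustration of that split, the following sketch simulates it in plain C#. It does not use the Quino API: the in-memory `table`, the `ExecuteOnServer` function and the `localFilter` delegate are hypothetical stand-ins for the database, the mapped part of the query and the unmappable part, respectively.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the database table (not Quino API).
var table = new List<(string LastName, string Company)>
{
    ("Miller", "IBM"), ("Meyer", "IBM"), ("Smith", "IBM"),
    ("Moore", "Acme"), ("Jones", "Acme"),
};

var rowsSentByServer = 0;

// Simulates the mapped part of the query (the restriction on the company name).
// Only rows that pass the server-side restriction are handed to the client.
IEnumerable<(string LastName, string Company)> ExecuteOnServer(string companyName)
{
    foreach (var row in table)
    {
        if (row.Company == companyName)
        {
            rowsSentByServer++;
            yield return row;
        }
    }
}

// The unmappable part: an arbitrary delegate that exists only on the client.
Func<(string LastName, string Company), bool> localFilter =
    p => p.LastName.StartsWith("M", StringComparison.Ordinal);

// Partial mapping: server-side restriction first, LINQ post-processing second.
var matches = ExecuteOnServer("IBM").Where(localFilter).ToList();

Console.WriteLine($"{matches.Count} matches from {rowsSentByServer} rows returned by the server");
// prints "2 matches from 3 rows returned by the server"
```

The two "Acme" rows never leave the simulated server; only the three "IBM" rows are filtered locally.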
Introducing unmappable expressions
Unmappable code can easily sneak in through aspects in the metadata that define filters or sorts using local methods or delegates that do not exist on the server. Instead of building a complex case, we’re going to knowingly include an unmappable expression in the query.
    var query = Session.GetQuery<Person>();
    query
      .Where(new DelegateExpression(c => c.GetObject<Person>().LastName.StartsWith("M"))) // [3], [4]
      .Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM");
    Assert.That(Session.GetCount(query), Is.InRange(100, 120));
The new expression performs the same check as the previous example, but in a way that cannot be mapped to SQL.[5] With our new example, we’ve provoked a situation where any of the following could happen:
- The ORM could throw up its hands and declare the query unmappable, pushing the responsibility for separating mappable from unmappable onto the shoulders of the developers. As noted above, this is what EF does.
- The ORM could determine that the query is unmappable and evaluate everything locally, retrieving only the initial set of Person objects from the server (all several million of them, if you’ll recall from the previous post).
- The ORM could map part of the query to the database, retrieving the minimal set of objects necessary in order to guarantee the correct result. This is what Quino does. The strategy works in many cases, but is not without its pitfalls.
What happens when we evaluate the query above? With partial mapping, we know that the restriction to “IBM” will be applied on the database. But we still have an additional restriction that must be applied locally. Instead of being able to get the count from the server without creating any objects, we’re now forced to create objects in memory so that we can apply the local restrictions and only count the objects that match them all.
But as you’ll recall from the previous article, the number of matches for “IBM” is 140,000 objects. The garbage collector just gave you a dirty look again.
Memory bubbles
There is no way to further optimize this query because of the local evaluation, but there is a way to avoid another particularly nasty issue: memory bubbles.
What is a memory bubble, you might ask? It describes what happens when your application is using n MB and then is suddenly using n + 100 MB because you created 140,000 objects all at once. Milliseconds later, the garbage collector is thrashing furiously to clean up all of these objects, and all because you created them only in order to filter and count them. A few milliseconds after that, your application is back at n MB but the garbage collector’s fur is ruffled and it’s still trembling slightly from the shock.
The way to avoid this is to stream the objects through your analyzer one at a time rather than to create them all at once. Quino uses lazily-evaluated IEnumerable<T> sequences throughout the data driver specifically to prevent memory bubbles.
Streaming with IEnumerable<T> sequences
Before tackling how the Quino ORM handles the Count(), let’s look at how it would return the actual objects from this query.
- Map the query to create a SELECT statement
  - At this point, it doesn’t matter whether the entire query could be mapped
- Create an IEnumerable<T> sequence that represents the result of the mapped query
  - At this point, nothing has been executed and no objects have been returned
- Wrap the sequence in another sequence that applies all of the “unhandled” parts of the query
- Return that sequence as the result of executing the query
  - At this point, we still haven’t actually executed anything on the database or created any objects
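The deferred nature of these steps can be sketched with ordinary C# iterators. This is only an approximation under stated assumptions: `MappedQueryResult` is a hypothetical stand-in for the sequence backed by the mapped SELECT statement, and the `rowsRead` counter exists purely to make the laziness visible. None of it is Quino's actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var rowsRead = 0;

// Hypothetical stand-in for the sequence backed by the mapped SELECT statement.
// The body of an iterator does not run until a caller starts enumerating.
IEnumerable<string> MappedQueryResult()
{
    foreach (var lastName in new[] { "Miller", "Smith", "Meyer" })
    {
        rowsRead++;
        yield return lastName;
    }
}

// Wrap the sequence in another sequence that applies the unmapped restriction.
IEnumerable<string> result = MappedQueryResult()
    .Where(name => name.StartsWith("M", StringComparison.Ordinal));

// The sequence has been built and returned, but no row has been read yet.
var rowsReadBeforeEnumeration = rowsRead;

// Only enumerating the sequence pulls rows through both layers.
var names = result.ToList();

Console.WriteLine($"before: {rowsReadBeforeEnumeration}, after: {rowsRead}, matches: {names.Count}");
// prints "before: 0, after: 3, matches: 2"
```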
Right, now we have an IEnumerable<T> that represents the result set, but we haven’t lit the fuse on it yet.
How do we light the fuse? Well, the most common way to do so is to call ToList() on it. What happens then?
- The IEnumerator<T> requests an element
- The query is executed against the database and returns an IDataReader
- The reader requests a row and creates a Person object from that row’s data
- The wrapper that performs the local evaluation applies its filter(s) to this Person and yields it if it matches
- If it matched the local filters, the Person is added to the list
- Control returns to the IDataReader, which requests another row
- Repeat until no more rows are returned from the database
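To make the row-by-row interleaving in the steps above concrete, here is a small trace built with plain iterators. `ReadRows` and `ApplyLocalFilter` are hypothetical names standing in for the data reader and the local-evaluation wrapper; the explicit loop plays the role of ToList().

```csharp
using System;
using System.Collections.Generic;

var trace = new List<string>();

// Hypothetical stand-in for the data reader: produces one row per request.
IEnumerable<string> ReadRows()
{
    foreach (var row in new[] { "Miller", "Smith" })
    {
        trace.Add($"read {row}");
        yield return row;
    }
}

// Hypothetical stand-in for the wrapper that applies the local restriction.
IEnumerable<string> ApplyLocalFilter(IEnumerable<string> rows)
{
    foreach (var candidate in rows)
    {
        trace.Add($"filter {candidate}");
        if (candidate.StartsWith("M", StringComparison.Ordinal))
        {
            yield return candidate;
        }
    }
}

// The outermost caller decides to collect everything in a list.
var list = new List<string>();
foreach (var person in ApplyLocalFilter(ReadRows()))
{
    trace.Add($"add {person}");
    list.Add(person);
}

Console.WriteLine(string.Join(" -> ", trace));
// prints "read Miller -> filter Miller -> add Miller -> read Smith -> filter Smith"
```

Each row is read, filtered and (if it matches) added before the next row is requested, so only one unfiltered object is in flight at a time.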
Since the decision to add all objects to a list is made by the outermost caller, it’s the caller that’s at fault for the memory bubble, not the driver.[6] We’ll see in the next section how to avoid creating a list when none is needed.
Using cursors to control evaluation
If we wanted to process data but perhaps offer the user a chance to abort processing at any time, we could even get an IDataCursor<T> from the Quino ORM to control iteration ourselves.
    using (var cursor = Session.CreateCursor(query))
    {
      foreach (var obj in cursor)
      {
        // Do something with obj
        if (userAbortedOperation) { break; }
      }
    }
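The benefit of aborting early can be sketched without Quino at all. In the example below, `OpenCursor` is a hypothetical iterator standing in for a cursor over a data reader (it is not Quino's IDataCursor<T>), and the abort condition is hard-coded for illustration. It relies only on standard C# behavior: leaving a foreach loop disposes the enumerator, which runs the iterator's finally block.

```csharp
using System;
using System.Collections.Generic;

var rowsRead = 0;
var readerClosed = false;

// Hypothetical stand-in for a cursor over a data reader.
IEnumerable<int> OpenCursor()
{
    try
    {
        for (var rowId = 1; rowId <= 140000; rowId++)
        {
            rowsRead++;
            yield return rowId;
        }
    }
    finally
    {
        // Runs when the enumerator is disposed, including after an early break.
        readerClosed = true;
    }
}

foreach (var obj in OpenCursor())
{
    // Pretend the user cancels while the third object is being processed.
    var userAbortedOperation = obj == 3;
    if (userAbortedOperation) { break; }
}

Console.WriteLine($"rows read: {rowsRead}, reader closed: {readerClosed}");
// prints "rows read: 3, reader closed: True"
```

Only three of the 140,000 potential rows are ever produced, and the cleanup code still runs.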
And finally, the count query
But back to evaluating the query above. The Quino ORM handles it like this:
- Try to map the query to create a COUNT statement
- Notice that at least one restriction could not be mapped
- Create a cursor to SELECT all of the objects for the same query (shown above) and count all the objects returned by that instead
So, if a count-query cannot be fully mapped to the database, the most efficient possible alternative is to execute a query that retrieves as few objects as possible (i.e. maps as much to the server as it can) and streams those objects to count them locally.
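The decision described above might look roughly like the following sketch. The function `CountMatches` and its three parameters are assumptions made for illustration, not Quino's API: `countOnServer` stands for executing a fully mapped COUNT statement, `streamFromServer` for the lazily-evaluated cursor over the mapped part of the query, and `unmappedRestrictions` for whatever could not be sent to the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical count strategy; all names here are illustrative only.
long CountMatches<T>(
    Func<long> countOnServer,
    Func<IEnumerable<T>> streamFromServer,
    IReadOnlyList<Func<T, bool>> unmappedRestrictions)
{
    if (unmappedRestrictions.Count == 0)
    {
        // Fully mapped: the server counts and no objects are created locally.
        return countOnServer();
    }

    // Partially mapped: stream the objects one at a time and count only
    // those that satisfy every local restriction. No list is ever built.
    return streamFromServer()
        .LongCount(item => unmappedRestrictions.All(restriction => restriction(item)));
}

// Stand-in for the rows that match the mapped restriction on the server.
var ibmLastNames = new[] { "Miller", "Smith", "Meyer" };

var fullyMapped = CountMatches<string>(
    () => ibmLastNames.LongLength,
    () => ibmLastNames,
    Array.Empty<Func<string, bool>>());

var partiallyMapped = CountMatches<string>(
    () => ibmLastNames.LongLength,
    () => ibmLastNames,
    new Func<string, bool>[] { n => n.StartsWith("M", StringComparison.Ordinal) });

Console.WriteLine($"fully mapped: {fullyMapped}, partially mapped: {partiallyMapped}");
// prints "fully mapped: 3, partially mapped: 2"
```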
Tune in next time for a look at how to exert more control with limit and offset and how those work together with partial mapping.
… ExpressionOperator.StartsWithCI to perform the check in a case-insensitive manner instead.
DelegateExpression simply wraps the lambda given in the constructor in a Quino expression object. The parameter c is an IExpressionContext that provides the target object, which is in this case a Person.
… LastName field.
… Session.CreateCursor() to control evaluation yourself and create the right-sized batches of objects to count. The ChangeAndSave() extension method does exactly that to load objects in batches (size adjustable by an optional parameter) rather than one by one.