Optimizing data access for high-latency networks II
Published by marco on
In the previous article, we discussed a performance problem in the calendar of Encodo’s time-tracking product, Punchclock.
Instead of guessing at the problem, we profiled the application using the database-statistics window available to all Quino applications.[1] We quickly discovered that most of the slowdown stems from the relatively innocuous line of code shown below.
var people =
Session.GetList<Person>().
Where(p => p.TimeEntries.Any()).
ToList();
First things first: what does the code do?
Before doing anything else, we should establish what the code does. Logically, it retrieves a list of people in the database who have recorded at least one time entry.
The first question we should ask at this point is: does the application even need to do this? The answer in this case is ‘yes’. The calendar includes a drop-down control that lets the user switch between the calendars for different users. This query returns the people to show in this drop-down control.
With the intent and usefulness of the code established, let’s dissect how it is accomplishing the task.
- The
Session.GetList<Person>()
portion retrieves a list of all people from the database - The
Where()
method is applied locally for each object in the list[2] - For a given person, the list of TimeEntries is accessed
- This access triggers a lazy load of the list
- The
Any()
method is applied to the full list of time entries - The
ToList()
method creates a list of all people who match the condition
Though the line of code looks innocuous enough, it causes a huge number of objects to be retrieved, materialized and retained in memory—simply in order to check whether there is at least one object.
This is a real-world example of a performance problem that can happen to any developer. Instead of blaming the developer who wrote this line of code, its more important to stay vigilant to performance problems and to have tools available to quickly and easily find them.
Stop creating all of the objects
The first solution I came up with[3] was to stop creating objects that I didn’t need. A good way of doing this and one that was covered in Quino: partially-mapped queries is to use cursors instead of lists. Instead of using the generated list TimeEntries
, the following code retrieves a cursor on that list’s query and materializes at most one object for the sub-query.
var people = Session.GetList<Person>().Select(p =>
{
using (var cursor = Session.CreateCursor<TimeEntry>(p.TimeEntries.Query))[4]
{
return cursor.Any();
}
}).ToList();
A check of the database statistics shows improvement, as shown below.
Time-entry queries with cursors
Just by using cursors, we’ve managed to reduce the execution time for each query by about 75%.[5] Since all we’re interested in finding out is whether there is at least one time entry for a person, we could also ask the database to count objects rather than to return them. That should be even faster. The following code is very similar to the example above but, instead of getting a cursor based on the TimeEntries
query, it gets the count.
var people =
Session.GetList<Person>().
Where(p => Session.GetCount(p.TimeEntries.Query) > 0).
ToList();
How did we do? A check of the database statistics shows even more improvement, as shown below.
Time-entry queries with COUNTs instead of SELECTs
We’re now down to a few dozen milliseconds for all of our queries, so we’re done, right? A 95% reduction in query-execution time should be enough.
Unfortunately, we’re still executing just as many queries as before, even though we’re taking far less time to execute them. This is better, but still not optimal. In high-latency situations, the user is still likely to experience a significant delay when opening the calendar since each query’s execution time is increased by the latency of the connection. In a local network, the latency is negligible; on a WAN, we still have a problem.
In the next article, we’ll see if we can’t reduce the number of queries being executed.
It is important for users of the Microsoft Entity Framework (EF) to point out that Quino does not have a Linq-to-Sql mapper. That means that any Linq expressions like Where()
are evaluated locally instead of being mapped to the database. There are various reasons for this but the main one is that we ended up preferring a strict boundary between the mappable query API and the local evaluation API.
Anything formulated with the query API is guaranteed to be executed by the data provider (even if it must be evaluated locally) and anything formulated with Linq is naturally evaluated locally. In this way, the code is clear in what is sent to the server and what is evaluated locally. Quino only very, very rarely issues an “unmappable query” exception, unlike EF, which occasionally requires contortions until you’ve figured out which C# formulation of a particular expression can be mapped by EF.
↩