
Optimizing data access for high-latency networks II

Published by marco in Programming

 In the previous article, we discussed a performance problem in the calendar of Encodo’s time-tracking product, Punchclock.

Instead of guessing at the problem, we profiled the application using the database-statistics window available to all Quino applications.[1] We quickly discovered that most of the slowdown stems from the relatively innocuous line of code shown below.

var people = 
  Session.GetList<Person>().
  Where(p => p.TimeEntries.Any()).
  ToList();

First things first: what does the code do?

Before doing anything else, we should establish what the code does. Logically, it retrieves a list of people in the database who have recorded at least one time entry.

The first question we should ask at this point is: does the application even need to do this? The answer in this case is ‘yes’. The calendar includes a drop-down control that lets the user switch between the calendars for different users. This query returns the people to show in this drop-down control.

With the intent and usefulness of the code established, let’s dissect how it is accomplishing the task.

  1. The Session.GetList<Person>() portion retrieves a list of all people from the database
  2. The Where() method is applied locally for each object in the list[2]
  3. For a given person, the list of TimeEntries is accessed
  4. This access triggers a lazy load of the list
  5. The Any() method is applied to the full list of time entries
  6. The ToList() method creates a list of all people who match the condition

Though the line of code looks innocuous enough, it causes a huge number of objects to be retrieved, materialized and retained in memory—simply in order to check whether there is at least one object.
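
To make the cost concrete, here is roughly what that single line amounts to when written out long-hand. This is only a sketch of the local evaluation described above; in the real code, the per-person query is triggered implicitly the first time p.TimeEntries is accessed.

// Sketch only: the lazy load happens implicitly when p.TimeEntries is
// first accessed; it is written out here to show the per-person queries.
var people = new List<Person>();
foreach (var p in Session.GetList<Person>())  // one query: all people
{
  // Accessing TimeEntries loads *all* time entries for this person
  // (one more query per person), only to answer a yes/no question.
  if (p.TimeEntries.Any())
  {
    people.Add(p);
  }
}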

This is a real-world example of a performance problem that can happen to any developer. Instead of blaming the developer who wrote this line of code, it's more important to stay vigilant about performance problems and to have tools available to find them quickly and easily.

Stop creating all of the objects

The first solution I came up with[3] was to stop creating objects that I didn't need. A good way of doing this, and one that was covered in Quino: partially-mapped queries, is to use cursors instead of lists. Instead of using the generated TimeEntries list, the following code retrieves a cursor on that list's query and materializes at most one object for the sub-query.

var people = Session.GetList<Person>().Where(p =>
{
  // The cursor materializes at most one object to answer Any().
  using (var cursor = Session.CreateCursor<TimeEntry>(p.TimeEntries.Query))[4]
  {
    return cursor.Any();
  }
}).ToList();

A check of the database statistics shows improvement, as shown below.

 Time-entry queries with cursors

Just by using cursors, we’ve managed to reduce the execution time for each query by about 75%.[5] Since all we’re interested in finding out is whether there is at least one time entry for a person, we could also ask the database to count objects rather than to return them. That should be even faster. The following code is very similar to the example above but, instead of getting a cursor based on the TimeEntries query, it gets the count.

var people =
  Session.GetList<Person>().
  Where(p => Session.GetCount(p.TimeEntries.Query) > 0).
  ToList();

How did we do? A check of the database statistics shows even more improvement, as shown below.

 Time-entry queries with COUNTs instead of SELECTs

We’re now down to a few dozen milliseconds for all of our queries, so we’re done, right? A 95% reduction in query-execution time should be enough.

Unfortunately, we’re still executing just as many queries as before, even though we’re taking far less time to execute them. This is better, but still not optimal. In high-latency situations, the user is still likely to experience a significant delay when opening the calendar since each query’s execution time is increased by the latency of the connection. In a local network, the latency is negligible; on a WAN, we still have a problem.
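
For a rough sense of scale (the latency figure is assumed purely for illustration): part I showed on the order of 50 time-entry queries plus 13 person queries when the calendar opens. At, say, 100ms of round-trip latency over a VPN, those queries add several seconds of waiting from latency alone, no matter how quickly the server answers each individual query.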

In the next article, we’ll see if we can’t reduce the number of queries being executed.


[1] This series of articles shows the statistics window as it appears in Winforms applications. The data-provider statistics are also available in Quino web applications as a Glimpse plug-in.
[2]

For users of the Microsoft Entity Framework (EF), it is important to point out that Quino does not have a LINQ-to-SQL mapper. That means that any LINQ expressions like Where() are evaluated locally instead of being mapped to the database. There are various reasons for this, but the main one is that we ended up preferring a strict boundary between the mappable query API and the local-evaluation API.

Anything formulated with the query API is guaranteed to be executed by the data provider (even if it must be evaluated locally) and anything formulated with LINQ is naturally evaluated locally. In this way, the code is clear about what is sent to the server and what is evaluated locally. Quino only very, very rarely issues an "unmappable query" exception, unlike EF, which occasionally requires contortions until you've figured out which C# formulation of a particular expression can be mapped by EF.
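
To restate that boundary using only the two formulations that already appear in the article above:

// Plain LINQ: Where() is evaluated locally, so every Person is materialized
// and each TimeEntries list is lazily loaded before the predicate runs.
var local = Session.GetList<Person>().Where(p => p.TimeEntries.Any()).ToList();

// Query API: the outer Where() still runs locally, but the expensive part,
// the count, is formulated with the query API and executed by the data
// provider, so only a number comes back for each person.
var mapped = Session.GetList<Person>().Where(p => Session.GetCount(p.TimeEntries.Query) > 0).ToList();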

[3] Well, the first answer I’m going to pretend I came up with. I actually thought of another answer first, but then quickly discovered that Quino wasn’t mapping that little-used feature correctly. I added an issue to tackle that problem at a later date and started looking for workarounds. That fix will be covered in the next article in this series.
[4] Please note that cursors are disposable and that the calling application is responsible for cleanup. Failure to dispose of a cursor that has been at least partially iterated will result in an open connection in the underlying database providers associated with the query and will eventually lead to connection-pool exhaustion on those databases.
[5] Please ignore the fact that we also dropped 13 person queries. This was not due to any fix that we made but rather that I executed the test slightly differently…and was too lazy to make a new screenshot. The 13 queries are still being executed and we’ll tackle those in the last article in this series.

Optimizing data access for high-latency networks: part I

Published by marco in Programming

 Punchclock is Encodo’s time-tracking and invoicing tool. It includes a calendar to show time entries (shown to the left). Since the very first versions, it hasn’t opened very quickly. It was fast enough for most users, but those who worked with Punchclock over the WAN through our VPN have reported that it often takes many seconds to open the calendar. So we have a very useful tool that is not often used because of how slowly it opens.

That the calendar opens slowly in a local network and even more slowly in a WAN indicates that there is not only a problem with executing many queries but also with retrieving too much data.

Looking at query statistics

This seemed like a solvable problem, so I fired up Punchclock in debug mode to have a look at the query-statistics window.

To set up the view shown below, I did the following:

  1. Start your Quino application (Punchclock in this case) in debug mode (so that the statistics window is available)
  2. Open the statistics window from the debug menu
  3. Reset the statistics to clear out anything logged during startup
  4. Group the grid by “Meta Class”
  5. Open the calendar to see what kind of queries are generated
  6. Expand the “TimeEntry” group in the grid to show details for individual queries

 Time-entry queries are the problem

I marked a few things on the screenshot. It’s somewhat suspicious that there are 13 queries for data of type “Person”, but we’ll get to that later. Much more suspicious is that there are 52 queries for time entries, which seems like quite a lot considering we’re showing a calendar for a single user. We would instead expect to have a single query. More queries would be OK if there were good reasons for them, but I feel comfortable in deciding that 52 queries is definitely too many.

A closer look at the details for the time-entry queries shows very high durations for some of them, ranging from a tenth of a second to nearly a second. These queries are definitely the reason the calendar window takes so long to load.

Why are these queries taking so long?

If I select one of the time-entry queries and show the “Query Text” tab (see screenshot below), I can see that it retrieves all time entries for a single person, one after another. There are almost six years of historical data in our Punchclock database and some of our employees have been around for all of them.[1] That’s a lot of time entries to load.

 Query text for all time entries for one person

I can also select the “Stack Trace” tab to see where the call originated in my source code. This feature lets me pinpoint the program component that is causing these slow queries to be executed.

 Stack trace for superfluous time-entry queries

As with any UI-code stack, you have to be somewhat familiar with how events are handled and dispatched. In this stack, we can see how a MouseUp command bubbled up to create a new form, then a new control and, finally, to trigger a call to the data provider during that control's initialization. We don't have line numbers, but we can see that the call originates in a lambda defined in the DynamicSchedulerControl constructor.

The line of code that I pinpoint as the culprit is shown below.

var people = Session.GetList<Person>().Where(p => p.TimeEntries.Any()).ToList();

This looks like a nicely declarative way of getting data, but to the trained eye of a Quino developer, it’s clear what the problem is.

In the next couple of articles, we’ll take a closer look at what exactly the problem is and how we can improve the speed of this query. We’ll also take a look at how we can improve the Quino query API to make it harder for code like the line above to cause performance problems.


[1]

Encodo just turned nine years old, but we used a different time-entry system for the first couple of years. If you’re interested in our time-entry software history, here it is:

  1. 06.2005—Start off with Open Office spreadsheets
  2. 04.2007—Switch to a home-grown, very lightweight time tracker based on an older framework we’d written (Punchclock 1.0)
  3. 08.2008—Start development of Quino
  4. 04.2010—Initial version of Punchclock 2.0; start dogfooding Quino

Questions to consider when designing APIs: Part II

Published by marco in Programming

In the previous article, we listed a lot of questions that you should continuously ask yourself when you’re writing code. Even when you think you’re not designing anything, you’re actually making decisions that will affect either other team members or future versions of you.

In particular, we'd like to think about how to reconcile YAGNI with a development process that involves asking so many questions and taking so many facets into consideration.

Designing != Implementing

The implication of this principle is that if you aren't going to need something, then there's no point in even thinking about it. While it's absolutely commendable to adopt a YAGNI attitude, not building something doesn't mean not thinking about it and identifying potential pitfalls.

A feature or design concept can be discussed within a time-box. Allocate a fixed, limited amount of time to determine whether the feature or design concept needs to be incorporated, whether it would be nice to incorporate it or possibly to jettison it if it’s too much work and isn’t really necessary.

The overwhelming majority of time wasted on a feature is in the implementation, debugging, testing, documentation and maintenance of it, not in the design. Granted, a long design phase can be a time-sink—especially a “perfect is the enemy of the good” style of design where you’re completely blocked from even starting work. With practice, however, you’ll learn how to think about a feature or design concept (e.g. extensibility) without letting it ruin your schedule.

If you don't try to anticipate future needs at all while designing your API, you may end up preventing that API from being extended in directions that are both logical and could easily have been anticipated. If the API is not extensible, then it will not be used and may have to be rewritten in the future, losing more time at that point rather than up front. This is, however, only a consideration that you must weigh. It's perfectly acceptable to decide that you currently don't care at all and that a feature will have to be rewritten at some point in the future.

You can’t do this kind of cost-benefit analysis and risk-management if you haven’t taken time to identify the costs, benefits or risks.

Document your process

At Encodo, we encourage the person who’s already spent time thinking about this problem to simply document the drawbacks and concessions and possible ideas in an issue-tracker entry that is linked to the current implementation. This allows future users, maintainers or extenders of the API to be aware of the thought process that underlies a feature. It can also help to avoid misunderstandings about what the intended audience and coverage of an API are.

The idea is to eliminate assumptions. A lot of time can be wasted when maintenance developers make incorrect assumptions about the intent of code.

If you don't have time to do any of this, then you can write a quick note in a task list that you need to more fully document your thoughts on the code you're writing. And you should try to do that soon, while the ideas are still relatively fresh in your mind. If you don't have time to think about what you're doing even to that degree, then you're doing something wrong and need to get better organized.

That is, if you can't think about the code you're writing and don't have time to document your process, even minimally, then you shouldn't be writing that code. Either that, or you implicitly accept that others will have to clean up your mess. And "others" includes future versions of you. (E.g. the you who, six months from now, is muttering, "who wrote this crap?!?")

Be Honest about Hacking

As an example, we can consider how we go from a specific feature in the context of a project to thinking about where the functionality could fit into a suite of products—that may or may not yet exist. And remember, we're only thinking about these things. And we're thinking about them for a limited time—a time-box. You don't want to prevent your project from moving forward, but you also don't want to advance at all costs.

Advancing in an unstructured way is called hacking and, while it can lead to a short-term win, it almost always leads to short- to medium-term deficits. You can still write code that is hacked and looks hacked, if that is the highest current priority, but you're not allowed to forget that you did so. You must officially designate what you're doing as a hot-zone of hacking so that the Hazmat team can clean it up later, if needed.

A working prototype that is hacked together just so it works for the next demonstration is great as long as you don’t think that you can take it into production without doing the design and documentation work that you initially skipped.

If you fail to document the deficits that prevent you from taking a prototype to production, then how will you address those deficits? It will cost you much more time and pain to determine the deficits after the fact. Not only that, but unless you do a very good job, it is your users that will most likely be finding deficits—in the form of bugs.

If your product is just a hacked mess of spaghetti code with no rhyme or reason, another developer will be faster and produce more reliable code by just starting over. Trying to determine the flaws, drawbacks and hacks through intuition and reverse-engineering is slower and more error-prone than just starting with a clean slate. Developers on such a project will not be able to save time—and money—by building on what you’ve already made.

A note on error-handling

Not to be forgotten is a structured approach to error-handling. The more “hacked” the code, the more stringent the error-checking should be. If you haven’t had time yet to write or test code sufficiently, then that code shouldn’t be making broad decisions about what it thinks are acceptable errors.

Fail early, fail often. Don't try to make a hacked mess of code bullet-proof by catching all errors in an undocumented manner. Doing so is deceptive to testers of the product as well as to other developers.
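
As a (hypothetical) illustration of the difference, where the method and logger names are invented for the example and this is a sketch rather than a prescription:

// Deceptive: a hacked routine made to look bullet-proof by swallowing every
// error, so testers and other developers never see that it is failing.
try
{
  ImportTimeEntries(file);
}
catch (Exception)
{
  // ignore: "it mostly works"
}

// Honest: fail early and loudly; handle only the one error we actually
// understand, and document why it is acceptable.
try
{
  ImportTimeEntries(file);
}
catch (FileNotFoundException ex)
{
  // Expected case: the user may not have exported anything yet.
  Logger.Warn("No export file found; skipping import.", ex);
}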

If you’re building a demo, make sure the happy path works and stick to it during the demo. If you do have to break this rule, add the hacks to a demo-specific branch of the code that will be discarded later.

Working with a documented project

If, however, the developer can look at your code and see accompanying notes (either in an issue tracker, as TODOs in the code or in some other form of documentation), that developer knows where to start fixing the code to bring it to production quality.

For example, it's acceptable to configure an application in code as long as you do it in a central place and you document that the intent is to move the configuration to an external source when there's time. If a future developer finds code that supports multiple database connections, together with tests marked as ignored and a note/issue that says "extend to support multiple databases", that developer can decide whether to actually implement the feature or to discard it because it has been deprecated as a requirement.
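
A minimal sketch of what such centrally configured, documented-as-temporary code might look like; the class name, connection string and issue reference are invented for illustration:

// All in-code configuration lives in one central place, with the deficit
// documented and linked to an issue instead of scattered through the code.
public static class AppSettings
{
  // TODO: move to an external configuration source (file or database) when
  // there's time; tracked in a (hypothetical) issue, e.g. PC-123.
  public static string MainConnectionString =>
    "Server=localhost;Database=punchclock_dev;";

  public static TimeSpan DefaultQueryTimeout => TimeSpan.FromSeconds(30);
}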

Without documentation, structure or an indication of which parts of the code were thought through and which are considered to be hacked, subsequent developers are forced to make assumptions that may not be accurate. They will either assume that hacked code is OK or that battle-tested code is garbage. If you don't inform other developers of your intent when you're writing the code—best done with documentation, tests and/or a cleanly designed API—then it might be discarded or ignored, wasting even more time and money.

If you’re on a really tight time-budget and don’t have time to document your process correctly, then write a quick note that you think the design is OK or the code is OK, but tell your future self or other developers what they’re looking at. It will only take you a few minutes and you’ll be glad you did—and so will they.


Riding the wave

Published by marco in Finance & Economy

When we talk about getting real about the Internet economy, we talk about acknowledging that there is real value there. And when we talk about valuation, we think we are talking about some measure of that—real value. The word “value” is built right into the word, so that must be what it means, right?

But what do we mean when we say “real value”? What kind of value or values? And, more importantly, value to whom? Is there only positive value? Or is there a negative component? Which part is larger? Is one part felt more by one group than another? Is that, perhaps, what makes an idea seem attractive? That the positive value lands squarely on proponents while the overwhelmingly negative value lands on others and is, optimally, not even felt or visible to the proponents? Nary a ripple of effect arrives to disturb the blissful, warm feeling of self-satisfaction for having blessed the world with a wonderful idea that has, as a purely fortuitous and utterly unforeseen or even looked-for side-effect, made those proponents fantastically rich.

But let us talk of ripples, and waves, in a bit.

Is it lasting value that provides jobs, income, stability, happiness? Or is it some form of value that can be *converted* to real value? There are strong arguments for thinking that the latter is often the case.

These are bubbles. Bubbles ride a wave of hype. There are those who are part of the wave and those who are the surfers on that wave. Some of the surfers will crash into the wave and be subsumed by it. Others will ride in to the beach and walk away dry. These are they who have converted the intangible, ephemeral, hyped value into real currency that can be used in the real world.

Many, many people lost money on Groupon, for example, while a very few made a lot of money on it. For those that made money, this form of investment is a fantastic idea – mostly because the rate of return on actual work done is so high. And what kind of work was done? What was created of lasting significance? People are able to participate even more energetically in an economy that isn’t really working for them—they are able to save money on purchasing things that they don’t need.

My suspicions are that these valuations come from the surfers. They have an overarching interest in convincing us all to be part of the wave. Without a lot of deep knowledge, there is no way for the average person to know which waves are real and which are not. The majority are not real and will not be of benefit to anyone but the surfers. Our system requires that we try them all, sorting through a haystack of bad ideas to find the needle that will drag our miserable heap of humanity a little bit forward.

But we mostly don’t even know in which direction “forward” is. There are multiple levels of con game going on here. We think we know what is of value, we think we choose the right waves, but we are brainwashed into working against our own best interests. We end up being happy with our choices—and continuing to make them—but we are, in effect, not really benefiting at all. We help to further anchor a system that suppresses us all. We make it ever easier for the surfers who stayed on their boards the first time to stay on their boards the next time. We turn them into Gods, perhaps as a defense mechanism. If someone is doing so much better than you, is it not easier for your ego and conscience to think that they are doing something much better? That they are much smarter than you? That is one way to go. The other, which we employ less and less, is to excoriate the surfers for the parasitic criminals that they likely are.

When we say that something is a good idea or a bad idea, we evaluate it against a patchwork of often-vague ideas and moral convictions about how the world works. Often, the choice seems quite straightforward and easy for almost everyone to understand. For example, if you propose to handle two problems in the world by converting poor people (let's just start with the $2/day level used by the World Bank) into food and energy for the remaining population, then most of the world is going to tell you that this is a bad idea. There are some who will tell you that you would need a pretty sexy website and dead-simple mobile app for that service in order to get past the second round of VC funding but, on the whole, your idea will be rejected. Apologies to Jonathan Swift for stealing his satire.

If you, however, propose something less overtly evil, something that is still materially useless but much more aligned with the current economy/society like making it easier for people to get cars when they need them, then this idea is greeted as an overwhelmingly good idea by the majority.

Now, let’s dissect that sentence a bit. Who is the majority? Why, the majority of the people I know, right? Or are you less solipsistic, more noble? Then you’d say the majority opinion is that which one reads in the major literature, the major news sources. But which agenda are they promulgating? These sources will, of course, greet this idea with open arms because they are in exactly the class that will benefit from it. They probably have disposable income that they can invest in the idea in order to try to become expert surfers (see above). Or, at the very least, they will be able to use their phones to get cars to pick them up wherever and whenever they need them. And it will be more convenient, with a minimum of interaction with other people (especially people outside of/below their actual or perceived class).

But something like Uber is an idea that will only benefit that class. Because we have what we consider to be ethics, many of us will need the idea to do a song and dance, convincing us that the idea is good for *everyone*, not just ourselves and our friends. There are others who have transcended this requirement. They are riding waves everywhere.

The people that are actually driving will likely not benefit in any meaningful way. Instead, there will be anecdotes of drivers who make it big—similar to the bauble dangled in front of the poor and undereducated by the State in the form of the lottery—but most will just be struggling to make ends meet in a different, but still futureless job than they’d been doing the year before.

This is only one part of the human/social impact. What about the environmental impact? Is it an overall good to promote ideas that cause people to drive more? To perhaps purchase more cars in order to benefit from this ad-hoc spike of an economy engendered by the massive influx of speculative capital in Uber? Do we even care? We do not. Because we are clawing desperately at our own boards, trying to get up there, to climb to our knees and, hopefully, to stand, at first on wobbling knees like a foal newly squirted from its mare’s womb until, with practice, we stand confidently, hanging ten with the other captains of industry. That is the dream.

It's hard to understand why, when the current taxi industry treats its actual workers so poorly—long hours, low pay—we would imagine that a company that is on the Internet will magically be infused with more generosity to its workers, opening its arms to the common man, offering to share its limitless bounty with him. These are just the latest incarnation of schemes with which to build waves, to allow those magical captains of our industry to arrive once more dry on the beach. To walk away Gods, adored by the burbling dregs receding rapidly from the shore.

We very quickly are able to ignore the horrors done in order to support these systems, simply because we want to benefit from those systems. We ignore how the materials for our phones are collected, we ignore how those phones are put together, so that we can have that which we want. Or that which we have been trained to want. That we ignore suffering is also part of our programming, part of our training. It is the suffering of others, the unworthy, those not like us. Coltan is reaped from the Congo, millions die in the Congo, the people who live there do not benefit in any way (other than their own surfers) and that is the way our heroes, the companies and entrepreneurs we worship, like it. They keep it that way so that *they* can benefit rather than those undeserving Congolese with their appallingly low market cap and utterly unappealing marketing.

We still have a system where the biggest, baddest, *meanest* dog wins. We just dress it up to make ourselves feel better. We feel better about it when we win. We feel better about it when we lose. But we are playing a game whose rules are determined by others.

On encouraging a prescriptivist to use more hyphens

Published by marco in Miscellaneous

The title sounds like a self-post on Writing Prompts, but it describes quite accurately what I attempted to do when formulating a response to the essay Nobody. Understands. Punctuation. by Peter Welch (Still Drinking).

Below is the text of my mail to him.

I've read a few of your essays since you made such a splash with what you are probably aware is your making-it-to-the-big-time essay "Programming Sucks", and I enjoyed the most recent one, which propounds descriptivism over prescriptivism.

A descriptivist is ordinarily well-shielded from any sort of grammatical suggestions because even the most bull-headed of grammar Nazis should know a lost cause when they see one. But I detected more than a whiff of someone who was interested in language and expression and, more than anything else, being *understood*. I liked the description of writing as being a medium through which “writers are trying to transfer the voices in their heads into yours.”[1]

I read through as you lavished attention on semicolons, the beloved em-dash, colons and commas. But you paid only lip service to the dash or hyphen, leaving it without an example in your essay.

I write because I thought that my dear friend the hyphen could have been more enthusiastically represented in a few other places in your writing.[2] I stumbled only a few times when reading your essay, across what I like to call “speed bumps”, which are where a punctuation choice causes me to read a sentence in an unintended manner until I end up in a dead end of sorts — a parsing error — from which I must back-track until I find the fork in the sentence at which I stopped following the author’s intent. In my experience, these forks often arise from a missing hyphen; that was the case in the following sentence:

“Punctuation started with periods that told the speaker when to take a breath, and as both a long time proponent of using the run on sentence to better communicate the ranting rage in my head over the nonsense that people choose to fight about in this country and a person who is occasionally asked to read his work out loud, I’ve come to value this original function in a visceral way.”

I would have had an easier time reading the sentence above had you included a hyphen between “long” and “time” and, ironically, “run” and “on”. Perhaps it’s just me and my penchant for hyphens. Their absence online and in many otherwise excellent essays by excellent authors occasionally gives me pause and reason to wonder whether my expectation of them is wrong-headed. Perhaps others are much more easily able to read “wrong headed” as a single word without a hyphen, or “back track” or “bull headed”. But, when I look these words up in a dictionary, there they are, defined in the official dictionary as having a hyphen.

I hope you take this missive in the spirit in which it was offered — and also as what it quite clearly also is: just a bit of fun writing on my part. I hope too that you are open-minded enough to consider whether maybe, just maybe, the hyphen could be paid a bit more attention and perhaps even wend its way into your writing where it would not only be grammatically correct but genuinely useful to the task of conveying meaning and fostering understanding on the part of the intended audience.

Cheers and keep up the good work. I follow your blog with pleasure.


[1] As your essay made abundantly clear, it's not that a descriptivist doesn't care about punctuation, but rather that punctuation rules shouldn't become a distraction from the main task of imparting meaning. For example, I am strongly of the opinion that it matters not at all whether I place a period or comma within or without an ending quote because it has zero effect on the pacing of the sentence. I chose "inside" for old times' sake.
[2] As an aside, I want to stress that my attitude toward punctuation lines up quite closely with your own.[3] Did I consider briefly whether I should surround “the hyphen” in the previous sentence with commas? I did, but decided that my mental oration of it didn’t require them, so I left them out. Did I also consider whether “briefly” in the second sentence of this footnote should be bracketed — nay, strait-jacketed — with commas as well? I did, and moved on, once again, without them.
[3] Am I also a fan of footnotes as a source of humor and tangential information? I am.