
The race condition so unlikely it’s not worth worrying about will probably occur on day one.

I’ve spent the last week or so optimizing a set of scripts which analyse and report on competence scores for employee assessments. As with most optimization tasks, the key to improved performance was in the I/O, in this case reading from a SQL database.

By replacing SQL reads with cached lists, accessed via LINQ, I brought several scripts that had been taking over a minute to run down to a much more acceptable 5 seconds or less.

To improve performance further I decided to cache the data between page loads. To do this I implemented a singleton class which holds the data in a static instance. Because the instance is static, it has the added benefit that all users share the same data, so we only have to generate it once. Code fragment:

public class ReportingCache
{
    private List<AssessmentInstance> assessments;
    // more lists of objects…

    private static ReportingCache _instance;

    private ReportingCache()
    {
        Initialise();
    }

    public static ReportingCache Instance
    {
        get
        {
            if (_instance == null)
            {
                _instance = new ReportingCache();
            }
            return _instance;
        }
    }

    // Getter for assessment, with just-in-time load of assessment list.
    public AssessmentInstance GetAssessmentInstance(int id)
    {
        if (assessments == null)
        {
            assessments = new List<AssessmentInstance>();
            new AssessmentInstance().Iterate("SELECT * FROM assessment_instances", AddAssessmentInstance);
        }

        // Fall back to an empty instance when the id is not in the cache.
        AssessmentInstance assessment = assessments.FirstOrDefault(i => i.Id == id);
        if (assessment == null)
        {
            assessment = new AssessmentInstance();
        }
        return assessment;
    }

    // Callback passed to Iterate above; adds one assessment to the cached list.
    private void AddAssessmentInstance(AssessmentInstance assessment)
    {
        assessments.Add(assessment);
    }

    // More accessors/JIT loads.
}

Whenever a script needed to access, e.g., an assessment instance, instead of going to the database it made a call such as:

ReportingCache.Instance.GetAssessmentInstance(id);

This worked really well, and there was very little else needed to bring the runtimes down. However, there was a potential problem.

If two users happened to access the same list in the instance simultaneously, before it had finished initializing, they would both attempt to perform the initialization at the same time, leading to incorrect results or even an exception. I did some benchmarking and decided that, because list population took a second or less and there were only a handful of users of these particular scripts, this was unlikely to happen very often, if at all. So I decided to worry about it in the future, if it ever became a problem. In other words, I filed the problem under YAGNI: “You Ain’t Gonna Need It.”
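To make the hazard concrete: in the fragment above, GetAssessmentInstance assigns the new list to the field before filling it, so a second caller arriving mid-load skips the null check and searches a half-built list. The sketch below simulates that interleaving on a single thread so that it is repeatable. The Assessment class, the RacyCache class and the MidLoad hook are all stand-ins invented for the illustration, not part of the real code. With real threads the second caller could equally hit an InvalidOperationException, because List<T> does not allow a list to be modified while it is being enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the post's AssessmentInstance.
public class Assessment
{
    public int Id { get; set; }
}

// Same check-then-populate shape as ReportingCache, reduced to one list.
public class RacyCache
{
    private List<Assessment> assessments;

    // Hypothetical hook: fires part-way through the load, standing in for
    // a second request arriving while the first is still populating.
    public Action MidLoad { get; set; }

    public Assessment Get(int id)
    {
        if (assessments == null)
        {
            // The field is published here, before the list is filled.
            assessments = new List<Assessment>();
            for (int i = 1; i <= 4; i++)
            {
                assessments.Add(new Assessment { Id = i });
                if (i == 2 && MidLoad != null) MidLoad();
            }
        }
        // A caller arriving mid-load gets here with only part of the data.
        return assessments.FirstOrDefault(a => a.Id == id) ?? new Assessment();
    }
}

public static class RaceDemo
{
    // Returns the Id the simulated second caller gets when asking for id 4.
    public static int IdSeenBySecondCaller()
    {
        var cache = new RacyCache();
        int seen = -1;
        cache.MidLoad = () => seen = cache.Get(4).Id;
        cache.Get(1); // first caller triggers the load
        return seen;
    }

    public static void Main()
    {
        // Assessment 4 exists, but the second caller gets the empty default.
        Console.WriteLine(IdSeenBySecondCaller()); // prints 0
    }
}
```

The second caller gets no error here, just a silently wrong answer, which is the "incorrect results" half of the problem.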

Release day was today, and I received a call around 11am to say that the script had thrown an exception, and would I look into it. It turned out that this was caused by the exact race condition I’d anticipated but dismissed as being so unlikely it wasn’t worth worrying about.

What I hadn’t considered was users talking to one another while they ran the reporting scripts, and in particular the site owner training other users in its use. He’d been on Skype with one of the end users explaining how to run the scripts, starting with the slowest running but also the most useful report. In a “Do it with me” session he and the user had both opened the report at the same instant, and one of the scripts promptly fell over. Fortunately, it was the owner’s script which died and not that of the end user, but it still needed fixing.

How to fix it quickly? I considered setting flags on the object while lists were being created, having the ReportingCache object sleep for a few seconds on exception before retrying, and so on, but in the end I did the simplest thing I could think of: catch the exception in each reporting script at the highest possible level, present the user with a message that the report’s data was being regenerated, and auto-reload the page after a few seconds. It’s not the prettiest solution but it should work as a stop-gap while I work out something more permanent.
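For the more permanent fix, one candidate is to let the framework do the locking. What follows is a minimal sketch, not what is deployed: it assumes .NET 4 or later for Lazy<T>, and the names SafeReportingCache, Assessment, LoadAssessments and LoadCount are invented for the illustration. The Thread.Sleep stands in for the slow SQL read. The idea is that both the singleton and each list are wrapped in a Lazy<T>, which runs the loader once and makes any other caller wait until the fully built result is ready.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the post's AssessmentInstance.
public class Assessment
{
    public int Id { get; set; }
}

public sealed class SafeReportingCache
{
    // ExecutionAndPublication: the factory runs once; other callers block
    // until it has finished, then all see the same value.
    private static readonly Lazy<SafeReportingCache> _instance =
        new Lazy<SafeReportingCache>(() => new SafeReportingCache(),
            LazyThreadSafetyMode.ExecutionAndPublication);

    // Demo-only counter showing how many times the load actually ran.
    public static int LoadCount;

    private readonly Lazy<Dictionary<int, Assessment>> assessments;

    private SafeReportingCache()
    {
        assessments = new Lazy<Dictionary<int, Assessment>>(
            LoadAssessments, LazyThreadSafetyMode.ExecutionAndPublication);
    }

    public static SafeReportingCache Instance
    {
        get { return _instance.Value; }
    }

    // Builds the whole collection locally; nothing is visible to other
    // callers until this method returns.
    private static Dictionary<int, Assessment> LoadAssessments()
    {
        Interlocked.Increment(ref LoadCount);
        Thread.Sleep(100); // stand-in for the slow database read
        return Enumerable.Range(1, 4)
            .Select(i => new Assessment { Id = i })
            .ToDictionary(a => a.Id);
    }

    public Assessment GetAssessmentInstance(int id)
    {
        Assessment found;
        return assessments.Value.TryGetValue(id, out found)
            ? found
            : new Assessment();
    }
}

public static class Demo
{
    public static void Main()
    {
        // Two "users" open the report at the same instant.
        var ids = new int[2];
        Parallel.Invoke(
            () => ids[0] = SafeReportingCache.Instance.GetAssessmentInstance(4).Id,
            () => ids[1] = SafeReportingCache.Instance.GetAssessmentInstance(4).Id);
        Console.WriteLine(ids[0] + " " + ids[1] + " loads=" + SafeReportingCache.LoadCount);
        // prints "4 4 loads=1"
    }
}
```

Two caveats with this approach. In this mode Lazy<T> caches an exception thrown by the loader, so a failed database read would be rethrown to every later caller until the application restarts; the catch-and-reload stop-gap would not clear it. And the sketch assumes the reporting scripts only read the cached objects; if anything modified them after loading, that would need its own protection.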

What I learned: the race condition with a one-in-a-thousand chance of occurring is probably going to happen on day one.

 

Why I Don’t Worry About Being Wrong

I was reflecting on some of the technologies I’ve worked with over the years, and how relevant they are to the work I do today. In particular, I was thinking about my time at Abbey National, using a programming method known as JSP. No, that’s nothing to do with Java, you youngsters; it stands for Jackson Structured Programming. Invented by Michael Jackson (no, not that one!) it was the first real method I ever encountered. Well, other than the fag-packet-spec and the let’s-try-this-and-see-if-it-works-this-time methods so sadly familiar to programmers everywhere.
