In Praise of the Embarrassingly Simple

Recently, while I was having lunch with some former members of my project, the conversation drifted to some of the old code that's still around. These guys are incredibly good programmers, and many of their contributions are still running today, four to six years after they left the project.

One of the items we discussed was our "batch broker", a process responsible for handing out unique batch ids. These ids identify processes and end up in logs, in audit tables, and sometimes tagged to rows in the database.

We laughed about how embarrassingly simple this process was: just a few dozen lines of Python code that
  • open up and lock a file
  • increment the number within
  • unlock & close the file
  • log the requester & new batch_id
  • return the batch_id
Our myriad batch programs (transforms, loads, publishes, etc.) then simply call a bash or Python function on their local system, which calls this program remotely over ssh to get a new batch_id. The total amount of code is maybe 50 lines across all libraries.
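
For the curious, the broker steps listed above could look roughly like the sketch below. This is not our actual code: the function name, the counter file path, the log format and the use of fcntl.flock (Unix only) are all assumptions made for illustration.

```python
import fcntl
import logging
import os


def next_batch_id(counter_path, requester):
    """Return the next batch id, serialized through an exclusive file lock.

    Hypothetical sketch: counter_path and requester are illustrative names.
    """
    # Open read/write without truncating; create the file on first use.
    fd = os.open(counter_path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as counter_file:
        # Blocks until no other broker process holds the lock.
        fcntl.flock(counter_file, fcntl.LOCK_EX)
        try:
            text = counter_file.read().strip()
            batch_id = int(text or 0) + 1  # an empty file starts at 1
            counter_file.seek(0)
            counter_file.truncate()
            counter_file.write(str(batch_id))
            counter_file.flush()
            os.fsync(counter_file.fileno())  # make sure the new id is on disk
        finally:
            fcntl.flock(counter_file, fcntl.LOCK_UN)
    # Log who asked and what they got, then hand the id back.
    logging.info("requester=%s batch_id=%d", requester, batch_id)
    return batch_id
```

A requester would then call something like next_batch_id("/some/path/batch_id.txt", "nightly_load"). In our setup that call happens on the broker's server, with the clients reaching it over ssh.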

What makes this an embarrassing solution is that it isn't distributed (say, using ZooKeeper), is subject to downtime when its server gets upgraded or crashes, won't scale to hundreds or thousands of requests a second (it supports about 2), doesn't log enough information to make it easy to diagnose misconfigured requesters, and isn't resilient enough to be recovered from a toasted server without some amount of work.

And yet it deserves praise, because it has exceeded our requirements at a ridiculously low cost: it was originally written in just a couple of hours over ten years ago and has offered 99.999% uptime while assigning 26 million batch ids without a hitch. It's a small enough process that any programmer can learn it in about 5 minutes, and it shares hosting with other processes. ZooKeeper, in comparison, would have involved weeks of research, training and configuration time, as well as multiple servers for deployment.
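
To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It assumes exactly ten years of service and an evenly spread load, neither of which is strictly true, so treat the results as rough orders of magnitude.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

years_in_service = 10        # assumption: "over ten years" rounded down
ids_assigned = 26_000_000    # from the post
capacity_per_second = 2      # from the post
uptime = 0.99999             # from the post

seconds_in_service = years_in_service * SECONDS_PER_YEAR

# Average request rate if the load were spread evenly over the decade.
average_rate = ids_assigned / seconds_in_service
# Fraction of the broker's rough capacity that this average load uses.
utilization = average_rate / capacity_per_second
# Total downtime that a 99.999% uptime figure allows over the same period.
downtime_minutes = (1 - uptime) * seconds_in_service / 60

print(f"average load: {average_rate:.3f} requests/second")
print(f"share of capacity used: {utilization:.1%}")
print(f"downtime implied by the uptime figure: {downtime_minutes:.0f} minutes")
```

Under those assumptions the average load works out to roughly 0.08 requests a second, about 4% of what the broker can handle, and the uptime figure corresponds to less than an hour of downtime across the whole decade. Real traffic is burstier than an average, but the headroom is large.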

The next upgrade for this program is to move the id generation into a relational database and store client arguments (process name, organization, etc.). That's pretty trivial and should improve diagnostics, recovery and speed. And if our requirements change such that we can't afford any downtime, then we'll look at something sexy. In the meantime we'll have to be satisfied with cheap & simple. And I'm OK with that.
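
As a rough idea of what that database-backed version might look like, here is a minimal sketch. It uses SQLite from the Python standard library purely as a stand-in, since the post doesn't settle on a database. The table name, column names and function names are hypothetical.

```python
import sqlite3


def open_broker(db_path):
    """Open the broker database, creating the (hypothetical) batch table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS batch (
               batch_id     INTEGER PRIMARY KEY AUTOINCREMENT,
               process_name TEXT NOT NULL,
               organization TEXT NOT NULL,
               requested_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    conn.commit()
    return conn


def allocate_batch_id(conn, process_name, organization):
    """Insert one row per request and return its generated id as the batch id."""
    # "with conn" commits on success and rolls back if the insert fails.
    with conn:
        cursor = conn.execute(
            "INSERT INTO batch (process_name, organization) VALUES (?, ?)",
            (process_name, organization),
        )
        return cursor.lastrowid
```

The database takes over the locking and the increment, and every id now carries a record of who asked for it and when, which is where the better diagnostics and recovery would come from.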
