2012-09-20

In Praise of the Embarrassingly Simple

Recently, while I was having lunch with some former members of my project, the conversation drifted to some of the old code that's still around. These guys are incredibly good programmers, and many of their contributions are still running today - four to six years after they left the project.

One of the items we discussed was our "batch broker" - a process responsible for handing out unique batch ids. These ids uniquely identify processes and end up in logs, in audit tables, and sometimes tagged to rows in the database.

We laughed about how embarrassingly simple this process was: just a few dozen lines of Python code that
  • open up and lock a file
  • increment the number within
  • unlock & close the file
  • log the requester & new batch_id
  • return the batch_id
Our myriad batch programs (transforms, loads, publishes, etc.) then simply call a Bash or Python function on their local system, which calls this program remotely over ssh to get a new batch_id. The total amount of code is maybe 50 lines across all libraries.
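
To make those steps concrete, here is a minimal sketch of what such a broker might look like. It is not our actual code: the function names, the counter file, the log format, and the remote script name are all assumptions made for illustration. It uses fcntl.flock for the lock, so it is Unix-only, and the client helper assumes the remote script prints the new id to stdout.

```python
import fcntl
import logging
import os
import subprocess

log = logging.getLogger("batch_broker")


def next_batch_id(counter_path, requester):
    """Hand out the next batch id from a lock-protected counter file."""
    # Open read/write, creating the file on first use without truncating it.
    fd = os.open(counter_path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        # Exclusive advisory lock: concurrent callers block here until release.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            text = f.read().strip()
            batch_id = int(text) + 1 if text else 1
            # Overwrite the old number with the new one and force it to disk.
            f.seek(0)
            f.truncate()
            f.write(str(batch_id))
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
    log.info("requester=%s batch_id=%d", requester, batch_id)
    return batch_id


def request_batch_id(host, requester, script="batch_broker.py"):
    """Client side: run the broker remotely over ssh (hypothetical script name)."""
    result = subprocess.run(
        ["ssh", host, "python3", script, requester],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())
```

The lock is held across the whole read-increment-write sequence, which is what keeps two simultaneous requesters from ever receiving the same id.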



What makes this an embarrassing solution is that it isn't distributed (say, using ZooKeeper), is subject to downtime when its server gets upgraded or crashes, won't scale to hundreds or thousands of requests a second (it supports about 2), doesn't log enough information to make it easy to diagnose misconfigured requesters, and isn't resilient enough to be recovered from a toasted server without some amount of work.

And yet it deserves praise, because it has exceeded our requirements at a ridiculously low cost: it was originally written in just a couple of hours over ten years ago and has offered 99.999% uptime while assigning 26 million batch ids without a hitch. It's a small enough process that any programmer can learn it in about 5 minutes, and it shares hosting with other processes. ZooKeeper, in comparison, would have involved weeks of research, training and configuration time, as well as multiple servers for deployment.

The next upgrade for this program is to move the id generation into a relational database and store client arguments (process name, organization, etc.). That's pretty trivial and should improve diagnostics, recovery and speed. And if our requirements change such that we can't afford any downtime, then we'll look at something sexy. In the meantime we'll have to be satisfied with cheap & simple. And I'm OK with that.
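
For a sense of how small the database version could be, here is a rough sketch. It uses SQLite purely as a stand-in for whatever relational database we end up with, and the table name, column names, and function names are assumptions for illustration, not a finished design.

```python
import sqlite3


def init_db(conn):
    """Create the (hypothetical) batch table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS batch (
            batch_id     INTEGER PRIMARY KEY AUTOINCREMENT,
            process_name TEXT NOT NULL,
            organization TEXT,
            requested_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    conn.commit()


def allocate_batch_id(conn, process_name, organization=None):
    """Insert one row per request and return the id the database assigned."""
    # "with conn" wraps the insert in a transaction and commits on success.
    with conn:
        cur = conn.execute(
            "INSERT INTO batch (process_name, organization) VALUES (?, ?)",
            (process_name, organization),
        )
    return cur.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    print(allocate_batch_id(conn, "nightly_load", "finance"))
```

Letting the database assign the id replaces the file lock with a transaction, and the stored client arguments become the diagnostic trail for tracking down misconfigured requesters.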
