Artificial Intelligence

Human beings can be incredibly strange at times.

For various reasons, we have multiple build clusters, each with its own independent build queue. Builds are normally queued at developer request (we use a feature-branch workflow that doesn’t really lend itself to more traditional CI). The tool that queues them takes a command-line argument specifying which cluster to use, or prompts for one if none is given. Each cluster’s queue is visible via Buildbot’s standard web UI.

Now you’d think that, given this setup, developers would (in the interests of getting their build results back ASAP) find the cluster with the shortest queue and send their request there. After all, that’d be the logical thing to do. It turns out, however, that this isn’t always the case. One scenario that kept cropping up was that one of the clusters would get a reputation for “always having a long queue”, so developers would start explicitly specifying the other cluster… with the result that the supposedly less clogged cluster would suddenly be carrying a greater-than-average build load. Other developers would simply pick one cluster arbitrarily and always run their builds there, paying no attention at all to the current load.

The solution turned out to be fairly straightforward: remove the prompt and, when no cluster was explicitly specified, just pick one at random (and, of course, tell the developer which one had been picked). Laziness won out in the end, and the randomness resulted in a fairly even distribution of load. So the next time someone asks you about scheduling algorithms, you can tell them that rand() > developers.
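For the curious, here is a minimal sketch of that default-to-random behaviour. It assumes a Python wrapper; the function name `choose_cluster` and the cluster names are hypothetical, since the post doesn’t describe the real tool’s interface. An explicit choice is still honoured, and only the “no cluster given” case falls through to a uniform random pick.

```python
import random
from collections import Counter
from typing import Optional, Sequence

# Hypothetical cluster names; the real clusters aren't named in the post.
CLUSTERS = ("cluster-a", "cluster-b")


def choose_cluster(requested: Optional[str] = None,
                   clusters: Sequence[str] = CLUSTERS,
                   rng: Optional[random.Random] = None,
                   announce: bool = True) -> str:
    """Return the explicitly requested cluster, or pick one uniformly at random."""
    if requested is not None:
        if requested not in clusters:
            raise ValueError(f"unknown cluster: {requested!r}")
        return requested
    # No prompt: fall back to a uniform random choice.
    picked = (rng or random).choice(clusters)
    if announce:
        # Tell the developer where the build went.
        print(f"No cluster specified; queuing build on {picked}")
    return picked


if __name__ == "__main__":
    # Simulate many unspecified requests to see how the load spreads out.
    rng = random.Random(42)
    counts = Counter(choose_cluster(rng=rng, announce=False) for _ in range(10_000))
    print(counts)
```

With two clusters, each one ends up with roughly half of the unspecified requests, which is all the “scheduler” needs to do here.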

