Wednesday, January 28, 2015

Sleigh Lady Sleigh

This is a republication of a blog post from my old Wordpress site. Some formatting may be broken. Originally published January 30, 2014.

Introduction

What follows is my partial attempt at solving the Santa’s Sleigh problem posted on the Kaggle data mining competition website. More importantly, it was an opportunity to reacquaint myself with the Python ecosystem and particularly Numpy.

The Problem

The problem we are solving is one of packing efficiency: what’s the best way to pack a million variously-shaped boxes into a “sleigh” of fixed footprint and infinite height? The answer will depend on how packing efficiency is defined: do we wish merely to minimize empty space in the sleigh, or do we also want to keep the height of the present stack low? Does the order in which the presents are packed matter?

In this particular definition, only height and present order matter. The exact function used to evaluate a solution is a little opaque, so I’ll try to convey the gist of it in the figure below:

Notice that floating presents are allowed!

In the BAD case, the presents are out of order: if you were to reach into the sleigh, you would hit Present 2 before Present 1 (only the tops of the presents are considered when deciding the order). Moreover, the present pile in the BAD solution is taller than the BETTER and BEST cases, which will penalize it. In the BETTER case, the presents are in the correct order (again, order only depends on the tops of the presents) and the maximum height is lower. In the BEST case, we improve on the BETTER case and rotate Present 1 to compress the height further without sacrificing order. Note that the contest rules allow the presents to float within the sleigh. A Christmas miracle!
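
To make that gist concrete, here is a toy scoring sketch of my own; it is not the official Kaggle metric, just an illustration. Each present is reduced to an ID and the height of its top surface, and the score is the pile height plus a count of present pairs whose "reach from above" order disagrees with their IDs, so lower is better:

```python
def toy_score(presents):
    """Lower is better: pile height plus a penalty for out-of-order tops.

    `presents` is a list of (present_id, top_z) pairs; a larger top_z means
    the present's top sits higher in the sleigh.
    """
    max_height = max(top_z for _, top_z in presents)

    # Sort by top surface, highest first: the order in which you would reach
    # the presents from above.
    reach_order = [pid for pid, _ in sorted(presents, key=lambda p: -p[1])]

    # Penalize every pair of presents whose reach order disagrees with their IDs.
    order_penalty = sum(
        1
        for i in range(len(reach_order))
        for j in range(i + 1, len(reach_order))
        if reach_order[i] > reach_order[j]
    )
    return max_height + order_penalty

# The BAD vs. BETTER cases from the figure, in this toy encoding:
bad = [(1, 5.0), (2, 8.0), (3, 5.0)]     # Present 2's top sticks up above Present 1's
better = [(1, 6.0), (2, 4.0), (3, 4.0)]  # tops in order, pile a bit lower
assert toy_score(better) < toy_score(bad)
```

This is not the contest formula, but it reproduces the ranking in the figure: the BETTER arrangement scores lower than the BAD one.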

It is trivially easy to create a solution with perfect ordering. If we imagine taking the BETTER or BEST solutions and adding more presents — say Presents 4, 5, and 6 — we need only ensure that in this next layer of presents the tops are flush as in the first layer. This is called the Top Down solution and was provided as a benchmark for the competition.

It’s a bit more difficult to optimize the maximum height. One could, for instance, pack the biggest presents first and then try to fill in the gaps with progressively smaller presents: essentially trying to maximize the density. Such a solution will, however, probably have a terrible ordering.

The trick, of course, is minimizing maximum height at the same time as maximizing order. In contrast to the simple examples above (and adding greatly to the difficulty), the actual problem is 3D, not 2D, and tasks us with optimizing not three but one million presents!

Genetic Algorithms

Genetic algorithms are a class of optimization techniques best employed when:

  1. You don’t know how to make a good solution from scratch,
  2. But you know a good solution when you see it.

In the sleigh problem, we really don’t know how to build a good solution from scratch but we do know how to rank any set of candidate solutions based on the height plus ordering metric described in the previous section.

Genetic algorithms are biologically inspired, and seek to mimic evolution via selection. Members of the population of candidate solutions are selected according to fitness and reproduce: their genes are mixed, a process known as crossover, and the next generation is born. The next generation may also undergo mutation, wherein members of the new generation are randomly altered, usually only slightly. Crossover is typically the main driver of generation-to-generation improvement, but mutation is important in preventing the program from getting stuck in a local minimum.

Broadly, then, the genetic algorithm works like this:

  1. Select some members from the population for crossover according to fitness (members of the first generation are randomly generated)
  2. Crossover parent individuals to create children for the next generation
  3. Mutate some fraction of the population
  4. Repeat
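
The four steps map almost directly onto code. Here is a minimal, generic sketch in Python; the random_individual, fitness, select, crossover, and mutate callables are placeholders rather than my actual implementation, and the population size of 30 simply matches the runs described later:

```python
import random

def evolve(random_individual, fitness, select, crossover, mutate,
           pop_size=30, generations=50, mutate_prob=0.5):
    """Generic genetic-algorithm loop; lower fitness is better, matching the sleigh metric."""
    # Generation 0 is random.
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        next_gen = []
        while len(next_gen) < pop_size:
            # 1. Select two parents according to fitness.
            mom = select(population, fitness)
            dad = select(population, fitness)
            # 2. Crossover the parents to create a child.
            child = crossover(mom, dad)
            # 3. Mutate some fraction of the new generation.
            if random.random() < mutate_prob:
                child = mutate(child)
            next_gen.append(child)
        # 4. Repeat with the new population.
        population = next_gen
    return min(population, key=fitness)
```

Everything problem-specific lives in the callables; the loop itself never needs to know what a "present" is.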

In the context of the sleigh problem, fitness is straightforward: the ordering plus height metric. But how do we encode the solution into genes that we can crossover and mutate? There is no unique answer, but I chose to model the solution like this: imagine taking the presents in order, one at a time, and dropping the presents into the sleigh. On each present is written the xy-coordinate to drop from, the orientation of the present, and how much the present should float above whatever is beneath it. Here is a simple cartoon illustrating this system:

Present position, orientation, and “float” amount are stored with each present
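
One natural way to store this encoding is a NumPy structured array with one record per present. This is purely my own illustrative layout rather than whatever the original code used, and the 1000x1000 footprint and 0-5 float range are placeholder numbers:

```python
import numpy as np

# Hypothetical gene layout: one record per present, in fixed drop order.
gene_dtype = np.dtype([
    ("drop_x", np.float64),        # x-coordinate to drop from
    ("drop_y", np.float64),        # y-coordinate to drop from
    ("orientation", np.int8),      # which of the 6 axis-aligned box rotations
    ("float_height", np.float64),  # how far the present floats above what's below
])

def random_gene(n_presents, sleigh_width=1000, sleigh_depth=1000, rng=None):
    """Build a random individual: one record per present, dropped in order."""
    rng = rng or np.random.default_rng()
    gene = np.zeros(n_presents, dtype=gene_dtype)
    gene["drop_x"] = rng.uniform(0, sleigh_width, n_presents)
    gene["drop_y"] = rng.uniform(0, sleigh_depth, n_presents)
    gene["orientation"] = rng.integers(0, 6, n_presents)
    gene["float_height"] = rng.uniform(0, 5, n_presents)
    return gene
```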

Modeling the genes this way makes setting up crossover and mutation a snap, but has at least one serious limitation: since the presents are always dropped in order there’s no chance for, say, Present 2 to get under Present 1. This could be fixed by allowing the presents to be dropped in different orders, but that would make the crossover implementation much more difficult.

Mutation

Now that we have a way to describe the genes, let’s describe the possible mutations to presents within the gene:

  1. Change position: change the location at the top of the sleigh from which we drop the present.
  2. Change orientation: rotate the present about one of its axes.
  3. Change floating height: alter how high the present floats above anything that’s beneath it.

The three types of mutation allowed. In “Change Position,” Present 2 has shifted to the right slightly. Hence, Present 3 falls on top of it.
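
Continuing the illustrative structured-array layout from earlier (again, a sketch rather than the original implementation), each mutation type simply re-rolls its field for a random subset of presents. The three fractions correspond to the newPosFrac, rotateFrac, and htBufFrac knobs discussed in the parameter section below:

```python
import numpy as np

def mutate(gene, new_pos_frac=0.05, rotate_frac=0.05, ht_buf_frac=0.10,
           sleigh_width=1000, sleigh_depth=1000, rng=None):
    """Return a mutated copy of a gene (the structured array sketched above)."""
    rng = rng or np.random.default_rng()
    child = gene.copy()
    n = len(child)

    # 1. Change position: pick a new drop point for a small fraction of presents.
    idx = rng.random(n) < new_pos_frac
    child["drop_x"][idx] = rng.uniform(0, sleigh_width, idx.sum())
    child["drop_y"][idx] = rng.uniform(0, sleigh_depth, idx.sum())

    # 2. Change orientation: re-roll the axis-aligned rotation.
    idx = rng.random(n) < rotate_frac
    child["orientation"][idx] = rng.integers(0, 6, idx.sum())

    # 3. Change floating height: re-roll how far the present floats.
    idx = rng.random(n) < ht_buf_frac
    child["float_height"][idx] = rng.uniform(0, 5, idx.sum())

    return child
```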

In each successive generation, some of the presents in each new gene will mutate. The fraction of presents that mutate is up to us, and we will explore that briefly in an upcoming section.

Crossover

There are many different ways to choose which individuals get selected for crossover, but any selection method should somehow favor fitter solutions. In this implementation, I used tournament selection, primarily because it is straightforward to code.

Thanks to the way we set up the genes, the mechanics of crossover are also straightforward. We randomly select a point along the series of presents and “cross” the line of presents up to that point with the line of presents after that point in the mating individual. This process is illustrated below. Note the crossed lines in the middle figure — this is the actual crossover step:

Crossover splits two genes according to some rule, and recombines the parts
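
Here is a hedged sketch of both pieces against the same illustrative gene layout: tournament selection returns the fittest of a small random sample, and single-point crossover splices the two parents at a random present index.

```python
import numpy as np

def tournament_select(population, fitness, k=3, rng=None):
    """Pick k individuals at random; the one with the lowest fitness wins."""
    rng = rng or np.random.default_rng()
    contenders = rng.integers(0, len(population), size=k)
    best = min(contenders, key=lambda i: fitness(population[i]))
    return population[best]

def single_point_crossover(mom, dad, rng=None):
    """Child takes mom's presents up to a random cut point and dad's after it."""
    rng = rng or np.random.default_rng()
    cut = int(rng.integers(1, len(mom)))  # cut strictly inside the gene
    return np.concatenate([mom[:cut], dad[cut:]])
```

Because tournament selection only ever compares a handful of fitness values, it stays biased toward better solutions without needing the whole population sorted.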

Selecting the Parameters

There are a lot of “knobs” we can adjust in our genetic algorithm implementation. To give you an idea, I’ll focus on mutation rates. To test which rates were ideal, I ran my program for 50 generations on the first 648 presents while varying the mutation parameters. I chose 648 presents because the Top Down benchmark solution has 648 presents in its first layer. Here are the results (a lower Mean Fitness number means a better result):

Mean fitness of the population after fifty generations. Population: 30. Lower numbers are better. Other parameters were adjusted, including crossover parameters. For simplicity, only mutation parameters are shown. The blue lines connect the means of each bar.

“rotateFrac” is the fraction of presents that get the “Change Orientation” mutation described above. A value of 0.05 means 5% of the presents in a child get a new orientation. Similarly, “htBufFrac” is the fraction of presents which get the “Change Floating Height” mutation and “newPosFrac” is the fraction of presents which get the “Change Position” mutation. You can see that the best results are achieved when newPosFrac=5%, htBufFrac=10%, and rotateFrac=5%. This gives you, at best, a Mean Fitness just over 95000 after 50 generations. To give you an idea of how far we have to go, the Top Down Benchmark solution has a fitness score of just 172 for the first 648 presents!
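
The sweep behind that figure is conceptually just a grid of short runs. Here is a sketch, assuming a run_ga driver that returns the population’s mean fitness; the driver and the grid values are placeholders, not my exact settings:

```python
from itertools import product

def sweep_mutation_rates(run_ga, generations=50, n_presents=648):
    """Grid-search the three mutation fractions; run_ga returns mean fitness (lower is better)."""
    grid = {
        "new_pos_frac": [0.05, 0.10, 0.20],
        "rotate_frac":  [0.05, 0.10, 0.20],
        "ht_buf_frac":  [0.05, 0.10, 0.20],
    }
    results = []
    for new_pos, rotate, ht_buf in product(*grid.values()):
        score = run_ga(n_presents=n_presents, generations=generations,
                       new_pos_frac=new_pos, rotate_frac=rotate,
                       ht_buf_frac=ht_buf)
        results.append(((new_pos, rotate, ht_buf), score))
    return min(results, key=lambda r: r[1])  # best (lowest) mean fitness
```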

Armed with good parameters, we can now run the genetic algorithm for as many generations as we like. Let’s try 15000 generations and see how we do. Again, I’m only optimizing 648 presents.

Fitness of each individual over time with best-guess mutation parameters. Points are colored by memory-location. Lower Fitness values mean better solutions.

Since random initial solutions are so bad, it’s easy to improve them quickly: note the precipitous drop in the first 500 generations — a 30% improvement. After that, each fixed percentage of improvement requires more time. To get from 80000 to 70000 takes the remaining 14000 generations!

Is the optimization asymptotic? That is, has our genetic algorithm reached its limit after 15000 generations, or is it still generating better solutions? Looking at the plot above, you might guess our program has “flat-lined” and is just incestuously combining very similar solutions. To see if this really is the case, we can zoom in on the tail end of the above plot. Let’s focus on the last 3000 generations:

Each unbroken horizontal line represents a persistent individual (the colors only reflect the in-memory location of the individual). We can see that the fittest individual in the population — with fitness around 71500 — was only created at around the 14700th generation. Empirically, then, we can see the program is still optimizing the population. Whether it is too expensive computationally to proceed is up to us.

Digression: Height-only optimization

What happens if you optimize only on height? This is not part of the competition, but I was curious since the height component of fitness is simpler than the ordering component. We might be able to push our algorithm closer to the breaking point — that is, where diminishing returns make further improvements to the population unfeasible.

As above, I ran tests over the parameter space to determine which mutation parameters to use: 5% for both new positions and new orientations. I set the floating height to zero for all presents. Below are the results after 15000 generations:

Fitness of each individual over time with best-guess mutation parameters. Points are colored by memory-location.

The most obvious feature is the rapid optimization of the early solutions. There is a roughly 100% improvement in the first 50 generations. Next we see the familiar linear to asymptotic progress as the generations march on. Let’s look at the tail end again:

Remember that there are 30 individuals in this population. Yet by the 15000th generation, there are only three distinct fitnesses. I didn’t investigate the individual solutions, but this probably means they are “stuck” and largely identical. However, mutation is still doing its job here in preventing the program from getting completely stuck: even at generation 14000, at least one fitter solution emerged.

Conclusions

I never scaled up my program to attempt to tackle the full one-million presents problem. In fact, the only solution I submitted was from a random solution generator I wrote just to make sure I understood the problem! It didn’t fare too well…

In fact, it was super-shitty!

My main mistake was not writing my program to take advantage of multiple cores from the beginning. That’s really too bad, because it should have been pretty trivial to parallelize. Given the ubiquity and low cost of computing clusters these days, I was severely hamstringing myself.

I also had to spend a lot of necessary time re-learning Numpy and “Pythonic thinking.” This limited the time I could spend implementing alternate solutions. For instance, I had set up the program to mix benchmark solutions into the population, but ran out of time to really put it through its paces.

As a learning exercise, though, I consider this project a modest success. Now, on to pandas and scikit-learn, and more Kaggle competitions!


Some source code is available at https://github.com/rhsimplex/santas-sleigh
