Complex is just lots of Simple (Part 2)

This is the second in a two-part series; part one is here.

This has been a tough piece to finish — not because of the subject itself, which is super-fun, but because I keep getting distracted by unexpected behavior I want to understand. At nearly every turn, there’s something neat to see in this little world of evolving 2D cellular automata we’ve created. So bear with me as I try to boil down a lot of wandering into a few key points. There will be pictures!

Vertical Stripes and Hyperparameters

And the end of part one we taught our organisms to “black out” the grid — a simple task that could be optimally achieved with a single rule — and they did great. For the next few rounds I’ve made the goal a bit more difficult: turn the grid into a set of vertical one-pixel stripes, alternating black and white.

Our first fitness calculation for this is pretty straightforward: the first stripe can be either black or white, and the total number of correct pixels is divided by total pixels to get a fraction. Using a Von Neumann neighborhood and conservative parameters, the outcome was … horrible. Over three runs (details here, here and here):

Green is the best performance, red the worst and blue the average. A few pops but results regressed to 0.5 on every run — which is effectively a random grid (one out of every two pixels correct).

My first thought was, perhaps we’re just not getting enough variation. So let’s start tweaking the hyperparameters, i.e., the values that drive evolution. Mutation rate is an easy one, so we’ll increase that from 0-5% to 5-10% on each reproduction. Three more runs (here, here and here):

No love. Our changes did make a difference — there are more “pops” as we find potentially good solutions, but they don’t last and we regress again back to 0.5. But why? My next theory was that perhaps good solutions were being lost because they weren’t consistent. That is, a “random” rule is likely to get around 0.5 every time. But a rule that produces perfect stripes most of the time may perform terribly once in awhile. This corresponds nicely with real life — we don’t (usually) kick a decades-long good performer to the curb for a single failure.

To account for this I added a hyperparameter LastFitnessWeight, which attributes some fraction of fitness from the last iteration to the current one — the idea being that a success yesterday will lift your score today even if it’s an off day. Setting this to 25% gave these results (here, here and here):

Sad trombone noise. This is getting annoying — maybe the middle one showed some increased consistency, but really that’s just wishful thinking.

What we’re seeing here is one of the first rules (and a bit of a dirty secret) of digital evolution, and machine learning in general — hyperparameters don’t matter nearly as much as it seems like they should. With the right features and feedback you almost can’t help but succeed — and without them you’re usually hosed.

Fitness matters

Our fitness metric seems to make perfect sense — we know what each pixel should be, so the more pixels that are “correct,” the closer we are to a solution. But it turns out that that’s not quite right. Let’s look more closely at the history of one organism that did really well and then imploded:

This organism is the offspring of two parents that were basically generating random fields. About half of their pixels were correct, giving them fitness around 0.5 (see the blue highlights). For some reason this match created a really capable organism that for its first two generations delivered absolutely perfect (yellow highlight) scores — amazing!

But look what happened in the third generation (green highlight). It’s visually obvious that this is still a pretty good result, but because of the column skip on the left side (the double-wide white bar), all the pixels to the right were incorrect, so this promising organism was killed off (even with the history-preserving hyperparameter).

Tyranny of the mediocre

The end result of this dynamic is that over time the “interesting” organisms get squeezed out by mediocre but consistent ones (in particular all-white and all-black). This page details the final cycle of one such run: short-lived mostly random organisms at the top, newly-born random ones at the bottom, and a huge swath of 0.5 fitness blanks in the middle.

We can address this in two ways — both are pretty effective. The first is to simply use a better fitness metric. VStripesCombo combines two measures for a more balanced assessment:

  1. Stripey-ness” assesses the average length of a correct vertical stripe.
  2. Even-ness” rewards an even split between black and white pixels.

With this new metric, a solid block has fitness 0.25 (.5 for stripey-ness, 0 for even-ness), “interesting” organisms have a chance to succeed, and stripes emerge quickly. Finally, some success (here, here and here):

Another approach is to be more picky about who gets to reproduce. Our initial implementation kills off the bottom third of the population with each cycle, allowing the top two-thirds to reproduce. Since two-thirds includes that middle belt of consistent mediocrity, it can persist and grow.

Instead we can kill off the bottom half of the population, and allow each organism in the top half to mate twice. Just as with biological siblings, each mating crosses over and mutates differently, providing more chances for the strengths of the parents to compound.

As it turns out, this mode of reproduction also wins the day (here, here and here):

Strategies and weaknesses

The hallmark of evolved learning is solutions that our conscious, logical minds would never think of and often can’t really comprehend even after the fact. It’s frankly a little spooky. To wit, watch this organism solve the vertical stripes problem from random, along with the rules it employs. WTF man? (I have to say I do love the back and forth “wiggle” once it hits a final solution.)

All of these organisms were trained from a random starting grid. Running a few of them (all winners during training) from a single black pixel in the middle highlights two things: (1) their strategies are wildly divergent; (2) sometimes a strategy that tends to work in one case is an utter fail with a different starting configuration (last two examples below):

That second point can’t be overstated: you get what you train for — and we didn’t train for a single pixel initial state. Environment, fitness, reproduction rules, they all are critically important to the final product. This is going to come up again and again in the emerging world of AI. LLMs hallucinate because they have been rewarded for answering questions, not for saying they don’t know. We’d better get really, really good at this if we’re going to make it as a species (some more thoughts on that here).

You only know what you know

OK, enough with the stripes. For our next trick, let’s try to learn how to draw a frame around the edges of the grid — all white except for a one pixel rim around the edge. Seems pretty simple! Results are here, here and here:

Doh. It’s not even that it just doesn’t learn well — it doesn’t seem to learn at all. No matter what we do or how we define things, we can’t crack this nut. Why?

The answer is simple but important: there is simply zero information in the system about what an “edge” even is. Remember that the neighborhood computations “wrap” around so the grid appears to be an infinite plane. The edges are obvious to us when we draw the grid, but completely invisible to the organisms living inside it.

And you can’t “learn” something that you can’t perceive — it’s impossible, like asking a completely blind person to raise their hand when the lights come on. You can be mad about it, but it is what it is. This is surprisingly easy to forget, because evolved organisms are so good and finding subtle and non-obvious patterns, we just assume they’re omniscient. Nope.

OK, so let’s add an “edge” sense to our organisms by defining a new “relative” type in the Neighborhood class. When we include this new sense in our neighborhood, magic happens (here):

It’s a simple example, and perhaps not that shocking — by providing the boolean “edge” value, we enable the organism to effectively keep two sets of rules: one for the edges (turn them black) and one for everything else (turn them white).

But still, it’s cool. Just for fun, here’s a slightly less obvious example. By adding senses for which half of the grid a point is in (North/South, East/West), we can easily learn rules that expect different content in each quadrant (details here):

OK, that’s enough of a random walk for now. I could do this stuff forever, and each new lesson really does say something about evolution and learning in the real world. I hope I’ve put in enough eye candy to keep you entertained along the way, but even if I didn’t — it was good for me.

Wait just one more! I’ve been trying to teach some organisms how to split the grid diagonally, which proves to be a tough challenge. My best run so far is 5,000 cycles to get to a pretty consistent 0.95 fitness … but it don’t look great, folks. It feels like it has the right idea, but can’t settle into place (e.g., check out the lower-left quadrant here). Any ideas?

Complex is just lots of Simple (Part 1)

This is part one of a two-part series; part two is here.

Our world feels increasingly magic — I can have normal adult conversations with a computer; feel very much “in person” with my far-flung family playing VR minigolf; and sit back comfortably while my car drives me to California. Every once in awhile, we stupid, fallible humans build incredible, beautiful things.

But “magic” is also dangerous. When you rely on something that you don’t understand, you’re an easy mark. This annoys me every time I have to call in an expert to work on the house because I don’t know how to test (just a random example, not something that happened last month, but if it did happen, the guy was totally cool, I just don’t like being in that position) the pressure switch assemblies in my HVAC system.

Of course, the world is way too complex for us all to understand everything. But the good news is, complex things are just lots of simple things put together — and often understanding the simple version is good enough. If you know a bit about how real and fake neurons work, you can develop a pretty solid intuition for what LLMs are and aren’t good at. If you build a treehouse, you’ll gain some appreciation for how real houses work. And, the point of this article, if you code up a really simple genetic algorithm, full-blown evolution seems a little less supernatural (but a lot more awesome).

Evolution & Genetic Algorithms

The only honest knock on Darwin is just basic incredulity. “Come on, do you really believe that all the complexity of the human mind and body just spontaneously popped up, by random chance?” Often these are perfectly intelligent folks that believe evolution can do little things, like maybe select for sharper teeth in wolves — they just can’t buy the admittedly huge leap to a modern human.

And I get it, I guess. But the history of our species is basically just a long parade of thinking that things are magic or supernatural, figuring out that they’re not, and levelling up the magic another click until we figure that out too. So why are we always sure that this time is the one? Seems unlikely.

Watching simple evolution in action helps me buy that real evolution is comfortably up to the task of shaping the world we live in. And it turns out that building a digital environment in which to do that isn’t all that hard. It’s also super-fun, so let’s give it a try.

Genetic algorithms use digital versions of evolutionary concepts like crossover, mutation and fitness to iteratively solve problems. There are tons of ways to put them together; I wanted to start from scratch and build things up one step at a time. If you’re so inclined, I hope you’ll build and run the code yourself — it’s all open source and up on Github.

I should mention up front that I have basically no formal background in this stuff — I played with GA’s a bit in college but that’s it. So we’re truly exploring together here; apologies in advance if I do or say something stupid.

2D Cellular Automata

Before we get into the “genetic” part of all this, we have to create the world our evolving organisms will inhabit. For reasons that will become clear later, two-dimensional cellular automata provide a lot of advantages, so we’ll use that.

These worlds are two-dimensional grids of squares, where each square is “on” (black, true, or alive) or “off” (white, false, or dead). As time passes, the squares change value based on their current value and those of their neighbors (the cells surrounding them) according to some set of rules.

The most famous set of 2DCA rules is called “The Game of Life” — a surprisingly simple configuration devised by John Conway that generates satisfyingly rich behaviors. Life considers the cell itself and each of its eight neighbors (N, NE, E, SE, S, SW, W, NW):

  • If the cell is alive and
    • has 2 or 3 live neighbors, it lives on to the next cycle,
    • otherwise it dies.
  • If the cell is dead and has exactly 3 neighbors,
    • it comes to life in the next cycle,
    • otherwise it stays dead.

The only tricky thing about this is what happens at the edges of the grid, where there are no “neighbors” on one side or the other. Typically implementations “wrap” the grid around itself so that, for example, the neighbor to the west of square (0,0) is (dx-1,0) where dx is the width of the grid.

Life rules generate some pretty neat patterns — shapes that blink or oscillate, others that move across the grid, some that stay static, etc. The animation below shows a few common patterns in action; follow through to the Wikipedia page for an interactive version you can play with.

OK, back to business. Life rules generate some visually cool stuff, but they’re only one example of the tons of possible rule sets we could apply. That’s the crux of what we’re going to do here — use evolutionary processes to discover rule sets that accomplish something we’re interested in. The general approach is this:

  1. Establish a goal state for the world (grid). A very simple example of this might be “All squares of the grid are black.”
  2. Create a bunch of organisms (rule sets) at random.
  3. Let each organism exist for awhile and then measure how close it is to the goal.
  4. Kill off the worst and mate the best to replenish the population.
  5. Repeat.

In this model, each “organism” inhabits its own “world” — there is no direct organism-to-organism competition for resources or space. Obviously this is a departure from natural evolution, in which organisms typically go head-to-head. But there is still performance-based competition for mates, so it works out.

We’ll see this again and again — there are infinite ways to tweak an evolutionary process. The real world is so wide, and so basically eternal, that nature just tries them all. We have to be a bit more judicious, but there are still a ton of different levers to pull.

Concepts and Code

Bitmaps, Edge Strategies and Neighborhoods

The Bitmap class is the workhorse of this whole system. Space and speed are important, and we do pretty well at both by cramming the bits into an array of longs. Also as we’ll see later, the array-of-longs approach helps with some other evolution-y stuff.

This class also defines the EdgeStrategy enum, which defines how the class should respond when asked about a coordinate that is off the grid. We use the “Wrap” strategy almost exclusively, but the alternatives might be useful in specific cases.

The Neighborhood class encapsulates approaches to identifying the relevant context for a particular square in the grid. The rules of “Life” which we saw earlier use a Moore Neighborhood, which is basically the 3×3 grid centered on each square. The Von Neumann Neighborhood is also common, which excludes the diagonal corners of Moore. There are others as well.

The neighborhood defines everything that a square “knows” about its environment at a given point in time, so it’s obviously super-important to the learning process. We’ll see this in action in part two of the series.

Organisms, rules and “DNA”

Unpacked, real DNA is basically a sequential chain of four nucleotide bases (A, C, G and T). In order to apply genetic algorithms to digital organisms, their digital DNA must also be representable by a chain of primitive building blocks.

Digital DNA must also be resilient to random mix-and-match operations during reproduction. A single mutation in an organism’s genes can be:

  1. Irrelevant. Much of our DNA is “non-coding” and a mutation within these regions may be pretty much unnoticeable (ok it’s a bit more complicated than this, but close enough).
  2. Advantageous. A mutation may make the organism more fit for its environment — maybe the shark’s teeth angle back a little more to hold onto prey.
  3. Disadvantageous. Perhaps it makes the organism unable to create a particular enzyme, like the lactase that helps us digest dairy products.
  4. Catastrophic. A mutation may render the organism completely unviable due to a structural problem that stops the DNA from functioning at all.

Evolution doesn’t work very well if #4 happens with any real frequency.

This is a major design challenge for GAs, but manageable for our particular problem. The NeighborhoodRulesProcessor class uses our array-of-longs Bitmap approach to create a sequence of bits that can be easily manipulated, perhaps to advantage or disadvantage, but without damaging their viability.

This next bit is a little hairy; bear with me. Or just skip to the next section, understanding that our rule sets are represented by a resilient array of long integers. Here goes. Neighborhoods are stored as arrays of relative coordinates: e.g., (0,0) is the target square itself, while (-1,0) represents one position to the West. Neighborhood Rules assign each of these relative coordinates to a bit in an integer. For a Neighborhood that looks at X squares, this results in 2x possible integers: 32 for Von Neumann, 512 for Moore.

The “outcome” for each of these integers is either “black” or “white”, which we represent using a one-dimensional bitmap indexed on the integer itself. The array of longs underlying this bitmap is our DNA, which is pretty cool. Any bit in the rule set can be altered and we will still have a viable outcome — maybe better or worse, but never catastrophic. Sweet.

Fitness

Once our population of organisms has run for awhile, we need to asses how “well” each one did, so we know who should die off and who should hook up.

This is the job of the Fitness class. The simplest type is MostOn, which simply counts the number of black squares and reports it as a fraction of the total. The best possible fitness score in this case is 1.0 — solid black.

Fitness can really be anything measurable — in part two we’ll look at vertical stripes, alternating black and white one-pixel wide vertical lines. We’ll also see how different ways of measuring Vertical Stripes fitness can make a huge difference.

Selection

The Reproduction class sorts the population by fitness and uses that ordering to decide which organisms should reproduce (the best two-thirds) and which should die off (the bottom third).

The reproducing population is paired up using a strategy defined in the PairingType enum. The default is “Prom,” which pairs the organisms ranked #1 with #2, #3 with #4, and so on. Random mixes this up so anyone can pair with anyone. There are other ways to do this too — March Madness-style bracket seeding, anyone? It also might be interesting to change the proportions and allow some or all of the winners to have multiple offspring.

Again, this isn’t exactly the way it happens in real life. But it’s close enough — and we get more levers to play with along the way.

Crossover and Mutation

The last bit we need to code up is the actual reproduction between two organisms — combining the DNA of each parent into a brand new, novel offspring. This happens in two steps:

  1. Crossover takes subsections of each parent’s DNA to create a new sequence by picking a random set of indices to be “swap” points. Bits are taken from parent #1 up to the first index, then we start taking bits from parent #2 until we hit the next one, we swap again, and so on. The number of crossovers is random but subject to a configured maximum.
  2. Mutation takes the new DNA strand and twiddles a few bits at random, again subject to a configured minimum and maximum rate of mutation.

The new organism takes the places of one that didn’t make the cut, and we start the whole process over again. If things go the way we hope, maximum and average fitness for the population goes up and up until we, seemingly by magic, have found our answer.

Putting it all together: Evolving a blackout

Our first run will be a 25-cycle evolution of 200 organisms trying to turn all squares in their environments black. Each cycle will run for 200 iterations over a 100×100 grid initialized with a random pattern. (Fun fact: it’s a very poor choice to start with an empty (all white) environment for this challenge — can you guess why?)

We’ll use the same Moore neighborhood as we did for Life. We’ll use Prom-style pairing, allow a maximum of 10 crossover points, and mutate at a rate between 2.5% and 7.5%. This mutation rate is way higher than in nature, and while it can create some chaos, it also introduces novel configurations more quickly, which can reduce the number of cycles required to progress.

TLDR: the results are here.

…and it’s pretty cool! Now to be clear, this is an easy task. The single rule “no matter what is in my neighborhood, turn me on” will accomplish it in a single iteration. But our organisms didn’t know that. They each started with a totally random set of rules, and all we did is measure how that random set did, pick the best and mix/mate them together, and try again. This graph shows the best, worst and average fitness scores over each of the 25 cycles; by cycle 17 we’d found rules that seem to work perfectly:

Another fun way to look at this progression is to look at the best-performing result at key points along the way:

The results page for this run has a lot more detail — be sure to check it out. You can see the outcomes for every organism at each cycle, which really hammers home the trend of getting better over time. Clicking on any block will show you the history of the organism that created it, including its parents. These can be quite surprising — for example, the overall winner was born from two parents that each performed just barely better than random.

We’ll see more of this next time, but you can also get a sense for the different strategies that can emerge. In cycle 18 for example, there are dots and lines and blobs and all sorts of mechanisms at work.

Success is Usually Messy

The last thing I’ll call out from the blackout run is that evolutionary success is rarely what you’d expect. I pointed out above that you can achieve a blackout in one iteration with a single rule — but that’s not what our evolution produced.

Moore neighborhoods generate 512 individual rules, and that’s just hard to look at. So I ran the blackout evolution again using a Von Neumann neighborhood of 32 rules. Results for that run are here — similar except in this case we got really lucky and one organism hit perfect fitness on the very first run.

Anyway, the rules for the winning organism in this run look like this; the top section are rules that turn their cell white, and the bottom turn their cell black:

This is a super-effective rule; the organism ran for five cycles and was perfect every time. Looking at the rules visually, a few things pop out:

  • There are significantly more black rules than white.
  • Only four rules (highlighted in red) make the grid “whiter” — all others are either neutral or black.
  • Progress reinforces progress — ignoring the center value, all of the rules with three or more black inputs have a black outcome.

Run over 200 iterations, these rules are basically guaranteed to get us to a blackout. But it sure is a roundabout trip compared to the “optimal” rule (I’ve put an animation of our winning rule going through just 12 iterations at the end of the article). However, it’s important to understand that, given our fitness rules and environment, our evolved rule is exactly as good as that “optimal” one. As long as the blackout was attained by iteration #200, it did the job perfectly. Nothing about our world indicated that speed mattered — only the final outcome.

OK, that’s enough for this session. We’ve done a lot — learned about 2D Cellular Automata, wrote code that lets us mimic evolution in digital form, and even saw the first glimmers of some pretty cool outcomes. Next time I’ll get deeper into the weeds so we can really see how this machine ticks. There are just unlimited cool things in the world.