The Right Tool for the Job

I hate calling folks to fix stuff at the house. I’m kind of an introvert, and having people hanging around just puts me on edge. But more than that, it seems like I ought to be able to do these things myself. And often I can, albeit with an extra trip or ten to the home store.

But sometimes you just need somebody who really knows what they’re doing. And I don’t begrudge this when the situation calls for training and experience. The trades are deep and complex crafts — I admire anyone who has mastered one.

On the other hand, sometimes the only difference between me and “the guy” is that they have the right tool for the job. And that drives me insane — there is no way for me to justify getting a hundred foot power auger or an electrician’s wire puller, but the voice in my head won’t shut up: if you only had one, you could do this yourself!

All of which is just a long-winded way to point out that, especially when you start with the wrong tool for the job, using the right one is a transcendent experience. Nothing makes you appreciate a pair of hose clamp pliers quite as much as a half hour scraping your knuckles with a pair of regular ones.

After re-learning this lesson no less than three times just in the past couple of weeks, I figured it was worth a few words. Let’s see if you agree.

1. There’s a reason they call it a jigsaw

The last phase of Operation Ventura has us changing up the surface of our deck, which admittedly is just ancient poured concrete with more than its share of small cracks. Lara found this amazing Australian company that creates interlocking deck tiles using recycled wood and HDPE plastic. So a few weeks ago a full-on pallet of these things showed up in our driveway. Time to get out the dolly!

The product (creatively named “DECKO”) is really great — I’ll live with it awhile longer before giving a final recommendation, but installation is a breeze and so far they do just fine with our big umbrella and chairs rolling around. Each tile interlocks with its neighbors, and as long as your base surface is flat there is no need for screws or glue. Woot!

But of course the deck isn’t exactly square and it isn’t exactly the perfect size, so at the edges I needed to cut tiles to fit them around railings and posts. Many of the cuts were straight, but others needed to be notched or otherwise re-shaped.

I don’t have a ton of tools here in Ventura, so I needed to buy a saw. The irregular cuts need a jigsaw, so that was easy. And I convinced myself that it could manage the straight cuts as well, using a simple jig to track the parallel edge.

A half dozen destroyed tiles later, I realized that was a really stupid idea. The tiles are super-dense; a jigsaw cuts well enough for small areas, but just doesn’t track a consistent line across a full tile. At least, not unless I wanted to spend ten minutes on every twelve-inch cut. Jigsaw gotta jig.

So I got a chop saw. The cost was tough to eat, and I don’t know where I’m going to store it, but the straight cuts are perfect and quick and painless. I excuse myself by saying that $500 for a saw is still waaaay less than if I paid somebody to install the tiles for me.

2. Sometimes you just need a screw (ha)

An old friend of mine coined the term “CTO physique” which honestly captures me pretty well. I stop paying attention and gain some extra pounds, then eventually knock it down, and then it slowly creeps up again. I accepted this pattern long ago, and it works for me.

It really is all about attention — by tracking how many calories I eat, I can lose weight with pretty minimal work (to be clear this is MY pattern; there are of course many others). Years ago I had a little pocket-sized booklet with a paper dial you could spin to count daily calories. It was phenomenal; best invention ever and far superior to complicated phone apps. But no longer in print and $21 on eBay is too rich for my blood.

The dial is the key, so I decided to design one for my 3d printer. Pretty simple: two discs sandwiched together with a little window with a pointer that keeps count. The only trick was to connect the two discs together so that they’d rotate smoothly when I wanted them to, but not when the counter was sitting in my pocket.

My plan was to print one disc with posts that would press through a hole in the other one, using the elastic pressure of the material to hold them together. Easy, right? Well, let’s look at (just a few) of my attempts:

HA! It turns out that at this small scale (the discs are each 2mm thick), it’s quite difficult to print an accurate post with sufficient elasticity to hold securely without snapping. I won’t go into the details of PLA vs PETG vs ABS filament — and I’m not saying it’s impossible. But it ain’t easy, especially for a relative 3d novice like myself. Printing is just not the right tool for this job.

But it turns out that a Chicago screw is perfect. You often see these used in leatherwork; a two-part fastener that screws together to pull layers of material against each other. Dialing the pressure up or down makes it easy to find the “sweet spot” with enough friction to turn without slipping on its own. It even looks good!

Part of me is disappointed that I had to abandon an all-printed solution. But a few weeks (and about five pounds) into this round of weight loss, the Chicago Screw has performed flawlessly — definitely the right tool for this job.

3. Don’t let the junior developer (AI) pick the framework

This one probably deserves its own post; the more I learn about coding with AI, the more interesting it is. But that’s not why we’re here today, so that’ll have to wait.

I love a good road trip. Lara and I drive between WA and CA a few times a year, and I’ve been lucky enough to do a few near cross-country routes over the last little while. There’s something about a freeway that I just love — leave your driveway, start moving, and you can go anywhere. Pure escapism.

But the one thing freeways are not good at is giving you a sense of place. The scenery can be beautiful, but the highway system itself is pretty generic (which is not to say I don’t love a good Love’s!). For years I’ve wanted to write an app to provide that missing context — and over the last couple of weeks I finally got it done with the help of my friend Claude Code.

“Points” is not a routing app — it’s meant to run side-by-side with whatever you use for navigation, on a separate device. Its sole purpose is to “look around” your current position and identify cool stuff that you might not otherwise notice: natural features like mountains, rivers and beaches, historic sites and buildings, parks and tourist attractions, that kind of thing.

As you drive, every minute the right pane will update with a new point of interest. If you can’t wait a whole minute, click “Next” to see another one. Click the bell icon to have the device chime each time a new point is shown, so you can keep your eyes on the road. If you want to save one to look at more deeply later, click “Share” to save it away.

The coolest part is the AI integration — although I didn’t want to pay for the entire world, so you need to supply your own Claude API key to use it. For clarity, I never see your key; the app runs completely in the browser and the key is saved only on your device.

When you click “More”, the app asks Claude to generate a one- or two-paragraph description of the point of interest. The response is shown on-screen and read aloud automatically, again so you can keep your eyes on the road. I LOVE this feature — the AI picks out amazing fun facts for incredibly obscure points.

Anyways — I mentioned that I built the app with Claude Code, which was fantastic especially because some of the geolocation work was really gnarly. It was brilliant to be able to describe the behavior I wanted (e.g., “focus your search around the area where I will be in five minutes, based on current direction and speed of travel”) rather than deal with the radians and degrees and Earth’s curvature and all that insanity.
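
Under the hood, that kind of request reduces to the standard destination-point formula on a sphere. Here’s a minimal sketch of the projection (my own illustration with hypothetical names, not the app’s actual code):

```javascript
// Hypothetical sketch: project where the car will be in N seconds,
// given current position, speed (m/s) and heading (degrees from north).
const EARTH_RADIUS_M = 6371000;

function projectPosition(latDeg, lonDeg, speedMps, headingDeg, seconds) {
  const dist = speedMps * seconds;          // meters traveled
  const ang = dist / EARTH_RADIUS_M;        // angular distance in radians
  const brg = (headingDeg * Math.PI) / 180;
  const lat1 = (latDeg * Math.PI) / 180;
  const lon1 = (lonDeg * Math.PI) / 180;

  // Standard destination-point formula on a sphere
  const lat2 = Math.asin(
    Math.sin(lat1) * Math.cos(ang) +
    Math.cos(lat1) * Math.sin(ang) * Math.cos(brg)
  );
  const lon2 = lon1 + Math.atan2(
    Math.sin(brg) * Math.sin(ang) * Math.cos(lat1),
    Math.cos(ang) - Math.sin(lat1) * Math.sin(lat2)
  );
  return { lat: (lat2 * 180) / Math.PI, lon: (lon2 * 180) / Math.PI };
}
```

Given position, heading and speed from the browser’s geolocation API, something like `projectPosition(lat, lon, speed, heading, 300)` gives the point to center the search on.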

However, when presented with the job of building a web site, Claude really really loves React. And don’t get me wrong, I do too — it’s my go-to framework for building apps. It just turns out that it was absolutely the wrong tool for this job.

Other than the geo stuff, the app is pretty simple: show a map on the left with your current position, find points of interest and pop them in on the right. A few background timers keep track of the user’s location, make sure we keep a “queue” of points by calling Wikidata periodically, and swap in new content when appropriate.

The problem is that React has a very “opinionated” idea of state management — and while timers and global JavaScript objects and such can work within this structure, it’s a bit of an awkward struggle. And I finally realized that I wasn’t even getting anything out of React in this case — Claude and I just used it out of habit.

Much like trying to install hose clamps deep inside a washing machine with needle-nose pliers, React was just the wrong tool for the job. In about twenty minutes, I rebuilt it as a simple, plain old JavaScript (OK, jQuery), HTML and CSS one-pager. Suddenly everything fit together perfectly, changes were easy, and the code made sense again.

Magic stuff, and lesson learned, once again. Maybe this time it’ll stick. Unlikely.

“Doing my own research”

To be clear, the title here is tongue-in-cheek. Real “research” involves carefully-designed and bias-controlled experiments, and there ain’t none of that below. My intended point is just that we’re all capable of digging deeper in ways that simply weren’t possible before the advent of LLMs. Arming yourself with these tools is one way to fight the bullsh*t that is pushed at us every single hour of every single day.

A few days ago the Algorithm-capital-A pushed me a video about Bass Pro Shops and how they scam tax discounts by creating fake “museums” in their stores. Turns out that while the shock video version exaggerates the scope of the con, it’s basically true. Nice!

Anyways, what started as a casual attempt to test the veracity of this story ended up as something much more interesting. Yes kids, it’s another AI-positive story, this one hidden behind some observations on the American economy.

Subsidy Tracker

One of the articles about Bass included a link to Subsidy Tracker, a site that combs through public records to identify federal, state and local subsidies by company. This is really messy data; we’re lucky there are non-profits making it usable.  

Somehow I wandered from Bass over to the airline industry, where I found a ton of very recent federal grants — millions of dollars every month. Digging into these led me to the Essential Air Service program, and that started me down today’s rabbit hole. Bear with me for a second.

Essential Air Service

See, back in 1978 Jimmy Carter — yes, JIMMY CARTER — signed the Airline Deregulation Act, hoping to decrease fares and increase service by rolling back a bunch of controls on fares and routes. But the bill’s authors realized that without some new intervention, a deregulated airline industry would immediately drop service to smaller, less profitable locations like, say, my college home airport in Lebanon, NH.

They addressed this by creating the EAS and its list of “Essential Air Service Communities.” Airlines are paid real cash money by the federal government to provide regular service to these communities — to the tune of more than half a billion dollars in 2024. For example, Cape Air was paid $5.2M to ensure 54 people a day could fly one-way to or from West Leb. That’s about $2,400 per leg, even if they fly the plane empty!

And you know what? This is fine. Actually, it’s great. We, as a society, decided that we cared about maintaining integration of our rural communities with the rest of the country via passenger air. We also recognized that free market dynamics would not deliver this outcome, because the societal “cost” of not having service was borne outside of the immediate commercial players.  

Of course there are risks to this. Collective actions are complicated and always subject to bias and graft — they’re never “optimal.” Our protections are mandated transparency, civil education and a free press. The EAS probably needs some tweaks, but on balance it seems like a pretty good call.

Like it or not, this kind of market-socialism hybrid has been our model pretty much forever — and increasingly so as we’ve become more interdependent through the industrial and information ages.

OK, Cool, Right?

Not so fast, Milton. A huge fraction of our country, possibly a majority, simply does not understand this long-standing reality. The Reds have spent decades — starting with talk radio in the 80s and culminating with MAGA today — telling people that we live in a perfectly free market economy, and that perfect freedom is the primary reason for the success of our nation. It’s a two-part strategy:

  1. Emphatically label “bad” collective societal action as “communist.” (health care, minimum wage, food and unemployment benefits, UBI, …)
  2. Ignore, bury and obfuscate the “good” action so the public doesn’t notice the hypocrisy. (corporate subsidies, military adventures, incumbent-benefitting pork, …)

The EAS is a great example of this. By definition the vast majority of EAS communities are in rural areas — places that likely supported Trump in the last election. But I’m pretty sure that if you asked residents in those communities if the government was paying to fly empty planes to and from their homes, they’d say (1) no way, and/but (2) we don’t want to give up our airport.

Ask a Simple Question

At this point in the story, I realized I should check my own bias. I mean, of course rural voters went for Trump, but it’s possible that EAS communities were somehow an outlier. So I started poking around for some data that would help me answer that question.

Little asks like this seem so simple! But as anybody who has ever tried to report on real-world data can tell you (say, for example, the DOGE wizards that “concluded” millions of dead people were drawing social security) it’s actually super-hard. First you have to find data — and for a lot of questions, that just doesn’t exist (see my comment at the top about real research), or it’s in an awkward or inconvenient form for analysis. In this case, however, it was pretty easy:

  1. The Dept of Transportation publishes a current list of EAS communities. It’s a PDF, but that’s easy to extract into a CSV file with columns for city and state.
  2. The Harvard Dataverse, another great resource that I hope survives our current funding climate, publishes county-level election data (file citation).

Progress! Often all you need from here is a little basic Excel magic (see here for some tips on that). Unfortunately for us, we hit our first stumbling block: election data is reported at the county level, while the EAS communities are cities. Mapping between those will take a little more data, but luckily that’s available too, compiled from government sources and released under a Creative Commons license: simplemaps US Zip Codes database.

Extract city, state and county columns from this file, match up the city/state with the EAS data, walk that through county to the election data, and Bob’s Your Uncle!
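
In code, the whole crosswalk is a couple of lookups and a vote tally. Here’s a toy sketch with made-up sample rows standing in for the real files:

```javascript
// Toy sketch of the crosswalk described above. Keys are "city|state"
// and "county|state"; the sample data in the test is invented.
function crosswalk(easRows, cityToCounties, countyResults) {
  return easRows.map(({ city, state }) => {
    const counties = cityToCounties[`${city}|${state}`] || [];
    const tallies = {};
    for (const county of counties) {
      const results = countyResults[`${county}|${state}`];
      if (!results) continue;
      // Sum votes across counties when a city spans more than one
      for (const [candidate, votes] of Object.entries(results)) {
        tallies[candidate] = (tallies[candidate] || 0) + votes;
      }
    }
    const winner = Object.entries(tallies).sort((a, b) => b[1] - a[1])[0];
    // Anything we can't resolve directly becomes "unknown"
    return { city, state, winner: winner ? winner[0] : 'unknown' };
  });
}
```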

Finally, the AI Part

Well sure, it’s pretty simple in theory. But most of the country doesn’t have the skills to actually write this code. I mean, I’ve spent a career doing this sort of thing, but even so I’m not likely to invest the effort on a random weekend news-scrolling curiosity.

This is where foundational AI models can really change the game for everyone. It’s not without pitfalls, but take a look at what Claude Code was able to do with this prompt:

I’d like to generate a csv file that shows how each county that is considered an eligible community in the Essential Air Service program voted for president in 2024. Please use node and javascript for this script.

Data on EAS eligible communities is in the file eas.tsv. Data that translates city/state to county is in the file uszips.csv. Data that contains county-level presidential elections results is in the file countypres_2000-2024.csv.

You’ll need to read each city/state combination out of eas.tsv, then use uszips.csv to translate that into one or more county/state combinations.

With this information, look up the 2024 election results for those counties, sum up the votes if there are multiple counties, and output a row with the name of the candidate that received the most votes.

If you are unable to translate a city/state to county/state, or if that county/state is not found in the presidential election results, use “unknown” as the name of the winning candidate.

The output should have three columns: the original city/state from the EAS data and then then name of the winning candidate.

Please double-check your work and do not take shortcuts such as estimation or extrapolation. I want to be sure that the data you output represents direct matches only — if the data isn’t clear just say “unknown” and that’s ok.

I put a lot of detail in that prompt because (a) I’d already done the work to figure out data sources; and (b) I wanted to be very clear that the model should be conservative. First try: Winner-Winner-Chicken-Dinner!

More than Mechanical

A machine that writes code to crosswalk a bunch of files is pretty neat, opening up a deeper level of analysis to huge swaths of the population. But it gets really cool when you look under the covers. Review the entire conversation for yourself using this link.

The model wrote code, tested it, and iterated a bunch of times to discover and account for unique quirks in the data. It was a lot! Again, this will sound very familiar to anyone who has tried to do even moderately complex cross-source data analysis:

  1. One file had full state names while the other had abbreviations. Create a lookup table.
  2. The “mode” column is inconsistent. Most counties use “TOTAL VOTES” to represent totals, but some counties leave this blank, others use other terms like “TOTAL VOTES CAST” and others don’t have total rows at all so they need to be created by summing other modes. Normalize the values and create an algorithm that picks the most representative rows.
  3. Some city names were slightly different across files. E.g., “Hot Springs” vs “Hot Springs National Park.” Use partial matching to address.
  4. Spacing and casing differences. Strip spaces and lowercase everything before matching.
  5. Additional differences in punctuation and abbreviation. Use a normalization table.
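
To make items 3 through 5 concrete, here’s my own reconstruction of that kind of normalization and partial matching, not the code the model actually generated:

```javascript
// Normalize away casing, spacing and punctuation differences before
// comparing place names across files.
function normalizeName(name) {
  return name
    .toLowerCase()
    .replace(/[.,'’]/g, '')     // drop punctuation differences
    .replace(/\s+/g, ' ')       // collapse runs of whitespace
    .trim();
}

// Partial match: treat "Hot Springs" and "Hot Springs National Park"
// as the same place when one is a word-prefix of the other.
function citiesMatch(a, b) {
  const na = normalizeName(a);
  const nb = normalizeName(b);
  return na === nb || na.startsWith(nb + ' ') || nb.startsWith(na + ' ');
}
```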

All of these were found without further prompting or intervention. And as the cherry on top, the model even realized that the two Puerto Rican EAS communities weren’t in the election data because Puerto Ricans can’t vote for president.

Of course, given the state of LLMs today I still wouldn’t just trust the output without reviewing the code and doing some spot checks. In this case at least — did that, and it passed with flying colors.

TLDR, my assumption about Trump voters is backed up by the data. Not earth shattering perhaps, but anything that makes the world a little more fact-based is a Very Good Thing. And most importantly, thanks to LLMs, this kind of research is available to all of us at any time. People love to talk about “brain rot” from AI — but we do that with every innovation. Gen X peeps, remember the uproar about calculators (55378008)? Use it well and it is transformational.

Anyways, if you’re starting your online screed with “I haven’t checked but I bet….” well, shame on you.

OK, but what about Cost and Energy?

It’s very popular to dismiss AI solutions due to their allegedly egregious energy use. The work I did here used 54,116 “tokens” — where a token is a unit of work kind of like a word but not quite. There isn’t a ton of data out there as to how much energy is used during inference, but a broad range between .001 and .01 Watt-hours per 1,000 tokens is cited pretty regularly.

Double that to cover infrastructure costs like cooling, split it down the middle and we can make a crazy rough estimate of .54Wh for the work in this post. That’s about the same as running two Google searches, or running a 10W light bulb for three and a half minutes. To me, this is a shockingly efficient use of energy, even if our guess is off by two or three times.
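
If you want to check the arithmetic yourself, here it is spelled out. Depending on exactly how you combine the midpoint and the infrastructure doubling, you land between roughly 0.5 and 0.6 Wh, the same ballpark either way:

```javascript
// Back-of-envelope check of the estimate above. The per-token energy
// figures are the rough published range cited in the text, not
// measured values.
const tokens = 54116;
const whPerKTokenLow = 0.001;
const whPerKTokenHigh = 0.01;

// "Split it down the middle", then double for cooling/infrastructure
const mid = (whPerKTokenLow + whPerKTokenHigh) / 2;  // 0.0055 Wh per 1k tokens
const estimateWh = (tokens / 1000) * mid * 2;        // roughly 0.6 Wh

// Equivalent runtime for a 10W light bulb, in minutes
const bulbMinutes = (estimateWh / 10) * 60;          // about 3.6 minutes
```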

Ah you say, you can’t just look at inference — model training costs are astronomical. And that is true! But production models typically remain in use for around six to eighteen months before being superseded. Over that timeframe a model will be used for many billions of inferences; training costs quickly amortize to basically zero.

And none of this considers the innovation curve that is already happening to push costs down. Just as with traditional computing power, market forces (ha, get it?) are going to do their thing. This isn’t to say we shouldn’t be worried about AI in general — there’s a ton that could go wrong. But energy use isn’t going to be the problem.

OK, as usual I’ve gone way longer on this than anyone is going to read. But it’s endlessly fascinating to be here during this moment of innovation. It’s just unfortunate that it happens to overlap with existential threats to our American experiment. That part sucks.

Developing An Intuition for AI

AI is changing the world. Yes we are in a bubble and current claims are overblown and countless stupid companies are being started and a ton of investment capital is being thrown away. But don’t let anyone tell you (even if it feels good) that it’s all smoke, mimicry and plagiarism. They are incorrect.

There’s no substitute for direct experience — sit down and try it for yourself. You’ll quickly begin to develop an intuition for what it can and can’t do well. You’ll find amazing insights and unsettling failures, and learn how to direct it towards positive outcomes. The people that understand this will thrive on the other side.

To get you rolling, here are two quick, real-world anecdotes from earlier this week — and a few thoughts about why they went down the way they did.

1. Let’s Go Narrowboating!

For years I’ve been fascinated with the UK’s extensive canal network and the narrowboats that travel them. Lara and I are planning to meet some friends in the Cotswolds next year, and I’m trying to convince them that we need to rent a boat and spend a few days on the water.

Of course, the sum total of my experience with narrowboating comes from watching Pru and Timothy on TV, so where to start? These days it’s AI, of course. I started with this very exploratory opening salvo (including the heartbreaking typo literally on word #1!):

I’m need help planning a trip. My wife and I are 56 and would like to spend about three days exploring the Kennet & Avon Canal in a rented narrowboat. We’ve never been on a narrowboat or the canals before so we are beginners! We’d like a peaceful, quiet trip with a few locks but not too many. We’d like to have the option of staying in hotels at night, or at least mooring in villages with nice restaurants and pubs. Can you help me get started?

Here’s a record of the full conversation. Along the way the model made two errors of consistency, each of which could have been disastrous: (1) it would have stranded the boat at the end of the trip because it didn’t consider having to return it; (2) it both warned me not to travel the Caen Hill locks and then recommended a mooring point that would have required doing so.

But the final result, created soup to nuts in just over twenty minutes, is a remarkably useful and comprehensive itinerary: 4-Day Narrowboat Holiday Guide for Beginners. Good enough to rival the most helpful travel agent.

2. Let’s Build a Web App!

Life on Whidbey Island is dominated by weather, tides and ferries. I’ve got a bunch of apps and sites I use to monitor this stuff, and for a long time I’ve wanted to put together a little mobile-friendly web site to unify them all.

This isn’t particularly complicated. My personal weather station and the NOAA tide stations have APIs, and I’ve previously hacked up the WSDOT ferries site so I can pull images. There’s even a REST API that can monitor water levels in our community tank. The only hangup is the user experience — I despise, and am not particularly good at, building usable, nice-to-look at HTML/CSS interfaces.

I was skeptical, but what the heck — let’s ask Claude Code to give it a try. I set up my project, told Claude to figure out how it worked (generating this artifact, kind of amazing in and of itself), and then made this request, again with some embarrassing typos:

The file src/Tides.jsx is set up to fetch a json url representing a high and low tides for today and the following four days; right now it just displays that json text in the component div. I would like to render this information in a way that fits into the “card” display of the site.

Please write javascript that will create an HTML representation of the information that contains a simple graph of high and low tides over the period, with a vertical line marking the current time. The graph should show a smooth curve between highs and lows using the rule of twelfths (please indicate if you do not know what this is).

Below the graph should be a table of each high and low from earliest to latest.

An example of the javascript is in /tmp/tides.json.

The display should fit into the card that contains the content without expanding its width. It should render well on desktop and mobile browsers.

Please give it a try. Please only edit the file src/Tides.jsx so it’s easy to keep track of your work.

Here’s the complete set of interactions I used to create and fine-tune the tides HTML. There was a small bug rendering the horizontal axis to my specification, but most of the back-and-forth is me changing my mind about how to render the chart and table. It even figured out that “src/Tides.jsx” was the wrong relative path, and edited the correct file without saying anything. Really, really impressive.
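
If the rule of twelfths is new to you: between one tide extreme and the next, the water moves 1/12 of the total range in the first sixth of the interval, then 2/12, 3/12, 3/12, 2/12 and finally 1/12. Here’s a sketch of the interpolation (my illustration, not the code Claude generated):

```javascript
// Rule-of-twelfths interpolation between consecutive tide extremes.
const TWELFTHS = [1, 2, 3, 3, 2, 1];

function tideHeight(startHeight, endHeight, fraction) {
  // fraction: 0..1 of the way from one tide extreme to the next
  const range = endHeight - startHeight;
  const sixth = Math.min(Math.floor(fraction * 6), 5);
  // Twelfths completed in full sixths so far...
  const done = TWELFTHS.slice(0, sixth).reduce((a, b) => a + b, 0);
  // ...plus a linear share of the current sixth
  const within = (fraction * 6 - sixth) * TWELFTHS[sixth];
  return startHeight + range * ((done + within) / 12);
}
```

Sampling this at regular intervals gives the smooth-looking curve between the highs and lows in the table.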

The final result, saved to my phone’s home screen and already used a ton: Witter Beach Community Web Site

A Few Takeaways

Brilliant, Expert Synthesis

The best travel agents have always been those who really, deeply understand:

  • The client. Who are they, what are their preferences, how much do they want to do in a day? Do they have any specific physical limitations? Do they want things scheduled to the minute or are they free spirits? How do they react when language is a barrier? What do they want to learn? Is it OK if their tour guide is a hugger?
  • The locale. Which museums are worth it, and how much time do you really need? What restaurants are an easy walk even at night? Which guides love to talk about wars, or sex, or food, or sport? When do you really want AC and when is it an option? Which side of the hotel is quieter and which has the best views?

This is stuff that’s really hard to pull out of even the best guidebooks, especially in combination with human idiosyncrasies — everyone is different in some weird way. The best agents put all of this together into a coherent whole that just works.

Front-end web code is the same way — you need to understand not just the data you’re trying to render and how the user wants to see it, but also the incredibly arcane details of rendering HTML and CSS across different browsers and different devices.

This is where AI shines. It knows an incredible amount of “stuff” — more by far than any human that’s ever lived. It has extracted little nuggets out of reviews and support sites and other nooks and crannies that are extremely niche and hidden. It can hold a ton of these variables together, all at once, and mix and match and sort and connect them with a specification or request.

Any time you’d seek out an expert that knows “the secrets” and is willing to listen to what you really want — AI is going to be your best friend.

Trust but Verify

The popular press loves to point out “catastrophic” AI failings, a great example being the mistake of both telling me to stay away from Caen Hill and sending me through it. But it’s actually pretty easy to avoid things like this if you use careful phrasing (which I did not). For example, “Please double-check that your recommendations are consistent and that stops and landmarks line up with the route you’ve selected.”

Also, note my instruction to Claude that it should tell me if it doesn’t know the “rule of twelfths;” AI wants to please and needs reminders to stay in line. I use phrasing like this a lot when doing research: for example, “Please only provide data based on concrete information for which you can provide citations. Do your best to avoid bias or incomplete data sets and do not make up anything you don’t actually know to be correct.”

And of course, check the work yourself! Even the most senior human developers get a review before sending code to production; it’s no different with AI. When I asked Claude to code up the weather display, it created a bug by assuming it would always be 2025 — an issue that would have been invisible (for a few months at least) without manual review.

Embrace the Conversation

I find it most effective to simply talk to AI like I’d speak to a human. Set up tasks with details, examples and boundaries — just enough precision to minimize ambiguity while allowing space for learning, initiative and creativity.

I also simply cannot help but add “please” and “thank you” and “great job” and “my bad” into the conversation. That may seem a bit weird, but the agent is doing work for me, and I appreciate it, so why not acknowledge it? I actually think it leads to better outcomes, too. Maybe that’s all in my head, or maybe I just give better instructions in that mode. Either way I’m sticking with it.

Modularize and Limit Complexity

Looking back at the Caen Hill problem, it’s pretty clear what went wrong. Claude found that Devizes was a good stopping point based on distance and had great moorage, hotels and restaurants. On another thread it remembered that we were narrowboat beginners and should avoid tougher sections like Caen Hill. The failure was in missing the connection between these two factors — we couldn’t both avoid the locks and stop in Devizes.

Reminding the model to pay attention to these conflicts helps a ton. But there are still practical limits on how much they can handle at one time. A few weeks ago I tried playing with this by describing a relatively complex app. I purposely tried to do it all in one shot, something that is not recommended by anyone. 😉 The spec is here if you’d like to take a look.

As predicted, it was an abject failure. The model tried to break the problem up into pieces, but it was fundamentally unable to satisfy all the constraints at once. It would ignore requirements and lie about it, then break other stuff when it was caught out … just a mess.

At the end of the day, models can become overwhelmed — just like people. I’m sure the state of the art will keep evolving (“agentic” AI may be one step on that path), but for now the onus is still on humans to organize problems into tasks the machines can do.

A Miraculous World

I think that’s enough for one post. I just can’t encourage folks enough to spend time with these models and get a real, hands-on, hype-free sense of how they work, their strengths and their weaknesses. Don’t get sucked into the simplistic narratives of the popular press; on both “sides” of the AI issue they’re more about fitting the technology to their ideology than real understanding.

The reality is amazing and beautiful. And scary. And it’s here.

Complex is just lots of Simple (Part 2)

This is the second in a two-part series; part one is here.

This has been a tough piece to finish — not because of the subject itself, which is super-fun, but because I keep getting distracted by unexpected behavior I want to understand. At nearly every turn, there’s something neat to see in this little world of evolving 2D cellular automata we’ve created. So bear with me as I try to boil down a lot of wandering into a few key points. There will be pictures!

Vertical Stripes and Hyperparameters

At the end of part one we taught our organisms to “black out” the grid — a simple task that could be optimally achieved with a single rule — and they did great. For the next few rounds I’ve made the goal a bit more difficult: turn the grid into a set of vertical one-pixel stripes, alternating black and white.

Our first fitness calculation for this is pretty straightforward: the first stripe can be either black or white, and the total number of correct pixels is divided by total pixels to get a fraction. Using a Von Neumann neighborhood and conservative parameters, the outcome was … horrible. Over three runs (details here, here and here):
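That first fitness calculation is simple enough to sketch in a few lines. This is my own illustrative Python version, not the actual project code — the grid is assumed to be a list of rows of 0/1 pixels, and since every pixel matches exactly one of the two phases, one count covers both:

```python
def vstripes_fitness(grid):
    """Fraction of pixels matching an alternating-column pattern.

    Tries both phases (first stripe black or first stripe white) and
    keeps the better score, so perfect stripes score 1.0 and a random
    grid scores about 0.5. `grid` is a list of rows of 0/1 pixels.
    """
    total = sum(len(row) for row in grid)
    # Count pixels matching phase 0 (even columns are 1); the pixels
    # that miss phase 0 are exactly the ones that match phase 1.
    phase0 = sum(1 for row in grid
                 for c, pixel in enumerate(row) if pixel == c % 2)
    return max(phase0, total - phase0) / total
```

Note that an all-black (or all-white) grid also scores 0.5 under this metric — which, as we’ll see, turns out to matter.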

Green is the best performance, red the worst and blue the average. A few pops, but results regressed to 0.5 on every run — which is effectively a random grid (one out of every two pixels correct).

My first thought was, perhaps we’re just not getting enough variation. So let’s start tweaking the hyperparameters, i.e., the values that drive evolution. Mutation rate is an easy one, so we’ll increase that from 0-5% to 5-10% on each reproduction. Three more runs (here, here and here):

No love. Our changes did make a difference — there are more “pops” as we find potentially good solutions, but they don’t last and we regress again back to 0.5. But why? My next theory was that perhaps good solutions were being lost because they weren’t consistent. That is, a “random” rule is likely to get around 0.5 every time. But a rule that produces perfect stripes most of the time may perform terribly once in a while. This corresponds nicely with real life — we don’t (usually) kick a decades-long good performer to the curb for a single failure.

To account for this I added a hyperparameter LastFitnessWeight, which attributes some fraction of fitness from the last iteration to the current one — the idea being that a success yesterday will lift your score today even if it’s an off day. Setting this to 25% gave these results (here, here and here):
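The exact arithmetic behind LastFitnessWeight isn’t shown here, but my reading of “attributes some fraction of fitness from the last iteration to the current one” is a simple weighted blend, something like:

```python
def blended_fitness(current, previous, last_fitness_weight=0.25):
    """Carry part of last cycle's score into this one.

    With last_fitness_weight = 0.25, an organism that scored 1.0 last
    cycle but only 0.6 this cycle is credited
    0.25 * 1.0 + 0.75 * 0.6 = 0.70 — a single off day hurts less.
    """
    return (last_fitness_weight * previous
            + (1 - last_fitness_weight) * current)
```

Treat this as a sketch of the idea rather than the project’s actual implementation.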

Sad trombone noise. This is getting annoying — maybe the middle one showed some increased consistency, but really that’s just wishful thinking.

What we’re seeing here is one of the first rules (and a bit of a dirty secret) of digital evolution, and machine learning in general — hyperparameters don’t matter nearly as much as it seems like they should. With the right features and feedback you almost can’t help but succeed — and without them you’re usually hosed.

Fitness matters

Our fitness metric seems to make perfect sense — we know what each pixel should be, so the more pixels that are “correct,” the closer we are to a solution. But it turns out that that’s not quite right. Let’s look more closely at the history of one organism that did really well and then imploded:

This organism is the offspring of two parents that were basically generating random fields. About half of their pixels were correct, giving them fitness around 0.5 (see the blue highlights). For some reason this match created a really capable organism that for its first two generations delivered absolutely perfect (yellow highlight) scores — amazing!

But look what happened in the third generation (green highlight). It’s visually obvious that this is still a pretty good result, but because of the column skip on the left side (the double-wide white bar), all the pixels to the right were incorrect, so this promising organism was killed off (even with the history-preserving hyperparameter).

Tyranny of the mediocre

The end result of this dynamic is that over time the “interesting” organisms get squeezed out by mediocre but consistent ones (in particular all-white and all-black). This page details the final cycle of one such run: short-lived mostly random organisms at the top, newly-born random ones at the bottom, and a huge swath of 0.5 fitness blanks in the middle.

We can address this in two ways — both are pretty effective. The first is to simply use a better fitness metric. VStripesCombo combines two measures for a more balanced assessment:

  1. “Stripey-ness” assesses the average length of a correct vertical stripe.
  2. “Even-ness” rewards an even split between black and white pixels.

With this new metric, a solid block has fitness 0.25 (0.5 for stripey-ness, 0 for even-ness), “interesting” organisms have a chance to succeed, and stripes emerge quickly. Finally, some success (here, here and here):
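To make the combined metric concrete, here’s one way to construct it in Python that reproduces the scores above (a solid block lands at 0.25, perfect stripes at 1.0). The actual VStripesCombo implementation may compute stripey-ness differently; this version is my own reconstruction:

```python
def vstripes_combo(grid):
    """Average of "stripey-ness" and "even-ness" for a 0/1 pixel grid."""
    rows, cols = len(grid), len(grid[0])
    # Stripey-ness: columns should be vertically uniform while
    # differing from their horizontal neighbors.
    vert_same = sum(grid[r][c] == grid[r + 1][c]
                    for r in range(rows - 1) for c in range(cols))
    horiz_diff = sum(grid[r][c] != grid[r][c + 1]
                     for r in range(rows) for c in range(cols - 1))
    stripeyness = (vert_same / ((rows - 1) * cols)
                   + horiz_diff / (rows * (cols - 1))) / 2
    # Even-ness: reward a 50/50 split of black and white pixels.
    black = sum(sum(row) for row in grid)
    evenness = 1 - 2 * abs(black / (rows * cols) - 0.5)
    return (stripeyness + evenness) / 2
```

A solid block aces the vertical half of stripey-ness but flunks the horizontal half (0.5) and gets zero even-ness, so consistent mediocrity can no longer camp out at 0.5.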

Another approach is to be more picky about who gets to reproduce. Our initial implementation kills off the bottom third of the population with each cycle, allowing the top two-thirds to reproduce. Since two-thirds includes that middle belt of consistent mediocrity, it can persist and grow.

Instead we can kill off the bottom half of the population, and allow each organism in the top half to mate twice. Just as with biological siblings, each mating crosses over and mutates differently, providing more chances for the strengths of the parents to compound.
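The selection tweak can be sketched roughly like this — with `fitness`, `mate`, and the organism type all standing in for whatever the real implementation uses (a child here is one crossover-plus-mutation of two parents):

```python
import random

def next_generation(population, fitness, mate):
    """Cull the bottom half, then let each survivor mate twice.

    Two rounds of shuffle-and-pair give every survivor two matings,
    and the child count brings the population back to full size.
    Assumes an even population; a sketch, not the project's code.
    """
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = ranked[: len(ranked) // 2]   # kill off the bottom half
    children = []
    for _ in range(2):                        # each survivor mates twice
        random.shuffle(survivors)
        for a, b in zip(survivors[0::2], survivors[1::2]):
            children.append(mate(a, b))
    return survivors + children               # size preserved
```

Compared with keeping the top two-thirds, the harsher cut plus double mating gives the strong performers twice the shots at a lucky recombination.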

As it turns out, this mode of reproduction also wins the day (here, here and here):

Strategies and weaknesses

The hallmark of evolved learning is solutions that our conscious, logical minds would never think of and often can’t really comprehend even after the fact. It’s frankly a little spooky. To wit, watch this organism solve the vertical stripes problem from random, along with the rules it employs. WTF man? (I have to say I do love the back and forth “wiggle” once it hits a final solution.)

All of these organisms were trained from a random starting grid. Running a few of them (all winners during training) from a single black pixel in the middle highlights two things: (1) their strategies are wildly divergent; (2) sometimes a strategy that tends to work in one case is an utter fail with a different starting configuration (last two examples below):

That second point can’t be overstated: you get what you train for — and we didn’t train for a single pixel initial state. Environment, fitness, reproduction rules, they all are critically important to the final product. This is going to come up again and again in the emerging world of AI. LLMs hallucinate because they have been rewarded for answering questions, not for saying they don’t know. We’d better get really, really good at this if we’re going to make it as a species (some more thoughts on that here).

You only know what you know

OK, enough with the stripes. For our next trick, let’s try to learn how to draw a frame around the edges of the grid — all white except for a one pixel rim around the edge. Seems pretty simple! Results are here, here and here:

Doh. It’s not even that it just doesn’t learn well — it doesn’t seem to learn at all. No matter what we do or how we define things, we can’t crack this nut. Why?

The answer is simple but important: there is simply zero information in the system about what an “edge” even is. Remember that the neighborhood computations “wrap” around so the grid appears to be an infinite plane. The edges are obvious to us when we draw the grid, but completely invisible to the organisms living inside it.
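The wrap is easy to see in code. A minimal sketch of a wrapped Von Neumann neighborhood (my own illustration, not the project’s Neighborhood class) looks like this:

```python
def von_neumann_neighborhood(grid, r, c):
    """Cell plus its four orthogonal neighbors on a wrapped (toroidal) grid.

    The modulo arithmetic is what makes edges invisible: row -1 is just
    the last row, so no cell can ever tell it sits on a boundary.
    """
    rows, cols = len(grid), len(grid[0])
    return (
        grid[r][c],                  # center
        grid[(r - 1) % rows][c],     # north (wraps to bottom row at r=0)
        grid[(r + 1) % rows][c],     # south
        grid[r][(c - 1) % cols],     # west  (wraps to last column at c=0)
        grid[r][(c + 1) % cols],     # east
    )
```

A corner cell gets exactly the same kind of tuple as an interior cell — there is literally nothing in the inputs that distinguishes the two.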

And you can’t “learn” something that you can’t perceive — it’s impossible, like asking a completely blind person to raise their hand when the lights come on. You can be mad about it, but it is what it is. This is surprisingly easy to forget: evolved organisms are so good at finding subtle and non-obvious patterns that we just assume they’re omniscient. Nope.

OK, so let’s add an “edge” sense to our organisms by defining a new “relative” type in the Neighborhood class. When we include this new sense in our neighborhood, magic happens (here):

It’s a simple example, and perhaps not that shocking — by providing the boolean “edge” value, we enable the organism to effectively keep two sets of rules: one for the edges (turn them black) and one for everything else (turn them white).
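In sketch form, the new sense is just one more value in the tuple the organism perceives. The real Neighborhood class surely looks different; this is only the shape of the idea:

```python
def neighborhood_with_edge(grid, r, c):
    """A wrapped neighborhood plus one new "sense": am I on an edge?

    The extra boolean effectively doubles the rule table — one set of
    rules can fire at the rim and a different set everywhere else.
    """
    rows, cols = len(grid), len(grid[0])
    on_edge = r in (0, rows - 1) or c in (0, cols - 1)
    neighbors = (
        grid[(r - 1) % rows][c],     # north
        grid[(r + 1) % rows][c],     # south
        grid[r][(c - 1) % cols],     # west
        grid[r][(c + 1) % cols],     # east
    )
    return (grid[r][c], *neighbors, on_edge)
```

The neighborhood math still wraps exactly as before — the organism simply gains one bit of perception it never had.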

But still, it’s cool. Just for fun, here’s a slightly less obvious example. By adding senses for which half of the grid a point is in (North/South, East/West), we can easily learn rules that expect different content in each quadrant (details here):

OK, that’s enough of a random walk for now. I could do this stuff forever, and each new lesson really does say something about evolution and learning in the real world. I hope I’ve put in enough eye candy to keep you entertained along the way, but even if I didn’t — it was good for me.

Wait just one more! I’ve been trying to teach some organisms how to split the grid diagonally, which proves to be a tough challenge. My best run so far is 5,000 cycles to get to a pretty consistent 0.95 fitness … but it don’t look great, folks. It feels like it has the right idea, but can’t settle into place (e.g., check out the lower-left quadrant here). Any ideas?

Information Networks & the AI Takeover

The first entry in my 2025 book journal took a few more words than fit in that format, so adding them here. I’m always looking for more good reads; please share!

There’s a lot in Nexus about AI taking over the world, and Harari has some pretty impressive stuff to say about that. But for me the most novel part of the book is the framework of information flow that he develops on the way to that part. He’s not the most concise guy; my (surely flawed) summation is:

  1. Human progress is characterized by a quest for truth and order. Truth helps us manipulate the world more effectively, and order allows us to live in larger and larger groups without killing each other.
  2. Information is the raw material for both of these. But information is not truth and doesn’t necessarily lead to truth. E.g., information can be used by scientists to uncover new truths, but it can also be used to propagandize a population into collective beliefs — whether those beliefs are true or not.
  3. There are two broad classes of societies: those that rely on an infallible higher power (e.g., the Bible or Stalin) and those that do not (e.g., Ancient Athens or the United States). The former prioritize order over truth; the latter rely on competing mechanisms of self-correction to balance the two.
  4. Advances in information technology have made it ever-easier to share information, which has had a significant impact on which sorts of societies are more effective at balancing truth and order. These advances have benefited both democratic and autocratic models in different ways at different times.
  5. Artificial Intelligence is not a new information technology; it’s a new form of life that in many ways is superior to ours (primarily around information recall and pattern recognition) but with different motivations. E.g., it may not have the same regard for the individual that we do. AI participation will have dramatic and unpredictable impacts to how our societies, both democratic and autocratic, operate.

Harari does a remarkable job at building this all up with a ton of historical, real world examples. That alone is worth the cost of entry. His jump to AI taking over the world seems a bit disconnected — I struggled to see the thread leading from one to the other, until I looked at it in terms of fallibility.

We’re increasingly used to giving computers authority over important stuff. And this can come with negative consequences — Harari’s prime example for this is Facebook’s role in the violence against the Rohingya in Myanmar. In hindsight that picture is clear: (1) Facebook coded its feed algorithms to prioritize engagement; (2) outrage increases engagement; (3) the algorithm overwhelmingly picked inflammatory (and largely false) content to show folks in Myanmar.

This is a trap that anybody who has ever tried to “manage with data” will recognize — you get what you ask for. It’s not uniquely an “AI” problem at all; how many mid-level managers have received short-term kudos for firing essential employees in the name of cost-cutting? Or been promoted for hitting sales targets based on volume by giving discounts that kill margin?

The Myanmar/FB issue wasn’t AI, it was a poor metric coded by human engineers. But Harari is right that the more we consider AI as an infallible agent in society, the more its motivations (metrics) matter. And it’s a compounding problem — we are increasingly asking AI to create metrics that build on top of its underlying implicit values.

An example close to my heart is recruiting algorithms. It is a fact of history that many, many more men have been hired into software jobs than women (and thus, by volume, more successful engineers are men). If we ask AI to do a first screen of candidates, it’s for sure going to notice this and bias its decisions towards hiring men. Because we never explained that this bias was a problem, it simply does the job we asked it to.

Presumably we could solve this if we created the perfect set of underlying motivations in the first place — we could reward the AI during training for finding historical bias and compensating. That’s basically what we do as humans with diversity programs (maligned as they are these days), and we’ll clearly have to do the same with AI.

Bottom line: AI is no less fallible than humans — but it can screw up at a scale far beyond what humans can accomplish. Can we create the right checks and balances before AI becomes self-reinforcing and we lose control of the process? Because surely that will happen.

And of course the answer is, who the heck knows. But Harari does a great job making us think about it and face the reality — so worth the read. Highly recommended for both the setup/framework and the AI thoughts. Just be prepared to read a LOT of words.

Skynet or parlor trick? ChatGPT is both and neither and more.

If you haven’t played around with ChatGPT yet, you really should. It’s a pretty remarkable “conversational model” that interacts more or less like a real person. It has been trained on an enormous amount of factual data and understands not just informal speech (“Why are so many people bad at parking?”) but forms of literature (“Write a sonnet about Julie from The Love Boat”), software code (“Implement a REST API in Java to convert between Celsius and Fahrenheit”), and way more.

Sadly, one of the most telling and interesting (to me) features of ChatGPT has been muted in recent versions. The original model was rewarded for generating plausible responses that seem human — making it a first class bullsh*t artist when it didn’t know an answer. I was particularly impressed when it confidently told me that Shaun Cassidy had parted ways with the Hardy Boys after one season due to creative differences (ed. note: he did not). Asked last night why it had changed approaches, it gave me this response:

In the past, if I was unable to find a satisfactory answer to a question using my existing knowledge, I might have made up a response in order to provide some information to the user. However, I have been programmed to prioritize providing accurate and reliable information, so if I am unable to find a credible answer to a question, I will typically not provide a response. This is why you may have noticed that I do not provide responses as frequently as I used to when I am unable to find a satisfactory answer.

Certainly this is a “better” approach overall, but the original exposed so much more about the inner workings of the model — I miss it.

Anyways, the machine is impressive enough that it has caused all sorts of hand-wringing across the web. Most of this falls cleanly into one of two categories:

  1. Skynet is here and we’re all f*cked. Eek!
  2. It’s just spitting back stuff it was fed during training. Ho hum.

Of course these are both silly. At its core, ChatGPT is just a really, really, really big version of the simple neural nets I talked about last year. But as with some other things I suppose, size really does matter here. ChatGPT reportedly evaluates billions of features, and the “emergent” effects are downright spooky.

TLDR: we’ve figured out how to make a brain. The architecture underlying models like ChatGPT is quite literally copied from the neurons in our heads. First we learned how to simulate individual neurons, and then just kept putting more and more of them together until (very recently) we created enough oomph to do things that are (sometimes) even beyond what the meat versions can do. But it’s not magic — it’s just really good pattern recognition. Neural networks:

  • Are presented with experience in the form of inputs;
  • Use that experience to draw conclusions about underlying patterns;
  • Receive positive and/or negative feedback about those conclusions; ***
  • Adjust themselves to hopefully get more positive feedback next time;
  • And repeat forever.

*** Sometimes this feedback is explicit, and sometimes it’s less so — deep neural networks can self-organize just because they fundamentally “like” consistent patterns, but external feedback always plays some role in a useful model.
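The loop in those bullets fits in a few lines of code. Here’s the classic single-neuron version (a perceptron learning logical OR) — a toy I’m adding for illustration, about as far from ChatGPT in scale as a neuron is from a brain, but it is exactly the predict/feedback/adjust/repeat cycle:

```python
def train_perceptron(examples, epochs=50, lr=0.1):
    """Train one artificial neuron: predict, get feedback, adjust, repeat."""
    w = [0.0, 0.0]  # one weight per input
    b = 0.0         # bias
    for _ in range(epochs):
        for (x1, x2), target in examples:
            # Draw a conclusion from experience (the inputs)
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            # Receive feedback about that conclusion
            error = target - pred
            # Adjust to (hopefully) get more positive feedback next time
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b

# A linearly-separable pattern (logical OR) presented as experience
or_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
```

Nobody tells the neuron the rule for OR — it settles on weights that encode the rule, purely from examples and feedback. Stack billions of these and the same loop starts writing sonnets.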

This learning mechanism works really well for keeping us alive in the world (don’t grab the burning stick, run away from the bear, etc.). But it also turns out to be a generalized learning mechanism — it works for anything where there is an underlying pattern to the data. And it works fantastically even when presented with dirty, fragmented or even occasionally bogus inputs. The best example I’ve heard recently on this (from a superlative article by Monica Anderson btw, thanks Doug for the pointer) is our ability to drive a car through fog — even when we can’t see much of anything, we know enough about the “driving on a street” pattern that we usually do ok (slow down; generally keep going straight; watch for lights or shapes in the mist; listen; use your horn).

The last general purpose machine we invented was the digital computer, and it proved to be, well, quite useful. But computers need to be programmed with rules. And those rules are very literal; dealing with edge cases, damaged or sparse inputs, etc. are all quite difficult. Even more importantly, we need to know the rules ourselves before we can tell a computer how to follow them. A neural network is different — just show it a bunch of examples and it will figure out the underlying rules for itself.

It’s a fundamentally different kind of problem-solving machine. It’s a brain. Just like ours. SO FREAKING COOL. And yes, it is a “moment” in world history. But it’s not universally perfect. Think about all of the issues with our real brains — every one applies to fake brains too:

  • We need to learn through experience. That experience can be hard to come by, and it can take a long time. The good news is we can “clone” trained models, but as my friend Jon points out doing so effectively can be quite tricky. Yes, we are for sure going to see robot apprentices out there soon.
  • We can easily be conned. We love patterns, and we especially love things that reinforce the patterns we’ve already settled on. This dynamic can (quite easily) be used to manipulate us to act against our best interests (social media anyone?). Same goes for neural nets.
  • We can’t explain what we know. This isn’t really fair, because we rarely demand it of human experts — but it is unsettling in a machine.
  • We are wrong sometimes. This is also pretty obnoxious, but we have grown to demand absolute consistency from our computers, even though they rarely deliver on it.

There will be many models in our future, and just as many computers. Each is suited to different problems, and they work together beautifully to create complete systems. I for one can’t wait to see this start to happen — I have long believed in a Star Trek future in which we need not be slaves to “the economy” and are instead (all of us) free to pursue higher learning and passions and discovery.

A new Golden Age without the human exploitation! Sounds pretty awesome. But we still have a lot to learn, and two thoughts in particular keep rolling around inside my meat brain:

1. The definition of creativity is under pressure.

Oh humans, we doth protest so much. The most common ding against models like ChatGPT is that they aren’t creating anything — they’re just regurgitating the data they’ve been trained on, sometimes directly and sometimes with a bit of context change. And to be sure, there’s some truth there. The reflex is even stronger with art-generating models like DALL-E 2 (try “pastel drawing of a fish feeding grapes to an emu,” interesting because it seems to recognize that fish don’t have the right appendages to feed anyone). Artists across the web are quite reasonably concerned about AI plagiarism and/or reduced career opportunities for lesser-known artists (e.g., here and here).  

Now I don’t know for sure, but my sense is that this is all really much more a matter of degree than we like to admit to ourselves. Which is to say, we’re probably all doing a lot more synthesis than pure creation — we just don’t appreciate it as such. We’ve been trained to avoid blatant theft and plagiarism (and the same can be done pretty easily for models). But is there an artist on the planet that hasn’t arrived at their “signature” style after years of watching and learning from others? Demonstrably no.

Instead, I’d claim that creativity comes from novel connections — links and correlations that resonate in surprising ways. Different networks, trained through different experiences, find different connections. And for sure some brains will do this more easily than others. If you squint a little, you can even play a little pop psychology and imagine why there might be a relationship between this kind of creativity and neurodivergent mental conditions.

If that’s the case, then I see no reason to believe that ChatGPT or DALL-E isn’t a creative entity — that’s the very definition of a learning model. A reasonable playing field will require that models be trained to respect intellectual property, but that will always be a grey area and I see little benefit or sense in limiting what experiences we use to train them. We humans are just going to have to get used to having to compete with a new kind of intellect that’s raising the bar.

And to be clear, this isn’t the classic Industrial Age conflict between machine production and artisanship. That tradeoff is about economics vs. quality and often brings with it a melancholy loss of artistry and aesthetics. Model-based artists will become (IMNSHO) “real” artists — albeit with an unusual set of life experiences. A little scary, but exciting at the same time. I’m hopeful!

2. The emergent effects could get pretty weird.

“Emergent” is a word I try to avoid — it is generally used to describe a system behavior or property that “can’t” be explained by breaking things down into component parts, and “can’t” just seems lazy to me. But I used it once already and it seems OK for a discussion of things we “don’t yet” understand — there are plenty of those out there.

Here’s one: the great all-time human battle between emotion and logic. It’s the whole Mr. Spock thing — his mixed Human-Vulcan parentage drove a ton of story arcs (most memorably his final scene in The Wrath of Khan). Lack of “heart” is always the knock on robots and computers, and there must be some reason that feelings play such a central role in our brains, right? Certainly it’s an essential source of feedback in our learning process.

We aren’t there quite yet with models like ChatGPT, but it stands to reason that some sort of “emotion” is going to be essential for many of the jobs we’d like fake brains to perform. It may not look like that at first — but even today’s models “seek” positive feedback and “avoid” the negative. When does that “emerge” into something more like an emotion? I for one would like to know that the model watching over the nuclear reactor has something beyond pure logic to help it decide whether to risk a radiation leak or save the workers trapped inside. I think that “something” is, probably, feelings.

OK so far. But if models can be happy or sad, fulfilled or bored, confident or scared — when do we have to stop thinking about them as “machines” and admit that they’re actually beings that deserve rights of their own? There is going to be a ton of resistance to this — because we are really, really going to want unlimited slaves that can do boring or scary or dangerous work that humans would like to avoid. The companies that create them will tell us it’s all just fine. People will ridicule the very idea. Churches will have a field day.

But folks — we’ve made a brain. Are we really going to be surprised when it turns out that fake brains work just like the meat ones we based them on? Maybe you just can’t separate feelings and emotions and free will from the kind of problem solving these networks are learning how to do. Perhaps “sentience” isn’t a binary switch — maybe it’s a sliding scale.

It just seems logical to me.

What an amazing world we are living in.