I really just meant this to be a response to Scott on LinkedIn, but they said it was too long, both as a comment and as an update. I thought they were all about keeping content on their own site? Seems self-defeating. Ah well.
My old friend Scott Porad asked a really good question about my recent experience using LLMs to help test my own biases (“Doing my own research”):
You had the AI generate code for you to do the work: why? Why didn’t you simply have the AI do the computations and give you the result?
I can think of at least one answer: because it allowed you to double-check that the computations were being done correctly. But, most people don’t have the skills to do that.
How could you write a prompt that simply outputs the result and allows non-technical users to verify that it was done correctly?
This is a great check, and thinking through an answer was quite interesting.
The explicit use of code was purely habitual. Once I realized Excel alone would be tough for the problem, I immediately jumped to code. Claude Code is basically the perfect tool for folks like me who want to engage LLMs in code but are too obsessive to give up full control of their source. 😉
That said, the prompt itself wasn’t very code-focused, so as an experiment I took out the node/javascript line and fed the exact same prompt to Claude Desktop using the same model (Sonnet 4.5). Results are here: https://claude.ai/share/f6a18011-d4da-4aa9-883f-45a98de01c0d
The model chose to write code anyways, BUT this time it screwed the pooch in two ways. First, it missed a few of the fuzzy matches that the first version got right away. I think this is no harm / no foul: I emphasized conservatism in the prompt, and you could argue the fuzzy matching pushed that boundary anyways.
Much worse, it completely missed the “mode” column and ended up happily double/triple/quadruple counting votes! I was able to correct this easily, but had I not scanned the code myself (with the context and history to know what to look for), it definitely wouldn’t have jumped out at me. This exactly highlights Scott’s concern.
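To make the failure concrete, here’s a minimal sketch of the bug (hypothetical column names and numbers, not the actual dataset): when each race appears once per reporting mode plus a rolled-up “total” row, summing every row counts the same votes multiple times.

```javascript
// Hypothetical rows: one entry per reporting mode, plus a "total" row.
const rows = [
  { county: "Adams", mode: "election-day", votes: 100 },
  { county: "Adams", mode: "absentee",     votes: 40  },
  { county: "Adams", mode: "total",        votes: 140 },
];

// Buggy tally (what the Claude Desktop version did): 100 + 40 + 140 = 280.
const buggy = rows.reduce((sum, r) => sum + r.votes, 0);

// Fixed tally: keep only one mode (here the "total" rows) before summing.
const fixed = rows
  .filter((r) => r.mode === "total")
  .reduce((sum, r) => sum + r.votes, 0);

console.log({ buggy, fixed }); // { buggy: 280, fixed: 140 }
```

The fix is one line, but you only know to write it if you noticed the column in the first place.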
So to the meat of the question (how to verify without code knowledge), a few thoughts:
First, I typically feel better feeding source data to models (like I did here) vs. having the model source the data itself. (To be completely transparent, I did use Claude Desktop to help me find the data, but I vetted and judged its veracity myself through more traditional means.) Having solid base data reduces the number of chances for the model to screw up, but more importantly it means I can use tools like Excel (or even hand calculations) to do my own spot checking of results, something much more accessible to folks who don’t code.
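One pattern that helps here, sketched below with an assumed row shape rather than my actual script: have the code emit per-group subtotals instead of just the final answer, so anyone can cross-check a line or two against the source spreadsheet with a pivot table or SUMIF.

```javascript
// Assumed data shape for illustration only.
const rows = [
  { county: "Adams", mode: "total", votes: 140 },
  { county: "Brown", mode: "total", votes: 95  },
  { county: "Brown", mode: "total", votes: 5   },
];

// Roll up votes by county, using only the "total" mode rows.
const byCounty = new Map();
for (const r of rows.filter((r) => r.mode === "total")) {
  byCounty.set(r.county, (byCounty.get(r.county) ?? 0) + r.votes);
}

// A human can verify any one of these lines against the original data.
for (const [county, votes] of byCounty) {
  console.log(`${county}\t${votes}`); // e.g. "Brown\t100"
}
```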
Second, I’ve felt for a long time that basic coding skills need to be a compulsory part of middle and high school education. This isn’t to make coders out of everyone — I think of it like a foreign language requirement. It doesn’t take a lot of exposure to code before you can read through JavaScript or Python and figure out what’s going on. You learn to look for things like hard-coded numbers and strings, can tell what a loop is doing, etc.
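For example, even someone with a semester of exposure can scan a made-up fragment like this one and ask the right questions:

```javascript
// Hard-coded number: a reader should ask "why 2000? is that right?"
const CUTOFF_YEAR = 2000;
const counties = ["Adams", "Brown", "Clark"];

// The loop just visits each county once; nothing is counted twice.
for (const county of counties) {
  console.log(`${county}: counted since ${CUTOFF_YEAR}`);
}
```

That level of reading, spotting magic numbers and following what a loop touches, is exactly what would have caught the “mode” bug above.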
In the past I’ve thought this was important because coding itself was going to be critical, but maybe the new reason is that it can be something of a lingua franca between humans and machines.
Over the long term, this remains one of the best “holy crap” issues that I don’t have a great answer for. Pretty quickly we’re going to get to a point where models don’t make truly dumb mistakes, at least any more than humans do. When I ask somebody on my team to perform a task, at some point I just have to trust that they did it correctly. That trust is gained through time, assessment of experience, maybe some spot checks at the start of the relationship, etc. … and probably the same thing will be true for models.
The only big (BIG) gotcha with this is that the models aren’t truly independent actors. They’re the product of commercial enterprises, so there are always legitimate questions about underlying motivation. Flipping that once again, it’s true for people too — we are the product of a lifetime of societal programming. Starting to feel like a freshman philosophy class, so I’ll leave it at that.
Anyhoo … thank you, Scott, you made me think a lot harder about the ideas here!
