Shutdown Radio on Azure

Back about a year ago when I was playing with ShutdownRadio, I ranted a bit about my failed attempt to implement it using Azure Functions and Cosmos. Just to recap, dependency conflicts in the official Microsoft Java libraries made it impossible to use these two core Azure technologies together — so I punted. I planned to revisit an Azure version once Microsoft got their sh*t together, but life moved on and that never happened.

Separately, a couple of weeks ago I decided I should learn more about chatbots in general and the Microsoft Bot Framework in particular. “Conversational” interfaces are popping up more and more, and while they’re often just annoyingly obtuse, I can imagine a ton of really useful applications. And if we’re ever going to eliminate unsatisfying jobs from the world, bots that can figure out what our crazily imprecise language patterns mean are going to have to play a role.

No joke, this is what my Bellevue workbench looks like right now, today.

But heads up, this post isn’t about bots at all. You know that thing where you want to do a project, but you can’t do the project until the workbench is clean, but you can’t clean up the workbench until you finish the painting job sitting on the bench, but you can’t finish that job until you go to the store for more paint, but you can’t go to the store until you get gas for the car? Yeah, that’s me.

My plan was to write a bot for Microsoft Teams that could interact with ShutdownRadio and make it more natural/engaging for folks that use Teams all day for work anyways. But it seemed really silly to do all of that work in Azure and then call out to a dumb little web app running on my ancient Rackspace VM. So that’s how I got back to implementing ShutdownRadio using Azure Functions. And while it was generally not so bad this time around, there were enough gotchas that I thought I’d immortalize them for Google here before diving into the shiny new fun bot stuff. All of which is to say — this post is probably only interesting to you if you are in fact using Google right now to figure out why your Azure code isn’t working. You have been warned.

A quick recap of the app

The idea of ShutdownRadio is for people to be able to curate and listen to (or watch I suppose) YouTube playlists “in sync” from different physical locations. There is no login and anyone can add videos to any channel — but there is also no list of channels, so somebody has to know the channel name to be a jack*ss. It’s a simple, bare-bones UX — the only magic is in the synchronization that ensures everybody is (for all practical purposes) listening to the same song at the same time. I talked more about all of this in the original article, so won’t belabor it here.

For your listening pleasure, I did migrate over the “songs by bands connected in some way to Seattle” playlist that my colleagues at Adaptive put together in 2020. Use the channel name “seattle” to take it for a spin; there’s some great stuff in there!

Moving to Azure Functions

The concept of Azure Functions (or AWS Lambda) is pretty sweet — rather than deploying code to servers or VMs directly, you just upload “functions” (code packages) to the cloud, configure the endpoints or “triggers” that allow users to execute them (usually HTTP URLs), and let your provider figure out where and how to run everything. This is just one flavor of the “serverless computing” future that is slowly but surely becoming the standard for everything (and of course there are servers, they’re just not your problem). ShutdownRadio exposes four of these functions:

  • /home simply returns the static HTML page that embeds the video player and drives the UX. Easy peasy.
  • /channel returns information about the current state of a channel, including the currently-playing video.
  • /playlist returns all of the videos in the channel.
  • /addVideo adds a new video to the channel.

Each of these routes was originally defined in Handlers.java as HttpHandlers, the construct used by the JDK internal HttpServer. After creating the Functions project using the “quickstart” maven archetype, lifting these over to Azure Functions in Functions.java was pretty straightforward. The class names are different, but the story is pretty much the same.
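For reference, here’s roughly what one of those handlers ends up looking like as an Azure Function. This is a minimal sketch in the style the quickstart archetype generates (placeholder names and body, not the actual Functions.java):

import java.util.Optional;
import com.microsoft.azure.functions.*;
import com.microsoft.azure.functions.annotation.*;

public class Functions {

    // Sketch only: an HTTP-triggered function that responds to GET /channel
    @FunctionName("channel")
    public HttpResponseMessage channel(
            @HttpTrigger(name = "req",
                         methods = { HttpMethod.GET },
                         authLevel = AuthorizationLevel.ANONYMOUS)
            HttpRequestMessage<Optional<String>> request,
            final ExecutionContext context) {

        // placeholder logic; the real handler looks up current channel state
        String channel = request.getQueryParameters().get("channel");

        return request.createResponseBuilder(HttpStatus.OK)
                .header("Content-Type", "application/json")
                .body("{\"channel\":\"" + channel + "\"}")
                .build();
    }
}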

Routes and Proxies

My goal was to make minimal changes to the original code — obviously these handlers needed to change, as well as the backend store (which we’ll discuss later), but beyond that I wanted to leave things alone as much as possible. By default Azure Functions prepend “/api/” to HTTP routes, but I was able to match the originals by turfing that in the host.json configuration file:

"extensions": {
       "http": {
             "routePrefix": ""
       }
}

A trickier routing issue was getting the “root” page to work (i.e., “/” instead of “/home”). Functions are required to have a non-empty name, so you can’t just use “” (or “/”; yes, I tried). It took a bunch of digging but eventually Google delivered the goods in two parts:

  1. Function apps support “proxy” rules via proxies.json that can be abused to route requests from the root to a named function (note the non-obvious use of “localhost” in the backendUri value to proxy routes to the same application; see the sketch just after this list).
  2. The maven-resources-plugin can be used in pom.xml to put proxies.json in the right place at packaging time so that it makes it up to the cloud.
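Here’s roughly the shape of that proxies.json (the route and target function here are placeholders; the interesting bit is the “localhost” backendUri):

{
  "$schema": "http://json.schemastore.org/proxies",
  "proxies": {
    "root": {
      "matchCondition": { "methods": [ "GET" ], "route": "/" },
      "backendUri": "https://localhost/home"
    }
  }
}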

Finally, the Azure portal “TLS/SSL settings” panel can be used to force all requests to use HTTPS. Not necessary for this app but a nice touch.

All of this seems pretty obscure, but for once I’m inclined to give Microsoft a break. Functions really aren’t meant to implement websites — they have Azure Web Apps and Static Web Apps for that. In this case, I just preferred the Functions model — so the weird configuration is on me.

Moving to Cosmos

I’m a little less sanguine about the challenges I had changing the storage model from a simple directory of files to Cosmos DB. I mean, the final product is really quite simple and works well, so that’s cool. But once again I ran into lazy client library issues and random inconsistencies all along the way.

There are a bunch of ways to use Cosmos, but at heart it’s just a super-scalable NoSQL document store. Honestly I don’t really understand the pedigree of this thing — back in the day “Cosmos” was the in-house data warehouse used to do analytics for Bing Search, but that grew up super-organically with a weird, custom batch interface. I can’t imagine that the public service really shares code with that dinosaur, but as far as I can tell it’s not a fork of any of the big open source NoSQL projects either. So where did it even come from — ground up? Yeesh, only at Microsoft.

Anyhoo, after creating a Cosmos “account” in the Azure portal, it’s relatively easy to create databases (really just namespaces) and containers within them (more like what I would consider databases, or maybe big flexible partitioned tables). Containers hold items, which natively are just JSON documents, although they can be made to look like table rows or graph elements with the different APIs.

Access using a Managed Identity

One of the big selling points (at least for me) of using Azure for distributed systems is its support for managed identities. Basically each service (e.g., my Function App) can have its own Active Directory identity, and this identity can be given rights to access other services (e.g., my Cosmos DB container). These relationships completely eliminate the need to store and manage service credentials — everything just happens transparently without any of the noise or risk that comes with traditional service-to-service authentication. It’s beautiful stuff.

Of course, it can be a bit tricky to make this work on dev machines — e.g., the Azure Function App emulator doesn’t know squat about managed identities (it has all kinds of other problems too but let’s focus here). The best (and I think recommended?) approach I’ve found is to use the DefaultAzureCredentialBuilder to get an auth token. The pattern works like this:

  1. In the cloud, configure your service to use a Managed Identity and grant access using that.
  2. For local development, grant your personal Azure login access to test resources — then use “az login” at the command-line to establish credentials on your development machine.
  3. In code, let the DefaultAzureCredential figure out what kind of token is appropriate and then use that token for service auth.

The DefaultAzureCredential iterates over all the various and obtuse authentication types until it finds one that works — with production-class approaches like ManagedIdentityCredential taking higher priority than development-class ones like AzureCliCredential. Net-net it just works in both situations, which is really nice.
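In code, that pattern boils down to a few lines. This is a minimal sketch (the endpoint is a placeholder, and it assumes the v4 azure-cosmos and azure-identity libraries):

import com.azure.core.credential.TokenCredential;
import com.azure.identity.DefaultAzureCredentialBuilder;
import com.azure.cosmos.CosmosClient;
import com.azure.cosmos.CosmosClientBuilder;

// Locally this resolves to the "az login" credential; in the cloud it
// resolves to the Function App's managed identity. Same code either way.
TokenCredential credential = new DefaultAzureCredentialBuilder().build();

CosmosClient client = new CosmosClientBuilder()
    .endpoint("https://COSMOS_ACCOUNT.documents.azure.com:443/")
    .credential(credential)
    .buildClient();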

Unfortunately, admin support for managed identities (or really any role-based access) with Cosmos is just stupid. There is no way to set it up using the portal — you can only do it via the command line with the Azure CLI or PowerShell. I’ve said it before, but this kind of thing drives me absolutely nuts — it seems like every implementation is just random. Maybe it’s here, maybe it’s there, who knows … it’s just exhausting and inexcusable for a company that claims to love developers. But whatever, here’s a snippet that grants an AD object read/write access to a Cosmos container:

az cosmosdb sql role assignment create \
       --account-name 'COSMOS_ACCOUNT' \
       --resource-group 'COSMOS_RESOURCE_GROUP' \
       --scope '/dbs/COSMOS_DATABASE/colls/COSMOS_CONTAINER' \
       --principal-id 'MANAGED_IDENTITY_OR_OTHER_AD_OBJECT' \
       --role-definition-id '00000000-0000-0000-0000-000000000002'

The role-definition id there is a built-in CosmosDB “contributor” role that grants read and write access. The “scope” can be omitted to grant access to all databases and containers in the account, or just truncated to /dbs/COSMOS_DATABASE for all containers in the database. The same command can be used with your Azure AD account as the principal-id.

Client Library Gotchas

Each Cosmos Container can hold arbitrary JSON documents — they don’t need to all use the same schema. This is nice because it meant I could keep the “channel” and “playlist” objects in the same container, so long as they all had unique identifier values. I created this identifier by adding an internal “id” field on each of the objects in Model.java — the analog of the unique filename suffix I used in the original version.
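As a hypothetical sketch (not the real Model.java), the stored objects just need that unique “id” property, and then reads and writes are one-liners with the v4 SDK. This assumes the container is partitioned on /id:

// Hypothetical shape of a stored document; the important part is the
// unique "id" field that Cosmos requires on every item.
public class ChannelInfo {
    public String id;    // e.g. "channel-" + channelName
    public String name;
    public String currentVideoId;
}

// Writing it and reading it back (container partitioned on /id):
CosmosContainer container = client.getDatabase("DATABASE").getContainer("CONTAINER");
container.upsertItem(info);
ChannelInfo fetched = container
    .readItem(info.id, new PartitionKey(info.id), ChannelInfo.class)
    .getItem();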

The base Cosmos Java API lets you read and write POJOs directly using generics and the serialization capabilities of the Jackson JSON library. This is admittedly cool — I use the same pattern often with Google’s Gson library. But here’s the rub — the library can’t serialize common types like the ones in the java.time namespace. In and of itself this is fine, because Jackson provides a way to add serialization modules to do the job for unknown types. But the recommended way of doing this requires setting values on the ObjectMapper used for serialization, and that ObjectMapper isn’t exposed by the client library for public use. Well technically it is, so that’s what I did — but it’s a hack using stuff inside the “implementation” namespace:

log.info("Adding JavaTimeModule to Cosmos Utils ObjectMapper");
com.azure.cosmos.implementation.Utils.getSimpleObjectMapper().registerModule(new JavaTimeModule());

Side note: long after I got this working, I stumbled onto another approach that uses Jackson annotations and doesn’t require directly referencing private implementation. That’s better, but it’s still a crappy, leaky abstraction that requires knowledge and exploitation of undocumented implementation details. Do better, Microsoft!

Pop the Stack

Minor tribulations aside, ShutdownRadio is now happily running in Azure — so mission accomplished for this post. And when I look at the actual code delta between this version and the original one, it’s really quite minimal. Radio.java, YouTube.java and player.html didn’t have to change at all. Model.java took just a couple of tweaks, and I could have even avoided those if I were being really strict with myself. Not too shabby!

Now it’s time to pop this task off of the stack and get back to the business of learning about bots. Next stop, ShutdownRadio in Teams …and maybe Skype if I’m feeling extra bold. Onward!

Refine your search for “gunshot wound”

I tend to be a mostly forward-looking person, but there’s nothing like a bit of nostalgia once in a while.

After finally putting together a pretty solid cold storage solution for the family, I spent a little time going through my own document folders to see if there was anything there I really didn’t want to lose. The structure there is an amusing recursive walk through the last fifteen years of my career — each time I get a new laptop I just copy over my old Documents folder, so it looks like this:

  • seanno99 – Documents
    • some files
    • seanno98 – Documents
      • some files
      • seanno97 – Documents
        • some files
        • seanno96 – Documents
          • etc.

Yeah of course there are way better ways to manage this. But the complete lack of useful organization does set the stage for some amusing archeological discoveries. Case in point, last night I stumbled across a bunch of screen mocks for the service that ultimately became the embedded “Health Answer” in Bing Search (this was a long time ago, I don’t know if they still call them “Answers” or not, and I’m quite sure the original code is long gone).

One image in particular brought me right back to a snowy day in Redmond, Washington — one of my favorite memories in a luck-filled career full of great ones, probably about nine months before the mock was created.

Back then, the major engines didn’t really consider “health” to be anything special. This was true of most specialized domains — innovations around generalized search were coming so hot and heavy that any kind of curation or specialized algorithms just seemed like a waste of time. My long-time partner Peter Neupert and I believed that this was a mistake, and that “health” represented a huge opportunity for Microsoft both in search and elsewhere. There was a bunch of evidence for this that isn’t worth spending time on here — the important part is that we were confident enough to pitch Microsoft on creating a big-time, long-term investment in the space. I’m forever thankful that I was introduced to Peter way back in 1998; he has a scope of vision that I’ve been drafting off for a quarter century now.

Anyways, back in the late Fall of 2005 we were set to pitch this investment to Steve and Bill. The day arrives and it turns out that the Northwest has just been hit by a snowstorm — I can’t find a reference to the storm anywhere online, so it was probably something lame like six inches, but that’s more than enough to knock out the entire Seattle area. There is no power on the Microsoft campus and most folks are hiding in their homes with a stock of fresh water and canned soup. But Steve and Bill apparently have a generator in their little office kingdom, so we’re on. Somebody ran an extension cord into the conference room and set up a few lights, but there’s this great shadowy end-of-the-world vibe in the room — sweet. So we launch into our song and dance, a key part of which is the importance of health-specific search.

And here comes Bill. Now, he has gotten a lot of sh*t in the press lately, and I have no reason to question the legitimacy of the claims being made. This bums me out, because Bill Gates is one of the very few people in the world that I have been truly impressed by. He is scary, scary smart — driven by numbers and logic, and just as ready to hear that he’s an idiot as he is to tell you that you are. For my purposes here, I choose to remember this Bill, the one I’ve gotten to interact with.

“This is the stupidest idea I have ever heard.”

Bill dismisses the entire idea that people would search for issues related to their health. He expresses this with a small one-act play: “Oh, oh, I’ve been shot!” — he clutches his chest and starts dragging himself towards the table — “I don’t know what to do, let me open up my computer” — he stumbles and hauls himself up to the laptop — “No need for the ER, I’ll just search for ‘gunshot wound’” — sadly he collapses before he can get his search results. And, scene.

Suffice to say that backing down is not the right way to win a debate with Bill. I remember saying something that involved the words “ridiculous” and “bullsh*t” but that’s it — I was in The Zone. Fast forward about a week, the snow melted and Peter did some background magic and our funding was in the bag.

A few months later, we ended up buying a neat little company called Medstory that had created an engine dedicated to health search. And thus were born the “HealthVault Search” mocks that I found deep in the depths of my archives the other day. The best part? If you’ve looked at the image, you already know the punch line: GUNSHOT WOUND was immortalized as the go-to search phrase for the first image presented — every meeting, every time.

Bing!

Map-engraved / heart-inlaid coasters

As a present for my wife this year, I made a set of heart-shaped coasters to commemorate key times/places in our past — where we met, were married, adopted pets and had kids. Each coaster has a map, a heart inset at the key location, and a description on the back. It was a fun project using a few techniques I thought others might find useful, so just a quick post to walk through it.

I used 1/4″ MDF with maple veneer but the engraving covers the full front of the coaster so you could really use any light-colored wood with minimal grain pattern (you kind of want a blank canvas). The heart insets are translucent red 1/8” acrylic, part of a pack I actually got for Valentine’s last year! I was worried that insetting the half-thickness acrylic into the coaster might be an awkward fit, but it worked great. The backs are 2mm adhesive cork that I use a lot for tabletop projects.

The Hearts

I started with a simple vector heart shape and scaled it to the target size of the coaster (4.5” square works for most mugs and glasses). I then scaled copies to three additional sizes:

  1. One inset 1/8” for the cork backing.
  2. A small one 1/2″ square for cutting the hole for the inlay.
  3. The small one outset .007” for the acrylic, which (when flipped over) made a snug fit into the hole.

I use Inkscape for most of my designs. Inset/outset from the “Path” menu is the freaking best feature ever — the only trick is that the size of each step is a global setting, so double-check under Edit / Preferences / Steps / “Inset/Outset by” to be sure it’s what you want. Seven thousandths of an inch is about perfect for the kerf I get on most 1/8” and 1/4″ wood and acrylic. I’m sure it varies a little but not enough to worry about. Remember to flip the insert over before pressing it in, which takes care of the ever-so-slightly-conical cut you get from the laser.

The Maps

OpenStreetMap is a fantastic resource — community-produced and openly licensed, even for commercial distribution (attribution is required; see their guidelines for details). There are a bunch of ways to use the data; this is the process I finally worked out for my purposes:

  1. Navigate to the area you want to capture and zoom in/out as needed.
  2. Export the map as a PDF:
    1. Click the “share” button on the right side of the screen.
    2. Check the “Set custom dimensions” checkbox and select the desired area. Select more area than you need; it provides some wiggle room and we’ll clip it out later.
    3. Set the format to “PDF”.
    4. Play with the “Scale” setting to get a final image that works for you. I found it easiest to start with 1:5000 and adjust from there.
    5. Click Download.
  3. Open the PDF in Inkscape and make edits (remove landmarks, reposition street names, etc.) if needed.
  4. Paste your shape (in my case the 4.5” heart) and position it over the map.
  5. “Select All” and choose Object / Clip / Set to clip the map to your shape.
  6. Optional: I pasted in another copy of the 4.5” heart with a wide stroke, which made a nice outline.
  7. Under the File menu, choose “Export PNG Image”. Make sure “Drawing” is selected at the top and then export.
  8. Finally, open the new PNG file in Inkscape, add additional elements (i.e., the cut lines for the heart and inlay hole) and save as an SVG ready for the Glowforge.

All that work to massage the map into a bitmap (PNG) is worth it — the Glowforge handles the engraving super well.

Printing and Assembling

Printing requires three Glowforge runs, one for each material. For the wood, I used the “Thick Maple Plywood” settings and they worked great, engraving with Draft Photo / Convert to Dots with default settings except two passes instead of just one. The acrylic worked fine as “Medium Red Acrylic”. For the cork I configured “uncertified” material with a height of 2mm; engrave at speed 80 / power 10% and cut at 400 / 100%.

After pressing in the inlays, I poured on two coats of TableTop Epoxy, sanded off the drips, stuck on the cork backs, and that’s a wrap.

I really love working with the maps — such a neat way to personalize stuff. Hope folks will get some use out of the technique, and if you give it a try, let me know if I can help out. Kachow!

Cold Storage on Azure

As the story goes, Steve Jobs once commissioned a study to determine the safest long-term storage medium for Apple’s source code. After evaluating all the most advanced technologies of the day (LaserDisc!), the team ended up printing it all out on acid-free paper to be stored deep within the low-humidity environment at Yucca Mountain — the theory being that our eyes were the most likely data retrieval technology to survive the collapse of civilization. Of course this is almost certainly false, but I love it anyways. Just like the (also false) Soviet-pencils-in-space story, there is something very Jedi about simplicity outperforming complexity. If you need me, I’ll be hanging out in the basement with Mike Mulligan and Mary Anne.

Image credit Wikipedia

Anyways, I was reminded of the Jobs story the other day because long-term data storage is something of a recurring challenge in the Nolan household. In the days of hundreds of free gigs from consumer services, you wouldn’t think this would be an issue, and yet it is. In particular my wife takes a billion pictures (her camera takes something like fifty shots for every shutter press), and my daughter has created an improbable tidal wave of video content.

Keeping all this stuff safe has been a decades-long saga including various server incarnations, a custom-built NAS in the closet, the usual online services, and more. They all have fatal flaws, from reliability to cost to usability. Until very recently, the most effective approach was a big pile of redundant drives in a fireproof safe. It’s honestly not a terrible system; you can get 2TB for basically no money these days, so keeping multiple copies of everything isn’t a big deal. Still not great though — mean time to failure for both spinning disks and SSD remains sadly low — so we need to remember to check them all a couple of times each year to catch hardware failures. And there’s always more. A couple of weeks ago, as my daughter’s laptop was clearly on the way out, she found herself trying to rescue yet more huge files that hadn’t made it to the safe.

Enter Glacier (and, uh, “Archive”)

It turns out that in the last five years or so a new long-term storage approach has emerged, and it is awesome.

Object (file) storage has been a part of the “cloud” ever since there was a “cloud” — Amazon calls their service S3; Microsoft calls theirs Blob Storage.  Conceptually these systems are quite simple: files are uploaded to and downloaded from virtual drives (“buckets” for Amazon, “containers” for Azure) using more-or-less standard web APIs. The files are available to anyone anywhere that has the right credentials, which is super-handy. But the real win is that files stored in these services are really, really unlikely to be lost due to hardware issues. Multiple copies of every file are stored not just on multiple drives, but in multiple regions of the world — so they’re good even if Lex Luthor does manage to cleave off California into the ocean (whew). And they are constantly monitored for hardware failure behind the scenes. It’s fantastic.

But as you might suspect, this redundancy doesn’t come for free. Storing a 100 gigabyte file in “standard” storage goes for about $30 per year (there are minor differences between services and lots of options that can impact this number, but it’s reasonably close), which is basically the one-and-done cost of a 2 terabyte USB stick! This premium can be very much worth it for enterprises, but it’s hard to swallow for home use.

Ah, but wait. These folks aren’t stupid, and realized that long-term “cold” storage is its own beast. Once stored, these files are almost never looked at again — they just sit there as a security blanket against disaster. By taking them offline (even just by turning off the electricity to the racks), they could be stored much more cheaply, without sacrificing any of the redundancy. The tradeoff is only that if you do need to read the files, bringing them back online takes some time (about half a day generally) — not a bad story for this use case. Even better, the teams realized that they could use the same APIs for both “active” and “cold” file operations — and even move things between these tiers automatically to optimize costs in some cases.

Thus was born Amazon Glacier and the predictably-boringly-named Azure Archive Tier. That same 100GB file in long-term storage costs just $3.50 / year … a dramatically better cost profile, and something I can get solidly behind for family use. Woo hoo!

But Wait

The functionality is great, and the costs are totally fine. So why not just let the family loose on some storage and be done with it? As we often discover, the devil is in the user experience. Both S3 and Blob Storage are designed as building blocks for developers and IT nerds — not for end users. The native admin tools are a non-starter; they exist within an uber-complex web of cloud configuration tools that make it very easy to do the wrong thing. There are a few hideously-complicated apps that all look like 1991 FTP clients. And there are a few options for using the services to manage traditional laptop backups, but they all sound pretty sketchy and that’s not our use case here anyways.

Sounds like a good excuse to write some code! I know I’m repeating myself but … whether it’s your job or not, knowing how to code is the twenty-first century superpower. Give it a try.

The two services are basically equivalent; I chose to use Azure storage because our family is already deep down the Microsoft rabbit hole with Office365. And this time I decided to bite the bullet and deploy the user-facing code using Azure as well — in particular an Azure Static Web App using the Azure Storage Blob client library for JavaScript. You can create a “personal use” SWA for free, which is pretty sweet. Unfortunately, Microsoft’s shockingly bad developer experience strikes again and getting the app to run was anything but “sweet.” At its height my poor daughter was caught up in a classic remote-IT-support rodeo, which she memorialized in true Millennial Meme form.

Anyhoo — the key features of an app to support our family use case were pretty straightforward:

  1. Simple user experience, basically a “big upload button”.
  2. Login using our family Office365 accounts (no new passwords).
  3. A segregated personal space for each user’s files.
  4. An “upload” button to efficiently push files directly into the Archive tier.
  5. A “thaw” button to request that a file be copied to the Cool tier so it can be downloaded.
  6. A “download” button to retrieve thawed files.
  7. A “delete” button to remove files from either tier.

One useful feature I skipped — given that the “thawing” process can take about fifteen hours, it would be nice to send an email notification when that completes. I haven’t done this yet, but Azure does fire events automatically when rehydration is complete — so it’ll be easy to add later.

For the rest of this post, we’ll decisively enter nerd-land as I go into detail about how I implemented each of these. Not a full tutorial, but hopefully enough to leave some Google crumbs for folks trying to do similar stuff. All of the code is up on github in its own repository; feel free to use any of it for your own purposes — and let me know if I can help with anything there.

Set up the infrastructure

All righty. First you’ll need an Azure Static Web App. SWAs are typically deployed directly from github; each time you check in, the production website will automatically be updated with the new code. Set up a repo and the Azure SWA using this quickstart (use the personal plan). Your app will also need managed APIs — this quickstart shows how to add and test them on your local development machine. These quickstarts both use Visual Studio Code extensions — it’s definitely possible to do all of this without VSCode, but I don’t recommend it. Azure developer experience is pretty bad; sticking to their preferred toolset at least minimizes unwelcome surprises.

You’ll also need a Storage Account, which you can create using the Azure portal. All of the defaults are reasonable, just be sure to pick the “redundancy” setting you want (probably “Geo-redundant storage”). Once the account has been created, add a CORS rule (in the left-side navigation bar) that permits calls from your SWA domain (you’ll find this name in the “URL” field of the overview page for the SWA in the Azure portal).

Managing authentication with Active Directory

SWAs automatically support authentication using accounts from Active Directory, Github or Twitter (if you choose the “standard” pricing plan you can add your own). This is super-nice and reason alone to use SWA for these simple little sites — especially for my case where the users in question are already part of our Azure AD through Office365. Getting it to work correctly, though, is a little tricky.

Code in your SWA can determine the user’s logged-in status in two ways: (1) from the client side, make an Ajax call to the built-in route /.auth/me, which returns a JSON object with information about the user, including their currently-assigned roles; (2) from API methods, decode the x-ms-client-principal header to get the same information.
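The JSON that comes back from /.auth/me looks roughly like this (values invented):

{
  "clientPrincipal": {
    "identityProvider": "aad",
    "userId": "0123456789abcdef",
    "userDetails": "someone@example.com",
    "userRoles": [ "anonymous", "authenticated", "contributor" ]
  }
}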

By default, all pages in a SWA are open for public access and the role returned will be “anonymous”. Redirecting a user to the built-in route /.auth/login/aad will walk them through a standard AD login experience. By default anyone with a valid AD account can log in and will be assigned the “authenticated” role. If you’re ok with that, then good enough and you’re done. If you want to restrict your app only to specific users (as I did), open up the Azure portal for your SWA and click “Role management” in the left-side navigation bar. From here you can “invite” specific users and grant them custom roles (I used “contributor”) — since only these users will have your roles, you can filter out the riff-raff.

Next you have to configure routes in the file staticwebapp.config.json in the same directory as your HTML files to enforce security. There are a lot of ways to do this and it’s a little finicky because your SWA has some hidden routes that you don’t want to accidentally mess with. My file is here; basically it does four things (a rough sketch follows the list):

  1. Allows anyone to view the login-related pages (/.auth/*).
  2. Restricts the static and api files to users that have my custom “contributor” role.
  3. Redirects “/” to my index.html page.
  4. Redirects to the AD auth page when needed to prompt login.
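Stripped down, that amounts to something like this (a simplified sketch, not my exact file; routes are matched in order, so the specific ones come first):

{
  "routes": [
    { "route": "/.auth/*", "allowedRoles": [ "anonymous", "authenticated" ] },
    { "route": "/api/*", "allowedRoles": [ "contributor" ] },
    { "route": "/", "redirect": "/index.html", "statusCode": 302 },
    { "route": "/*", "allowedRoles": [ "contributor" ] }
  ],
  "responseOverrides": {
    "401": { "redirect": "/.auth/login/aad", "statusCode": 302 }
  }
}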

I’m sure there’s a cleaner way to make all this happen, but this works and makes sense to me, so onward we go.

Displaying files in storage

The app displays files in two tables: one for archived files (in cold storage) and one for active ones that are either ready to download or pending a thaw. Generating the actual HTML for these tables happens on the client, but the data is assembled at the server. The shared “Freezer” object knows how to name the user’s personal container from their login information and ensure it exists. The listFiles method then calls listBlobsFlat to build the response object.

There are more details on the “thawing” process below, but note that if a blob is in the middle of thawing we identify it using the “archiveStatus” property on the blob. Other than that, this is a pretty simple iteration and transformation. I have to mention again just how handy JSON is these days — it’s super-easy to cons up objects and return them from API methods.

Uploading

Remember the use case here is storing big files — like tens to hundreds of gigabytes big. Uploading things like that to the cloud is a hassle no matter how you do it, and browsers in particular are not known for their prowess at the job. But we’re going to try it anyways.

In the section above, the browser made a request to our own API (/api/listFiles), which in turn made requests to the Azure storage service. That works fine when the data packages are small, but when you’re trying to push a bunch of bytes, having that API “middleman” just doesn’t cut it. Instead, we want to upload the file directly from the browser to Azure storage. This is why we had to set up a CORS rule for the storage account, because otherwise the browser would reject the “cross-domain” request to https://STORAGE_ACCT.blob.core.windows.net where the files live.

no preflight cache for PUT, so sad

The same client library that we’ve been using from the server (node.js) environment will work in client-side JavaScript as well — sort of. Of course because it’s a Microsoft client library, they depend on about a dozen random npm packages (punycode, tough-cookie, universalify, the list goes on), and getting all of this into a form that the browser can use requires a “bundler.” They actually have some documentation on this, but it leaves some gaps — in particular, how best to use the bundled files as a library. I ended up using webpack to make the files, with a little index.js magic to expose the stuff I needed. It’s fine, I guess.

The upload code lives here in index.html. The use of a hidden file input is cute but not essential — it just gives us a little more control over the ux. Of course, calls to storage methods need to be authenticated; our approach is to ask our server to generate a “shared access signature” (SAS) token tied to the blob we’re trying to upload — which happens in freezer.js (double-duty for upload and download). The authenticated URL we return is tied only to that specific file, and only for the operations we need.

The code then calls the SDK method BlockBlobClient.uploadData to actually push the data. This is the current best option for uploading from the browser, but to find it you have to make your way there through a bunch of other methods that are either gone, deprecated or only work in the node.js runtime. The quest is worthwhile, though, because there is some good functionality tucked away in there that is key for large uploads:

  • Built in retries (we beef this up with retryOptions).
  • Clean cancel using an AbortController.
  • A differentiated approach for smaller files (upload in one shot) vs. big ones (upload in chunks).
  • When uploading in chunks, parallel upload channels to maximize throughput. This one is tricky — since most of us in the family use Chrome, we have to be aware of the built-in limitation of five concurrent calls to the same domain. In the node.js runtime it can be useful to set the “concurrency” value quite high, but in the browser environment that will just cause blocked requests and timeout chaos. This took me awhile to figure out … a little mention in the docs might be nice folks.

With all of this, uploading seems pretty reliable. Not perfect though — it still dies with frustrating randomness. Balancing all the config parameters is really important, and unfortunately the “best” values change depending on available upload bandwidth. I think I will add a helper so that folks can use the “azcopy” tool to upload as well — it can really crank up the parallelization and seems much less brittle with respect to network hiccups. Command-line tools just aren’t very family friendly, but for what it’s worth:

  1. Download azcopy and extract it onto your PATH.
  2. Log in by running azcopy login … this will tell you to open up a browser and log in with a one-time code.
  3. Run the copy with a command like azcopy cp FILENAME https://STORAGE_ACCT.blob.core.windows.net/CONTAINER/FILENAME --put-md5 --block-blob-tier=Archive.
  4. If you’re running Linux, handy to do #3 in a screen session so you can detach and not worry about logging out.

Thawing

Remember that files in the Archive tier can’t be directly downloaded — they need to be “rehydrated” (I prefer “thawed”) out of Archive first. There are two ways to do this: (1) just flip the “tier” bit to Hot or Cool to initiate the thaw, or (2) make a duplicate copy of the archived blob, leaving the original in place but putting the new one into an active tier. Both take the same amount of time to thaw (about fifteen hours), but it turns out that #2 is usually the better option for cold-storage use cases. The reason why comes down to cost management — if you move a file out of archive before it’s been there for 180 days, you are assessed a non-trivial financial penalty (equivalent to if you were using an active tier for storage the whole time). Managing this time window is a hassle and the copy avoids it.

So this should be easy, right? Just call beginCopyFromURL with the desired active tier value in the options object. I mean, that’s what the docs literally say to do, right?

Nope. For absolutely no reason that I can ascertain online, this doesn’t work in the JavaScript client library — it just returns a failure code. Classic 2020 Microsoft developer experience … things work in one client library but not another, the differences aren’t documented anywhere, and it just eats hour after hour trying to figure out what is going on via Github, Google and Stack Exchange. Thank goodness for folks like this that document their own struggles … hopefully this post will show up in somebody else’s search and help them out the same way.

Anyways, the only approach that seems to work is to just skip the client library and call the REST API directly. Which is no big deal except for the boatload of crypto required. Thanks to the link above, I got it working using the crypto-js npm module. I guess I’m glad to have that code around now at least, because I’m sure I’ll need it again in the future.

But wait, we’re still not done! Try as I might, the method that worked on my local development environment would not run when deployed to the server: “CryptoJS not found”. Apparently the emulator doesn’t really “emulate” very well. Look, I totally understand that this is a hard job and it’s impossible to do perfectly — but it is crystal clear that the SWA emulator was hacked together by a bunch of random developers with no PM oversight. Argh.

By digging super-deep into the deployment logs, it appeared that the Oryx build thingy that assembles SWAs didn’t think my API functions had dependent modules at all. This was confusing, since I was already dependent on the @azure/storage-blob package and it was working fine. I finally realized that the package.json file in the API folder wasn’t listing my dependencies. The same file in the root directory (where you must run npm install for local development) was fine. What the f*ck ever, man … duping the dependencies in both folders fixed it up.

Downloading and Deleting

The last of our tasks was to implement download and delete — thankfully, not a lot of surprises with these. The only notable bit is setting the correct Content-Type and Content-Disposition headers on download so that the files are saved as downloads rather than opening up in the browser or whatever other application is registered. Hooray for small wins!

That’s All Folks

What a journey. All in all it’s a solid little app — and great functionality to ensure our family’s pictures and videos are safe. But I cannot overstate just how disappointed I am in the Microsoft developer experience. I am particularly sensitive to this for two reasons:

First, the fundamental Azure infrastructure is really really good! It performs well, the cost is reasonable, and there is a ton of rich functionality — like Static Web Apps — that really reduce the cost of entry for building stuff. It should be a no-brainer for anyone looking to create secure, modern, performant apps — not a spider-web of sh*tty half-assed Hello World tutorials that stop working the day after they’re published.

Even worse for my personal blood pressure, devex used to be the crown jewel of the company. When I was there in the early 90s and even the mid 00s, I was really, really proud of how great it was to build for Windows. Books like Advanced Windows and Inside OLE were correct and complete. API consistency and controlled deprecation were incredibly important — there was tons of code written just to make sure old apps kept working. Yes it was a little insane — but I can tell you it was 100% sincere.

Building for this stuff today feels like it’s about one third coding, one third installing tools and dependencies, and one third searching Google to figure out why nothing works. And it’s not just Microsoft by any means — it just hurts me the most to see how far they’ve fallen. I’m glad to have fought the good fight on this one, but I think I need a break … whatever I write next will be back in my little Linux/Java bubble, thank you very much.  

Fake Neurons Are Cool

Back when I was in college, getting a Computer Science degree meant taking a bunch of somewhat advanced math courses. My math brain topped out at Calc2, so clearly I was going to have to work the system somehow. Thus was born my custom-made “Cognitive Science” degree, a combination of the cool parts of Psychology with the cool parts of CS. Woot! My advisor in the degree was Jamshed Bharucha, who has done a ton of really cool work trying to understand how we perceive music.

In retrospect it was an awesome time to be learning about artificial intelligence. Most approaches still didn’t work very well (except in limited domains). The late-80s hype around expert systems had petered out, and the field overall was pretty demoralized. But blackboards and perceptrons were still bopping around, and I was particularly enamored with the stuff that Rodney Brooks (eventually of Roomba fame) was doing. What was great for a student was that all of these ideas were still relatively simple — you could intuitively talk about how they worked, and the math was approachable enough that you could actually implement them. Today it’s much harder to develop that kind of intuition from first principles, because everything is about mind-numbing linear algebra and layer upon layer of derivatives (on the other hand, today the algorithms actually work I guess).

Most notably for me, the classic 1986 backpropagation paper by Rumelhart / Hinton / Williams was just gaining traction as I was learning all of this. Backprop basically restarted the entire field and, coupled with Moore’s Law, set the stage for the pretty incredible AI performance we take for granted today. Dr. Bharucha saw this happening, and tapped me to write a graphical neural net simulator on the Mac that we used to teach classes. Sadly, while you can still find a few obscure mentions of DartNet around the web (including a tiny screenshot), it seems that the code is lost — ah well.

Nostalgia aside, I have been noodling an idea for a project that would require a network implementation. There are a metric ton of really, really good open source options to choose from, but I realized I didn’t really remember the details of how it all worked, and I don’t like that. So with the help of the original paper and some really nice, simple reference code I set about getting refreshed, and figured that others might enjoy a “101” as well, so here we go.

Real Neurons are Really Cool

We do our thinking thanks to cells called Neurons. In combination they are unfathomably complex, but individually they’re pretty simple, at least at a high level. Neurons basically have three parts:

image credit w/thanks to Wikimedia
  1. The cell body or soma, which holds the nucleus and DNA and is a lot like any other cell.
  2. Dendrites, branching tendrils which extend from the cell body and receive signals from other cells.
  3. The axon, a (usually) single long extension from the cell body that sends signals to other cells.

Neurons are packed together so that axons from some neurons are really close to dendrites from others. When a neuron is “activated”, its axon releases chemicals called neurotransmitters, which travel across the gap (called a synapse) to nearby dendrites. When those dendrites sense neurotransmitters, they send an electrical pulse up towards their cell body. If enough dendrites do this at the same time, that neuron also “activates”, sending the pulse up the axon which responds by releasing more neurotransmitters. And she tells two friends, and she tells two friends… and suddenly you remember your phone number from 1975.

It doesn’t quite work this way, but imagine you’ve got a neuron in your head dedicated to recognizing Ed Sheeran. Dendrites from this neuron might be connected to axons from the neuron that recognizes redheads, and the one for Cherry Seaborn, and the one for British accents, and dozens of others. No single dendrite is enough to make the Ed Sheeran neuron fire; it takes a critical mass of these inputs firing at the same time to do the job. And some dendrites are more important than others — the “shabbily dressed” neuron probably nudges you towards recognizing Ed, but isn’t nearly as powerful as “hearing Galway Girl”.

Pile up enough neurons with enough connections and you end up with a brain. “Learning” is just the process of creating synapses and adjusting their strengths. All of our memories are encoded in these things too. They’re why I think of my grandparents’ house every time I smell petrichor, and why I start humming Rocky Mountain High whenever I visit Overlake Hospital. It’s just too freaking amazing.

Fake Neurons

People have been thinking about how to apply these concepts to AI since the 1940s. That evolution itself is fascinating but a bit of a side trip. If we fast-forward to the early 1980s, the state of the art was more-or-less represented in Minsky and Papert’s book Perceptrons. In brief (and I hope I don’t mess this up too much):

  1. Coding a fake neuron is pretty easy.
  2. Coding a network of fake neurons is also pretty easy, albeit computationally intense to run.
  3. Two-layer, fully-connected networks that link “input” to “output” neurons can learn a lot of things by example, but their scope is limited.
  4. Multi-layer networks that include “hidden” neurons between the inputs and outputs can solve many more problems.
  5. But while we understood how to train the networks in #3, we didn’t know how to train the hidden connections in #4.

The difference between the networks in #3 and #4 is about “linearity”. Imagine your job is to sort a pile of random silverware into “forks” and “spoons”. Unfortunately, you discover that while many pieces are pretty obviously one or the other, there are also a bunch of “sporks” in the mix. How do you classify these sporkish things? One super-nerdy way would be to identify some features that make something “forky” vs. “spoony” and plot examples on a chart (hot tip: whenever you see somebody building “graphs” in PowerPoint as I’ve done here, you should generally assume they’re full of crap):

If we measure the “tine length” and “bowl depth” of each piece, we can plot it on this graph. And lo and behold, we can draw a straight line (the dotted one) across this chart to separate the universe quite accurately into forks and spoons. Sure, the true sporks in the lower-left are tough, as are weirdo cases like the “spaghetti fork” represented by the misclassified red dot on the right. But by and large, we’ve got a solid classifier here. The dividing line itself is pretty interesting — you can see that “tine length” is far more “important” to making the call than the bowl depth. This makes intuitive sense — I’ve seen a lot of spoony forks, but not a lot of spoons with long tines.

This kind of classification is called “linear classification,” and it is super-super-powerful. While it’s hard for us to visualize, it will work with any number of input parameters, not just two. If you imagine adding a third dimension (z axis) to the chart above, a flat plane could still split the universe in two. A whole bunch of AI you see in the world is based on multi-dimensional linear classifiers (even the T-Detect COVID-19 test created by my friends at Adaptive).
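In code, a linear classifier really is that simple: a weighted sum of the features compared against a threshold. A toy sketch for the silverware example (weights invented for illustration):

// Toy linear classifier; the weights and bias are made up, but notice how
// much more "tine length" matters to the decision than "bowl depth".
static boolean looksLikeFork(double tineLengthCm, double bowlDepthCm) {
    double wTine = 4.0, wBowl = -1.5, bias = -2.0;
    return (wTine * tineLengthCm + wBowl * bowlDepthCm + bias) > 0;
}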

But there are a bunch of things linear classifiers can’t do — a good example being complex image recognition (dog or fried chicken?). Enter the multi-layered neural network (#4 in the list above). Instead of straight lines, these networks can draw distinctions using complex curves and even disjoint shapes. Super-cool … except that back in the early 80s we didn’t know how to train them. Since carefully hand-crafting a network with thousands of connections is laughable, we were kind of stuck.

I already gave away the punchline — in 1986 some super-smart folks solved this dilemma with a technique they called “backpropagation.” But before we dig into that, let’s look a little more closely at how artificial neural nets are typically put together.

Network Layers and Forward Propagation

I alluded to the fact that our brains are generally a jumble of interconnected neurons. Some connections are predetermined, but most come about as we learn stuff about the world. The interconnectedness is massively complex — our artificial versions are much simpler, because we’re just not as smart as Nature. Still, there is a lot going on.

Fake neurons are arranged into “layers”, starting with the input layer. This input layer is where features (tine length, etc.) are presented to the system, usually as floating point numbers and ideally normalized to a consistent range like 0 to 1 (normalizing the inputs lets the network assess the importance of each feature on its own). The last layer in a network is the “output” layer, which is where we read out results. The output layer might be a single neuron that provides a yes/no answer; or it might be a set of neurons, each of which assesses the probability of the inputs representing a particular thing, or something in between.

In between these two layers is usually at least one “hidden” layer. The number of neurons in these layers is up to the network designers — and there aren’t a ton of “rules” about what will work best in any specific situation. This is true of most “hyperparameters” used to tune AI systems, and selection usually comes down to somewhere between a random guess and trying a whole bunch to see what performs best. And we think we’re so smart.

Every neuron in layer N is connected via an artificial synapse to every neuron in layer N + 1. The “strength” of each synapse is represented by a floating-point value called a “weight”. Generating an output from a given set of inputs is called a “forward pass” or “forward propagation” and works like this:

  1. Assign each input value to the input neurons.
  2. For each neuron N in the first hidden layer,
    1. For each neuron N’ in the layer below,
      1. Calculate the value sent from N’ to N by multiplying the value of N’ by the weight of the synapse between them.
    2. Sum these values together to get the total input value for neuron N.
    3. Add the “bias” value of N to the sum. Intuitively this bias allows each neuron to have a “base level” importance to the system.
    4. Apply an “activation” function to the sum to determine the final output value of N (see discussion of activation functions below).
  3. Repeat step 2 for each subsequent network layer until the output layer is reached.

Activation functions are interesting. In a very practical sense, we need to normalize the output values of each neuron — if we didn’t, the “sum” part of the algorithm would just keep growing with every layer. Using a non-linear function to perform that normalization enables the non-linear classification we’re trying to build. Remember that real neurons are binary — they either fire or they do not — a very non-linear operation. But artificial networks tend to use something like the sigmoid function (or actually ever more complicated ones these days) that have the added benefit of being efficient at a learning approach called gradient descent (I know, more terms … sorry, we’ll get there).
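For what it’s worth, the sigmoid itself is tiny, and its derivative can be computed straight from the activation value, which is part of why it plays so nicely with gradient descent. A quick sketch:

// Sigmoid squashes any input into the range (0, 1).
static double sigmoid(double x) {
    return 1.0 / (1.0 + Math.exp(-x));
}

// The derivative, expressed in terms of the activation value itself
// (i.e., this assumes "activation" is already sigmoid(x)).
static double sigmoidDerivative(double activation) {
    return activation * (1.0 - activation);
}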

It’s hard to describe algorithms in English. Hopefully that all made sense, but said more simply: artificial neural networks arrange neurons in layers. Activation of the neurons at each layer is calculated by adding up the activations from the layer below, scaled by weights that capture the relative importance of each synapse. Functions transform these values into final activations that result in non-linear output values. That’s good enough.

Training the Network / Enter Backpropagation

Backprop is one of those algorithms that I can fight through with pen and paper and really understand for about five seconds before it’s lost again. I take some pride in those five seconds, but I wish I was better at retaining this stuff. Ah well — I can at least hold onto an intuitive sense of what is going on — and that’s what I’ll share here.

We can train a network by showing it a whole bunch of input samples where we know what the output should be (this is called supervised learning). The network is initialized with a set of totally random weights and biases, then a forward pass is done on the first sample. If we subtract the (pretty much random) outputs we get from the correct/expected results, we get an “error” value for each output node. Our goal is to use that error plus a technique called “gradient descent” to adjust the weights coming into the node so that the total error is smaller next time. Then we run the other samples the same way until the network either gets smart or we give up.

Gradient descent is a very simple idea. Considering one synapse (weight) in our network, imagine a chart like the one here that plots all the possible weight values against the errors they produce. Unless the error is totally unrelated to the weight (certainly possible but then it is all random and what’s the meaning of life anyways), you’ll end up with a curve, maybe one like the dotted line shown below. Our job is to find the weight value that minimizes error, so we’re trying to hit that lower trough where the green dot is.

If our initial random stab is the red dot, we want to move the weight “downhill” to the left. We don’t know how far, but we can see that the slope of the curve is pretty steep where we are, so we can take a pretty big step. But oops, if we go too far we end up missing the bottom and land somewhere like the gold dot. That’s ok — the slope is shallower now, so we’ll try again, taking a smaller step to the right this time. And we just keep doing that, getting closer and closer to the bottom where we want to be.
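Here’s a one-dimensional toy version of that loop, using an invented error curve where error(w) = (w - 3)^2, so the slope at any point is 2 * (w - 3):

// Toy gradient descent on a made-up error curve: error(w) = (w - 3)^2.
// The step size shrinks naturally as the slope flattens near the bottom.
double w = 10.0;               // the "red dot": a random starting weight
double learningRate = 0.1;

for (int i = 0; i < 100; ++i) {
    double slope = 2.0 * (w - 3.0);   // derivative of the error at w
    w -= learningRate * slope;        // step downhill, proportional to the slope
}
// after the loop, w has settled very close to 3.0 -- the "green dot"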

Alas, the big problem with gradient descent is represented by the purple dot, called a “local minimum”. If our initial random weight puts us near that part of the curve, we might accidentally follow it downhill to the purple and get “stuck” because the slope there is zero and we never take a big enough step to escape. There are various ways to minimize (ha) this problem, all of which amount in practice to jiggling the dot to try and shake it loose. Fun stuff, but I’m just going to ignore it here.

Anyways, it turns out that something called the “chain rule” lets us figure out the rate of change of the error at each output node with respect to each incoming weight value. And once we know that, we can use gradient descent to adjust those weights just like we did with the red dot. And it also enables us to iteratively distribute errors through the lower layers, repeating the process. I would just embarrass myself trying to explain all the math that gets us there, but I grasped it just long enough to implement it here.

Again trying to wrap all this up, in short we train a network by (a) computing how poorly it performs on known input/output combinations, (b) divvying up that error between the synapses leading to the outputs and using that to update weight values, then (c) iteratively pushing the error to the next lower layer in the network and repeating the process until we get to the bottom. Do this enough times and we end up with a network that (usually) does a good job of classification, even on inputs it hasn’t seen before.

Show Me the Money (Code)

Matrix math bugs me. OK, mostly I’m jealous of the way other folks toss around “transposes” and “dot products” and seemingly know what they’re talking about without sketching out rows and columns on scrap paper. I suspect I’m not alone. But it turns out that having a solid Matrix class really simplifies the code required for working with fake neurons. So that’s where we’ll start, in Matrix.java. There is absolutely nothing exciting in this file — it just defines a Matrix as a 2D array of doubles and provides a bunch of super-mechanical operations and converters. I like the vibe of the iterate and transform methods, and it’s helpful to understand how I coded up equality tests, but really let’s move on.
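To give a flavor of what I mean, here's a stripped-down sketch of the transform idea — the names and signatures here are mine, not necessarily what's in the repo:

import java.util.function.DoubleUnaryOperator;

// a toy sketch, not the real Matrix.java
public class MiniMatrix {

	private final double[][] vals;

	public MiniMatrix(int rows, int cols) { vals = new double[rows][cols]; }

	// apply a function to every element in place; this is what makes calls
	// like layer.transform(v -> activation.function(v)) read so cleanly
	public void transform(DoubleUnaryOperator fn) {
		for (int r = 0; r < vals.length; ++r) {
			for (int c = 0; c < vals[r].length; ++c) {
				vals[r][c] = fn.applyAsDouble(vals[r][c]);
			}
		}
	}
}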

Network.java is where all the magic really happens. Network.Config defines parameters and can also fully hydrate/dehydrate the state of weights and biases. One thing to be careful of — I put in a little hook to provide custom activation functions, but right now the code ignores that and always uses sigmoid. Beyond all of that housekeeping, there are three bits of code worth a closer look: forwardPass, trainOne and trainAndTest:

There are two versions of the forwardPass method: a public one that just returns an output array, and an internal one that returns activation values for all neurons in the network. That internal one does the real work and looks like this:

	private List<Matrix> forwardPassInternal(double[] input) {

		List<Matrix> results = new ArrayList<Matrix>();
		results.add(new Matrix(input, 0, cfg.Layers[0]));

		for (int i = 0; i < weights.size(); ++i) {

			Matrix layer = weights.get(i).multiply(results.get(i));
			layer.add(biases.get(i));
			layer.transform(v -> activation.function(v));

			results.add(layer);
		}

		return(results);
	}

The “results” list has one entry for each layer in the network, starting with input and ending with output. Each entry is a Matrix, but keep in mind that it's really just a simple array of activation values for each neuron at that layer (rows = # of neurons, columns = 1). We initialize the list by copying over the input activation values, then iterate over each layer computing its activation values until we get to the output. This is just an actual implementation of the forward propagation pseudocode we discussed earlier.

Training is also just a few lines of code, but it is a bit harder on the brain:

	public void trainOne(double[] vals) {

		// forwardprop

		List<Matrix> results = forwardPassInternal(vals);

		// backprop
		
		Matrix errors = new Matrix(vals, numInputs(), numInputs() + numOutputs());
		errors.subtract(results.get(results.size() - 1));

		for (int i = weights.size() - 1; i >= 0; --i) {

			// figure out the gradient for each weight in the layer
			Matrix gradient = new Matrix(results.get(i+1));
			gradient.transform(v -> activation.derivative(v));
			gradient.scale(errors);
			gradient.scale(cfg.LearningRate);

			// do this before updating weights
			errors = weights.get(i).transpose().multiply(errors);

			// the actual learning part!
			Matrix weightDeltas = gradient.multiply(results.get(i).transpose());
			weights.get(i).add(weightDeltas);
			biases.get(i).add(gradient);
		}
	}

The input to this function is a single array that holds both input and expected output values. Having both in one array is kind of crappy from an interface design point of view, but you’ll see later that it makes some other code a lot easier to manage. Just hold in your head that the inputs start at index 0, and the expected outputs start at index numInputs().

In brief, we take the output from forwardPassInternal and compute errors at the output layer. We then iterate backwards over each set of synapses / weights, computing the rate of change of each error with respect to its incoming weight, scaling that by our learning rate and the incoming activation, and finally adjusting the weights and bias. All of this crap is where the Matrix operations actually help us stay sane — but remember underneath each of them is just a bunch of nested array traversals.

If you’re still with me, the last important bit is really just scaffolding to help us run it all. I won’t copy all of that code here, but to help you navigate:

  1. Input is provided to the method with a TrainAndTestConfig object that defines key parameters and points at the training data file. The data file must be in TSV (tab-separated value text) format, with one row per test case. The columns should be inputs followed by expected outputs — all double values. Note you can provide additional columns to the right that will be passed through to output — these are meaningless to the algorithms but can be a useful tracking tool as we’ll see in the Normalization section later.
  2. The HoldBackPercentage specifies how much of the training set should be excluded from training and used to test performance. If this value is “0”, we train and test with the full set. This is useful for simple cases, but is typically considered bad practice because we're trying to build a generalized model, not just one that can spit back cases it's seen before. The train and test sets are selected randomly (there's a quick sketch of this after the list).
  3. Once we get the train and test sets figured out, starting at line 412 we finally instantiate a network and train for the number of iterations specified in the configuration file. The code tries to cover all of the training cases while keeping the order of presentation random; it could probably be smarter here, but it does the job.
  4. Then at line 428 we run each row from the testSet and produce an array that contains the inputs, outputs, expected outputs, computed errors and (if provided) extra fields. That array gets written to standard output as a new TSV. If cfg.FinalNetworkStatePath is non-null, we dehydrate the network to yet another file, and we’re done.
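As promised in #2, here's roughly what that random holdback split amounts to — just a shuffle and a partition. The names are mine, not the ones in the repo:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class HoldbackSketch {

	// returns { trainSet, testSet }; a holdback of 0 means we train
	// and test with the full set, as described above
	public static List<List<double[]>> split(List<double[]> rows, double holdBackPercentage) {

		List<double[]> shuffled = new ArrayList<double[]>(rows);
		Collections.shuffle(shuffled);

		int testCount = (int) Math.round(shuffled.size() * (holdBackPercentage / 100.0));

		List<double[]> testSet = new ArrayList<double[]>(shuffled.subList(0, testCount));
		List<double[]> trainSet = new ArrayList<double[]>(shuffled.subList(testCount, shuffled.size()));

		if (testCount == 0) testSet = trainSet;

		return(List.of(trainSet, testSet));
	}
}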

Let’s Run It Already

You can pretty easily build and run this code yourself. You’ll need git, maven and a JDK installation, then just run:

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/toolbox && mvn clean package install
cd ../evolve && mvn clean package
cd datasets
./trainAndTest.sh xor

Here’s some slightly-abridged output from doing just that:

“XOR” is the classic “hello-world” case for backpropagation. It’s a non-linear function that takes two binary inputs and outputs “1” when exactly one input is 1, otherwise “0” (represented by in0, in1, and exp0 in the left highlighted section above). The test files xor-config.json and xor-data.tsv in the datasets directory configure an XOR test that uses a network with one hidden layer of eight neurons, trains over 100,000 iterations and tests with the full data set.

Our little network did pretty good work! The “out0” column shows final predictions from the network, which are very very close to the ideal 0 and 1 values. The right-side highlight gives a good sense of how the network learned over time. It shows the average error at intervals during training: our initial random weights were basically a coin flip (.497), with rapid improvement that flattens out towards the end.

I mentioned earlier that setting “hyperparameters” in these models is as much an art as a science. It’s fun to play with the variables — try changing the learning rate, the number of hidden layers and how many neurons are in each of them, and so on. I’ve found I can burn a lot of hours twiddling this stuff to see what happens.

Normalization

So XOR is cute, but not super-impressive — let's look at something more interesting. Way back in 1987 Jeff Schlimmer extracted observable data on a bunch of poisonous and edible mushrooms from the Audubon Society Field Guide. More recently in 2020, some folks refreshed and expanded this dataset and have made the new version available under a Creative Commons license — 61,069 training samples, woo hoo! Many thanks to Wagner, Heider, Hattab and all of the other folks that do the often-underappreciated job of assembling data. Everybody loves sexy algorithms, but they're useless without real-world inputs to test them on.

OK, go big or go home — let’s see if our network can tell us if mushrooms are poisonous. Before we can do that, though, we need to think about normalization.  

The mushroom data set has twenty input columns — some are measurements like “cap-diameter,” others are labels like “gill-spacing” (“c” = close; “w” = crowded; “d” = distant). But our Network model requires that all inputs are floating-point values. Before we can train on the mushroom set, we’ll have to somehow convert each input variable into a double.

The measurements are already doubles, so you might think we can just pass them through. And we can, but there’s a problem. An example — the maximum cap-diameter in the data set is about 62cm, while the max stem-height is just over half of that at 34cm. If we pass through these values unaltered, we will bias the network to treat cap-diameter as more important to classification than stem-height, simply because we’re just pushing more activation into the machine for one vs the other.

To be fair, even if we are naïve about this, over time the network should learn to mute the effect of “louder” inputs. But it would take a ton of training iterations to get there — much better to start from an even playing field. We do this by normalizing all of the inputs to a consistent range like 0 to 1. This is simple to do — just find the minimum and maximum values for each numeric input, and scale inputs at runtime to fit within those bounds. Not too bad.
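A minimal sketch of that min-max scaling for one numeric column looks something like this (the real logic lives in Normalize.java, which comes up in a minute):

public class MinMaxSketch {

	// scale every value in a numeric column into the range 0..1
	public static double[] normalizeColumn(double[] column) {

		double min = Double.MAX_VALUE;
		double max = -Double.MAX_VALUE;

		for (double v : column) {
			min = Math.min(min, v);
			max = Math.max(max, v);
		}

		double range = max - min;
		double[] scaled = new double[column.length];

		for (int i = 0; i < column.length; ++i) {
			// a constant column carries no signal; just map it to 0
			scaled[i] = (range == 0.0) ? 0.0 : (column[i] - min) / range;
		}

		return(scaled);
	}
}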

But what about the label-based inputs like “gill-spacing?” “Close”, “crowded” and “distant” don't naturally fit any numeric scale — they're just descriptions of a feature. One option is to find all of the unique values in each column, and then distribute them evenly between 0 and 1, e.g.:

  • Close = c = 0
  • Crowded = w = 0.5
  • Distant = d = 1

The only problem with this approach is that the numbers are assigned in arbitrary order. Remember back to forks and spoons — we’re trying to find lines and curves that segment the input space into categories, which works best when “similar” inputs are close to each other. This makes intuitive sense — a mushroom with a 1cm cap-diameter is more likely to be related to one measuring 1.2cm vs 40cm.

In the gill-spacing case above, the order shown is actually pretty good. But how would you arrange a “cap-surface” value of fibrous, grooved, scaly or smooth? Sometimes we just have to do the best we can and hope it all works out. And surprisingly, with neural networks it usually does.
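Here's a sketch of that “distribute the labels evenly” idea — note the ordering really is arbitrary, just whatever order the unique values happen to show up in (again, the real code is in Normalize.java):

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LabelEncodeSketch {

	// map each unique label to an evenly-spaced value between 0 and 1
	public static Map<String,Double> encode(String[] column) {

		List<String> unique = new ArrayList<String>();
		for (String label : column) {
			if (!unique.contains(label)) unique.add(label);
		}

		Map<String,Double> mapping = new LinkedHashMap<String,Double>();
		for (int i = 0; i < unique.size(); ++i) {
			double val = (unique.size() == 1) ? 0.0 : ((double) i) / (unique.size() - 1);
			mapping.put(unique.get(i), val);
		}

		return(mapping);
	}

	public static void main(String[] args) {
		// gill-spacing example from above: c -> 0.0, w -> 0.5, d -> 1.0
		System.out.println(encode(new String[] { "c", "w", "d", "c", "w" }));
	}
}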

Of course, in most real-world data sets you also have to decide how to deal with missing or malformed data in your set. But we’ve covered a ton of ground already; let’s leave that for another day.

Normalize.java has our implementation of all of this as used for the mushroom dataset. It tries to auto-detect column types, assigns values, and outputs the normalized data to a new file. It can also provide a data dictionary to aid in reverse engineering of network performance.

So are we mushroom experts now?

I ran trainAndTest.sh with the normalized data and a network configured for two hidden layers of forty neurons and a 5% holdback — the results after 2M training iterations are pretty amazing! The graph shows average error by iteration along the way — just as with XOR we start with a coinflip and finish far better; our average error is just 0.009. If we bucket our outputs using .5 as the threshold, we’d be incorrect in only 8 out of 3,054 samples in the test set — a 99.7% success rate — not too shabby.

Of the eight we missed, the good news I suppose is that we failed “the right way” — we called edible mushrooms poisonous rather than vice versa. Unfortunately this wasn’t always true in other runs — I saw cases where the network blew it both ways. So do you eat the mushroom using a classifier that is 99.7% accurate? It turns out that in this case you don’t have to face that hard question — the authors got to 100% success using a more complex learning model called a “random forest” — but risk and philosophical grey areas always lurk around the edges of this stuff, just like with human experts.

Whew — a lot there! But you know, we’re only caught up to about 1990 in the world of artificial intelligence. The last few years especially have seen an explosion in new ideas and improved performance. I am perfectly confident reaching into the back seat to grab a soda while my Tesla does the driving for me — at 75mph on a crowded highway. Seriously, can you believe we live in a world where that happens? I can’t wait to see what comes next.

What should we watch tonight?

Just before Thanksgiving, my wife had a full knee replacement. If you haven’t seen one of these, the engineering is just amazing. They’re flex-tested over 10 million cycles. L’s implant even has online installation instructions. And her doc is a machine, doing four to five surgeries in a typical day. It’s crazy.

But for all that awesome, the process still sucks. There is a ton of pain, and it’s a long, tough recovery. I am incredibly impressed with the way that L handled it all. My job was to try and make it bearable, part of which meant being Keeper of the Big Whiteboard of Lists.

I’ve blurred out all the medical junk in that image, but left what clearly became the most important part — the TV watchlist. Despite having effectively the entire media catalog of the world at our fingertips, somehow we are always the couple on the couch swapping “I dunno, what do you want to watch?” And when we do finally make a decision, who the heck can remember which of our two dozen streaming channels that particular show lives on?

A few months on from the surgery, the whiteboard is back in storage, but the TV list is more relevant than ever. We watch shows with our far-away adult kids too, which has the same issues times ten. All in all, a great excuse to dive into some new technology. Our household is pretty much all-in on Roku, so that’s where I chose to start.

If you just want the punchline, click over to Roku Sidecar. If you’re interested in the twists and turns and neat stuff I learned along the way, read on.

TV apps are still mostly dumb

I last poked at built-for-TV apps back around 2003, when my friend Chad S. was trying to start up a “new media” company. I originally met Chad at drugstore.com, where he was telling us all that blueberry extract was going to change the world (he’s an idea guy — kind of a smarter version of Michael Keaton in the movie Night Shift). At that point cable set-top boxes were the primary way folks watched TV, and they were at least theoretically codable. Chad’s concept was to build “overlay” apps that wrapped extra content around videos — kind of like WatchParty does with its chat interface today. It was a great idea and super-fun to explore, but back then there just wasn’t enough CPU, memory and (most importantly) bandwidth to make it happen. The development environment was also super-proprietary, expensive, and awful. Ah well.

Fast forward to today. My first idea was to build a channel (app) for the Roku to display a shared watchlist. Because it's still (and will probably always be) a pain to work with text and even moderately complex user interfaces on the TV, I was imagining a two-part application: a web app for managing the list, and a TV app for browsing, simple filtering and launching shows. After a little while playing with the web side, I realized it was going to be way more efficient to just use a Google Sheet instead of a custom app — it already has the right UX, and its sharing capabilities are way better than anything I was going to write.

That left the TV app. Roku has a robust developer story, but it’s clear that their first love is for content providers, not app developers. The reason so many Roku apps look the same is that they are created with Direct Publisher. Rather than write any code, you just provide a JSON-formatted metadata feed for your videos, host them somewhere, and you get a fully-functional channel with the familiar overview and detail pages, streaming interface, and so on. It’s very nice and now I understand why so many tiny outfits are able to create channels.

On the other hand, for apps that need custom layout and behaviors, the SDK story is a lot more clunky — like writing an old Visual Basic Forms application. User interface elements are defined with an XML dialect called SceneGraph; logic is added using a language they invented called BrightScript. There is a ton of documentation and a super-rich set of controls for doing video-related stuff, but if you stray away from that you start to find the rough edges. For example, you cannot launch from one Roku channel into another one — which put a quick end to my on-TV app concept.

Of course this makes perfect business sense; video is their bread and butter and it’s why I own a bunch of their devices. I’m not knocking them for it, it’s just a little too bad for folks trying to push the envelope. In any case, time well spent — it’s always entertaining to dig under the covers of technologies we use every day. If you want to poke around, this page has everything you need. Honestly, my favorite part was the key combination to enable developer settings — takes me back to Mortal Kombat:

Roku ECP saves the day

But wait! It turns out that Roku has another interface — the “External Control Protocol” at http://ROKU_ADDR:8060. It seems a little sketch that any device on the local network can just lob unauthenticated commands at my Roku, but I guess the theory is that worst case I get a virus that makes me watch Sex and the City reruns? Security aside, it's super-handy and I'm glad I ran into it before giving up on my app.

In a nutshell, ECP lets an app be a Roku remote control. You can press buttons, launch channels, do searches, send keystrokes, that kind of thing. There are also commands to query the state of the Roku — for example, get a list of installed channels or device details. It’s primarily meant for mobile apps, but it works for web pages too, with a few caveats:

  1. Because requests must originate from the local network, you can only make calls from the browser (unless you’re running a local web server I guess).
  2. But, the Roku methods don’t set CORS headers to allow cross-origin resource sharing. This is frankly insane — it means that the local web page has to jump through hoops just to make calls, and (much) worse can’t actually read the content returned. This renders methods like /query/apps inaccessible, which is super-annoying.
  3. It’s also kind of dumb that they only support HTTP, because it means that the controlling web page has to be insecure too, but whatever.

Probably I should have just written a mobile app. But I really wanted something simple to deploy and with enough real estate to comfortably display a reasonably-sized list, so I stuck to the web. The end result seems pretty useful — the real test will be if the family is still using it a few months from now. We’ll see.

Roku Sidecar

OK, let’s see how the app turned out. Bop on over to Roku Sidecar and enter your Roku IP Address (found under Settings / Network / About on the TV). This and all other settings for the app are purely local — you’re loading the web page from my server, but that’s it — no commands or anything else are sent or saved there.  

The first thing you’ll notice is the blue bar at the top of the page. The controls here just drive your Roku like your remote. Most immediately useful is the search box which jumps into Roku’s global search. This landing page is where the rest of the buttons come in handy — I certainly wouldn’t use this page instead of my remote, but when I’m searching for something and just need a few clicks to get to my show, it works great.

You can also just launch channels using the second dropdown list. Unfortunately, the issue about CORS headers above means I can’t just get a list of the channels installed on your Roku — instead I just picked the ones I think most people use. This is awful and the second-worst thing about the app (number one will come up later). That said, my list covers about 90% of my use of the Roku; hopefully it’ll be similar for you.

To use the actual watchlist features, you’ll need to hook the page up to a Google Sheet. You can actually use any openly-available TSV (tab-separated values) file, but Google makes it super-easy to publish and collaboratively edit, so it’s probably the best choice. The sheet should have four columns, in this order:

  • Show should be the full, official name so Roku finds it easily.
  • Channel is optional but helps us jump directly to shows.
  • Tag is any optional string that groups shows; used for filtering the list.
  • Notes is just to help you remember what’s what.

Publish your sheet as a TSV file by choosing File / Share / Publish to Web, change the file type dropdown from “Web page” to “Tab-separated values (.tsv)”, and copy the resulting URL (note this technically makes the data available to anybody with that URL). Paste the URL into the “Watchlist URL” box back on Sidecar and click Update. If all goes well, you should see your shows displayed in a grid. Use the buttons and checkboxes on the right to show/hide shows by their tag, and click the name of the show to start watching!

If you’ve listed one of the known channels with the show, most of the time it will just start playing, or at least drop you on the show page within the channel. Whether this happens or not is up to Roku — the ECP command we send says “Search for a show with this name on this channel, and if you find an exact match start playing it.” If it doesn’t find an exact match, you’ll just land on the Roku search page, hopefully just a click or two away from what you want.
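If you're curious what that command actually looks like on the wire, here's a rough sketch. I'm using Java purely for illustration (the app itself is JavaScript), and the exact parameter names on the search/browse endpoint (title, provider-id, launch) are my reading of Roku's ECP docs — double-check them before relying on this:

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class EcpSearchSketch {

	public static void main(String[] args) throws Exception {

		String rokuAddr = "192.168.1.50"; // your Roku's IP address
		String show = "The Expanse";      // full, official show name
		String channelId = "12345";       // hypothetical channel id -- yours will differ

		// ask the Roku to search for the title on a specific channel and,
		// if it finds an exact match, launch straight into it
		String url = "http://" + rokuAddr + ":8060/search/browse" +
			"?title=" + URLEncoder.encode(show, StandardCharsets.UTF_8) +
			"&provider-id=" + channelId +
			"&launch=true";

		HttpRequest request = HttpRequest.newBuilder(URI.create(url))
			.POST(HttpRequest.BodyPublishers.noBody())
			.build();

		HttpResponse<Void> response = HttpClient.newHttpClient()
			.send(request, HttpResponse.BodyHandlers.discarding());

		System.out.println("ECP returned HTTP " + response.statusCode());
	}
}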

This “maybe it’ll work” behavior is the most annoying thing about the app, at least to me. The dumb thing is, there’s actually an API specifically made to jump to a specific title in a channel. But Roku provides no way for an app to reliably find the contentId parameter that is used to specify the show! There is already a public Roku search interface; how hard would it be to return these results as JSON with content and channel ids? I messed around with scraping the results but just couldn’t make it work well. Bummer. Separately, it seems like YouTube TV isn’t playing nice with the Roku search, as queries for that channel seem to fail pretty much all the time.

All in all, it’s a pretty nice and tidy little package. I particularly like how everybody in the family can make edits, and use tags to keep things organized without stepping on each other’s entries.

A quick look at the code

roku.html is the single-page-app that drives the ux. To be honest, the CSS and javascript here are pretty crazy spaghetti. It’s the final result of a ton of experimentation, and I haven’t taken the time to refactor it well. My excuse is that I have some other writing I need to get to, but really I just freaking hate writing user interfaces and that’s what most of this app is. Please feel free to take the code and make it better if you’d like!

This is the first time I've used localStorage rather than cookies in an app. I'm not sure what took me so long — it's way more convenient to use from JavaScript. Of course, these values don't get sent to the server automatically, so it doesn't work for everything, but I'm glad to have added it to my toolkit.

callroku.js is ok and would be easy to drop into your own applications pretty much as-is. It’s written from scratch, but as the header of the file indicates I did take the hidden post approach from the great work done by A. Cassidy Napoli over at http://remoku.tv. Because the Roku doesn’t set CORS headers, normal ajax calls fail. We avoid this by adding a hidden form to the DOM, which targets a hidden iframe. To call the ECP interface we set the “action” parameter to the desired ECP url, and submit the form. Browser security rules say we can have a cross-origin iframe on the page, we just can’t see its data. That’s ok for our limited use case, so off we go.

And that’s about it! A useful little app that solves a real problem. Twists and turns aside, I do love it when a plan comes together. Until next time!

Math works, even against COVID

A year after leaving Adaptive Biotechnologies, I am still blown away by the science that happens there. They’ve built a unique combination of wet lab techniques and creative computational analysis, and it continues to pay off. Most recently the team published a paper that describes a novel method for assessing the probable impact of mutations for folks with vaccinated or natural immune memory against a virus (in this case, Omicron vs. previous SARS-CoV-2 strains). Even in the avalanche of COVID noise, this deserves a closer look.

The punchline is that Omicron probably impacts our T-Cell response by about 30%. But it's how they got to that number that makes me optimistic about our ability to get ahead of this bullsh*t. Starting with immunoSEQ in 2009, Adaptive has built layer upon layer of understanding about how immunity works — relentless focus by smart folks in an insanely complex environment, using math.

I’m going to walk through each of the methods that led up to the most recent paper. I’ll try to give enough detail to make sense, but also provide links to much more. To my ex-colleagues, apologies in advance for everything I get wrong and/or over-simplify.

What’s a T-Cell?

T-Cells are part of our “adaptive immune” system. They are manufactured in our thymus, each with a unique “receptor” site in their DNA, generated at random through VDJ recombination. This receptor sequence translates into a protein structure on the surface of the T-Cell that “binds” with (in practice) exactly one antigen. Antigens are little fragments of protein from invading pathogens — including the SARS-CoV-2 virus that causes COVID-19.

T-Cells circulate around our body in blood and lymph, “hunting” for their corresponding antigen. When a match is found, that T-Cell makes a bunch of copies of itself and takes action either by killing infected cells directly (“Cytotoxic” T-Cells) or recruiting B-Cells to go on the attack (“Helper” T-Cells). Assuming the foreign body is killed before the host dies, some of the matching T-Cells eventually retreat into “memory” status, where they lay in wait for the same enemy to return.

Obviously that is a much-abridged version of the story; Khan Academy has a great article that provides more context. For our purposes today, the key points are: (a) T-Cells are central to our bodies' ability to fight off foreign invaders; and (b) each T-Cell is coded to match a very specific fragment of foreign protein.

Layer 1: immunoSEQ

Way back in 2009, a your-peanut-butter-my-chocolate moment inspired Harlan Robins and Chris Carlson to invent what became immunoSEQ — a method for cataloging the millions of T-Cell receptor sequences found in a tissue or blood sample. Refined over many years, this assay is now the de facto industry standard for assessing diversity and tracking specific T-Cells in a ton of diverse research and clinical settings.

immunoSEQ works because of the way receptor sequences are generated — each one starts with one of 168 “V(ariable) Gene” archetypes and ends with one of 16 “J(oining) Gene” archetypes. There is extreme diversity between them, but the bounded end sequences make T-Cell receptors amenable to isolation and amplification with Multiplex PCR. Adaptive has created a library of primers that do exactly this. The resulting product is run through Illumina sequencing machines and custom bioinformatics to generate a list of unique receptor sequences in the sample, together with the absolute number of each. This counting process is pretty amazing in and of itself, but not important for this post.  

The final output is a snapshot in time — a quantitative representation of the state of the adaptive immune system. When you consider that T-Cell immunity is central to virtually every disease pathway — viruses, infections, autoimmune conditions, allergies, cancer, everything — it’s pretty easy to imagine how transformative this data can be. clonoSEQ uses it as an early warning system for recurrence in many blood cancers. Diversity of the T-Cell population (or lack thereof) can help inform the use of therapies such as PD-1 blockade. The technology underlies more than 650 published papers. Net net, it’s cool.

Layer 2: Individual receptor affinity

And yet, as amazing as this is, until recently there has been a pretty glaring gap in our understanding of immunoSEQ data. We know that each unique receptor is activated by a specific, unique antigen. But understanding which antigen, and what disease it is associated with, turns out to be something of a holy grail, with many folks hitting it from many different angles.

The physics of this “binding affinity” comes down to the magic of protein folding. The DNA sequence identified by immunoSEQ is translated into a chain of amino acids, which in turn self-arranges into a particular physical shape. That shape is what matches up with a target antigen, just like a key in a lock. But predicting how a particular chain will fold is really, really, really (really) hard. One of the best at it is David Baker at UW1 — he’s made some pretty sweet progress in the last year or two and is slowly getting there.

If we knew how each receptor folded AND could algorithmically pair it to the universe of antigens, we’d be set — but that’s a long road. There are other ways to get at the same information, and Adaptive has pioneered two in parallel: machine learning and MIRA.

Layer 2a: Machine learning

Most of our childhood is spent building up internal recognition algorithms — first everything that moves is a choo-choo, then we figure out cars vs. trucks, then we see sports cars vs. sedans, then finally (some of us) distinguish Chargers from Mustangs, and on and on it goes. Pattern recognition is probably the most central feature of our intelligence, and we are shockingly good at it, at least for things where our five senses can pick out the “features” that matter.

It turns out that machines can also learn this way, so long as we identify the features that distinguish whatever we’re trying to classify. And the cool thing is that for computers, the features don’t have to be things we see/hear/smell/taste/feel. And they don’t even have to be that perfect. If there’s a pattern hiding in there somewhere, the Matrix can pick it out.

Back in 2017, Adaptive figured out how to do this for T-Cell repertoires, using cytomegalovirus (CMV) diagnosis as a proof of concept. The process was basically this:

  1. Collect a bunch of samples that are already categorized as having CMV or not.
  2. Use immunoSEQ to generate the list of T-Cell receptors in each sample.
  3. Apply an algorithm called “Fisher's exact test” to some of the samples to determine which receptors are more common (“enhanced”) in the CMV+ samples than the CMV- ones.
  4. Use the presence of these “enhanced” receptors as input features to train a model using another subset of the samples.
  5. Test the resulting model on the rest of the samples.
  6. Woot!

The hard parts of this (and this is always the case) are #1 (acquiring training data) and #3 (figuring out the features). I’ve purposely downplayed #4 (training the model) because although there is some insane math hiding in there, we’ve gotten really good at applying it over the last few years — it’s just not a big deal anymore.

Since the model works, we can be quite confident that we picked meaningful T-Cell receptors to serve as features. That’s cool, because it means they almost certainly bind to antigens that are part of the disease we’re interested in. And we can repeat this process for as many diseases as we want to — Adaptive has received emergency use authorization from the FDA for “T-Detect”, a diagnostic that uses a trained model to identify past COVID-19 infections. They’ve also started a trial to diagnose Lyme disease. Ultimately, the goal is to “map” the entire universe of T-Cell receptors and create a “Universal Diagnostic” that really would be a game-changer.

Layer 2b: MIRA

Adaptive has also figured out how to directly observe binding affinity in the lab. MIRA (Multiplexed Identification of T cell Receptor Antigen specificity) was actually first published in 2015, but has been refined year over year by Mark Klinger and his team. This approach uses creative bioinformatics to “multiplex” many antigens against a population of T-Cells in parallel:

  1. Split a sample containing T-Cells into “N” aliquots (“N” may be small or large based on the number of antigens to be tested).
  2. Prepare “N” pools containing the antigens to test. Each antigen is included in a unique subset of the pools (the image makes this much more clear, click to make it bigger).
  3. Challenge each pool with one of the aliquots from step #1.
  4. Sort the resulting T-Cells into positive (bound to one of the antigens in the pool) and negative (did not bind) subsets. This is done using cell sorting technology from folks like Bio-Rad; honestly I don’t know much about the details of how this works — I’m just glad it does.
  5. Use immunoSEQ to identify the T-Cell receptor sequences in each positive subset.
  6. A T-Cell that binds to antigen “X” will show up only in the positive subset for the pools that “X” was added to, enabling us to untangle which T-Cells responded to which antigen (there's a toy sketch of this untangling just after the list).
  7. Bam!
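The untangling in step #6 is really just pattern-matching: each receptor's set of positive pools gets compared against the antigen-to-pool assignments from step #2. Here's a toy sketch with made-up antigen names — the real MIRA analysis obviously has to handle noise and ambiguity far more carefully:

import java.util.Map;
import java.util.Set;

public class MiraDecodeSketch {

	// step 2: each antigen is assigned a unique subset of the pools up front
	static final Map<String,Set<Integer>> ANTIGEN_POOLS = Map.of(
		"antigenA", Set.of(1, 2),
		"antigenB", Set.of(1, 3),
		"antigenC", Set.of(2, 3));

	// step 6: given the pools where a receptor showed up in the positive
	// subset, find the antigen whose pool pattern matches exactly
	public static String decode(Set<Integer> positivePools) {
		for (Map.Entry<String,Set<Integer>> entry : ANTIGEN_POOLS.entrySet()) {
			if (entry.getValue().equals(positivePools)) return(entry.getKey());
		}
		return("no unambiguous match");
	}

	public static void main(String[] args) {
		// a receptor sequenced in the positive subsets of pools 1 and 3 -> antigenB
		System.out.println(decode(Set.of(1, 3)));
	}
}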

Of course there's always more to the story. The definition of an “antigen” with respect to MIRA is really quite liberal — any peptide sequence will do. This is super-powerful, but it also glosses over an important challenge. The antigens that T-Cells bind to in real life are short little fragments of foreign protein (peptides), each about 9-15 amino acids long. There are rules about how proteins get chopped up into these short sequences, but we don't understand them very well yet (there are models, but their success varies widely). So given a complete genome, say for the SARS-CoV-2 virus that causes COVID-19, it's not obvious which bits of the proteins it encodes will actually be presented.

If we choose the wrong ones, MIRA might find (real) T-Cell “hits” for antigens that never actually occur in nature. That’s why having multiple ways to get at binding behavior is essential — they reinforce each other. If a T-Cell “lights up” both in MIRA and machine learning models, we can be confident it’s the real deal.

Layer 3: Applying the stack to COVID

Very soon after it became clear that SARS-CoV-2 was going to f*ck things up for everyone, Adaptive pivoted a ton of work towards trying to figure it out. One thing that came out of this was immuneCODE, an enormous publicly-available repository of immunoSEQ and MIRA data — over 1,400 subjects from all over the world that led to 135,000 high-confidence T-Cell receptors specific to the virus. (Full disclosure: this is one of the last projects I worked on at Adaptive and I'm super-proud of it.)

There’s just a ton of good stuff in there, but this chart really stands out (the version below is from our initial data release; there is more data available in the current drop):

Here's what you're looking at. The entire genome of the SARS-CoV-2 virus is stretched out along the X axis. The colored bars denote the different “open reading frames”, most infamously the “surface glycoprotein” in teal starting at index 21,563. This is the section of the genome that codes for the “spike protein” you have surely read about, which makes the virus really good at breaking into healthy cells.

Above the ORF bars is the actual data. The grey areas represent parts of the genome that were “covered” by peptides tested with the MIRA assay. Covering the genome 100% would be cost-prohibitive, so Adaptive used every resource they could find to generate a list of peptides that were likely to occur in nature and probe the most important parts of the virus. In particular you can see the near total coverage at the spike protein. The blue bars in this same area represent unique T-Cells that responded to each part of the virus.

The resulting picture is absolutely magic. It's a roadmap of the virus that shows us exactly which T-Cells are good at recognizing it, and exactly where on the virus they bind. Mix in additional assays downstream from MIRA and you can see just how strong the reaction is from each T-Cell. The resolution is mind-blowing.

Remember, this data is 100% public and can be used by anyone. Start with the manuscript and then download/explore it yourself.

Layer 4: Omicron

Now we can finally come back to the paper that started us off — Adaptive’s assessment of the potential impact of Omicron on natural and vaccine-mediated T-Cell immunity.

Omicron has a bunch of mutations that make it much more transmissible (f*ck that) and, it appears at least, somewhat less virulent (thankful for small blessings). These mutations also have the potential to help the virus “escape” the protection we’ve received from vaccines and infections over the last year. Tests that measure antibody response to the virus (antibodies are made by B-Cells, which respond more quickly and visibly than T-Cells) show that some of this escape is indeed happening.

But what about the T-Cells? Again the data isn’t super-definitive yet, but it does appear that fewer people with Omicron end up in the hospital vs. previous variants. One reason for that could be the T-Cell response. T-Cells take a bit of time to rev up, so they aren’t that great at stopping initial infection — but they are aces at shutting it down.

It turns out that the immuneCODE data is perfectly suited to look at this in a quantitative way. This graph snipped from the Omicron paper should look somewhat familiar:       

The X axis here is still position on the genome, but it’s been zoomed in to only show the “surface glycoprotein” section that codes for the spike. Because the spike is so unique to the virus, COVID vaccines are mostly composed of fragments from this ORF. The black lines represent the parts of the spike with a strong T-Cell response, just as we saw before (the data is newer so the shape is a little different).

The red lines here mark where mutations have occurred in the Omicron variant. And here’s the payoff pitch: by observing where the T-Cell responses and mutations overlap, we can see where the response may be negatively impacted. Once you have the map, it’s really quite a simple thing to do. Standing on the shoulders of giants indeed.

The blue line in the chart shows Adaptive's best guess at what this impact will be — it's a little more complicated than I imply above, but not much. It depends on the strength of each T-Cell response, and in some cases the overlap isn't perfect and so some assumptions are required. At the end of the day, Adaptive calculates that we probably lose about 30% of the T-Cell response to Omicron vs. previous variants. Notably, they did this same analysis against the older variants, and the impact is far less.

70% as good as before isn’t great. But all things considered, I’ll take it. Science and industry’s progress against COVID is a modern miracle. It hurts my heart that the public health side is such an utter failure by comparison. But at least for this article, I’m determined to not let that overshadow the awesome.

Math works, and it will carry the day.

1 Folks who know protein folding far better than I do corrected me here — the AlphaFold team at DeepMind (Google) won the biennial CASP protein structure prediction contest in 2018 and again in a dominating fashion in 2020. Protein folding is not yet completely solved but clearly it's moving fast — super-encouraging!

Putting the smart in smart contracts since this week

In my first “crypto” post I covered a bunch of stuff, mostly in the abstract. All well and good — tons of new concepts to get a handle on — but it ain’t real until it’s running code. So that’s the task for today: build, deploy and run an Ethereum smart contract that does something at least marginally interesting. Things are going to get pretty wonky and probably a little boring if you don’t love code, but everyone is welcome to come along — embrace your nerd!

Round Robin Lotto

We’re going to build a contract I called “Round Robin Lotto” — the full source (there ain’t much) is on github in rrlotto.sol. RRLotto is a game that works like this:

  1. The system initializes a pseudo-random counter between 2 and 25.
  2. Accounts play the lotto by executing the “play” method of the contract and sending along .001 ether ($3.89 on mainnet as I write this).
  3. Each play decrements the counter by 1. When the counter hits zero, three things happen:
    1. The current player receives 95% of all ether in the contract account.
    2. The remaining 5% is sent to the “house” (the account that deployed the contract).
    3. The counter is reset to a new pseudo-random value between 2 and 25 (starting a new round).

The effect is more or less that of a 95% payout slot machine (albeit one with only a single prize). The jackpot will range from 0.0019 ether ($7.39 today) to 0.02375 ether ($92.43) based on the initial counter value for the round. Alert the IRS.

Running a lotto on Ethereum is interesting because (1) there is no true randomness on the blockchain and (2) all code and state data is public. If plays are infrequent, it would be easy for a sneaky actor to write code that waits until the counter is 1 and then plays immediately, winning every time. The trick is to set the maximum counter value roughly equal to the number of active players, making it very difficult to manipulate the order in which plays are processed / mined.

A better solution might be to run the pseudo-random generator on every play, and just use it to pay out with the desired frequency. The problem here is that our “pseudo-random” number is actually completely deterministic based on the current block difficulty and timestamp. Since miners set the timestamp, it’d be easy for them to pick timestamps that result in payouts to known accounts. Of course, this would be a ton of work spent to abuse our piddly little lottery — but it does highlight some of the unique quirks of writing for Ethereum.

And I guess you could also say that we don’t care too much anyways, since in all cases the house walks away with its 5%!

Writing for the EVM

Ethereum smart contracts run within a purpose-built virtual machine environment (the EVM). The “assembly” language of the EVM is a set of opcodes that look more or less as you’d expect. Nobody uses these to actually write code; there are high-level languages for that. Of these, the dominant one is Solidity, which looks a lot like C++ or Java; that’s what we’ll be using.

The primary construct in the EVM is the “contract”. Contracts as expressed in Solidity are just objects — they have constructors, methods and members, support multiple inheritance, etc. What is quite unique about these contracts is their lifecycle on the blockchain:

  1. The Ethereum blockchain maintains the state of a single huge distributed EVM. All contract instances exist within this shared memory space.
  2. Contracts are instantiated by deploying them to the blockchain. The contract constructor is called during deployment to set up initial state. Really important: if you deploy the contract 3 times, you have created 3 distinct “objects” in the EVM — each with its own contract address and distinct internal state (in our case, running 3 completely independent lottos).
  3. Contract methods are called by sending a transaction to a contract address (just like dereferencing an object pointer). Contracts can use these “pointers” to call each other as well.
    1. All code execution happens within the context of a method call transaction. There are no background threads or scheduled events in the EVM.
    2. There is no parallel execution in the EVM. All code runs in one big single thread.
  4. Contracts can be destroyed when they are no longer needed. This doesn’t remove any of their state or transaction history of course, but it does free up some memory in the EVM and ensures that their methods can no longer be called.

It’s worth reiterating that each node holds every contract and its state in a single EVM instance. All of them run all of the contract-related transactions deployed to the blockchain (miners when creating blocks; validators as part of validating the resulting block states). This can be a little hard to wrap your mind around — the “world computer” is really a crap ton of copies of the same computer. This leads to more interesting quirks of the environment that we’ll see as we keep digging in.

Our contract in Solidity

There are a ton of solid “hello world” tutorials for Solidity; I’m not going to try to replicate that here. Instead, I’ll just walk through the bits and bobs of our contract so you can see how it all fits together. Maybe that ends up being the same thing? We’ll see. Remember that this code is on github; you may find it easier to load that up and see the fragments in context.

pragma solidity ^0.8.10;

While the EVM opcodes are pretty stable, the Solidity compiler and language are still moving relatively quickly. The pragma here with the caret says “I need to be compiled using at least version 0.8.10 but not 0.9.0 or higher.” This is kind of annoying, because probably 0.9.0 will be fine. At the same time, these contracts move real money and so I can see the benefit of being conservative.

contract RoundRobinLotto

This is the name of our contract. Solidity doesn’t care about file names matching this, and you can put multiple contracts into one file — I appreciate the lack of judgment here. This is where you specify inheritance using the “is” keyword (e.g., contract MyNewContract is SomeOtherContract).

address house;
uint countdown;

These are our member variables. house is an “address”, a built-in type that includes methods for working with its balance. We use it to remember the EOA account that deployed the contract and therefore receives the 5% commission. countdown is the pseudo-random pool size we talked about. “uint” is an unsigned 256-bit integer — I could have saved some gas here by using a smaller size (e.g., uint8) and you can also drop the “u” for a signed int (e.g., int32).

uint constant MAX_CYCLE = 25;
uint constant WEI_TO_PLAY = 0.001 ether;
uint constant HOUSE_PERCENTAGE = 5;

Constant variables are just language sugar to make code more readable and maintainable. Note the literal value “0.001 ether” — “ether” there is a keyword that automatically converts its value from ether to “wei”, which is the unit denomination of Ethereum transactions.

constructor() {
    house = msg.sender;
    resetCountdown();
}

Our simple, parameter-less constructor just sets member variables to their initial state. msg is one of a few global variables and functions that supply context or utilities; msg.sender holds the address of the account that initiated the current transaction (in this case, the account that initiated the contract deployment). The countdown member is initialized using the private function described next.

function resetCountdown() private {
    countdown = (uint(keccak256(abi.encodePacked(block.difficulty, block.timestamp))) % (MAX_CYCLE - 1)) + 2;
}

As noted earlier, the EVM doesn’t support true randomness. This makes sense when you think about the block creation protocol. If validation is going to succeed, every node must end up with exactly the same end state. For our purposes, “pseudo-random” works fine, and our private resetCountdown method takes an approach that’s pretty common in the Ethereum world — take some values that are deterministic but not easily predictable (the current network block difficulty and the current block timestamp), compute their hash and cast it to a 256-bit number, then use mod to reduce the result into the desired range. The Keccak256 hash computation is another one of those globally-available functions.

event Payout(address indexed to, uint amount);

This line defines an “event” that our code can emit as a notification when something notable occurs during execution. Events are stored within the transaction log (i.e., on the blockchain), and can be received by off-chain applications that subscribe using methods of a node’s JSON-RPC interface (we’ll talk a lot about the JSON-RPC interface in a bit). Since method return values are inaccessible to off-chain code, events are really the only way to send data back to the outside world.

function play() public payable {
    require(msg.value == WEI_TO_PLAY);

This is the first part of the method called by lotto players. It is marked public so that it can be called by external accounts, and payable so that it can receive ETH. “require” is a global function useful for enforcing conditions — in our case, verifying that the .001 ETH cost to play was sent along with the transaction.  

if (--countdown > 0) {
    return;
}
resetCountdown();
payable(house).transfer(address(this).balance * HOUSE_PERCENTAGE / 100);
uint payout = address(this).balance;
payable(msg.sender).transfer(payout);
emit Payout(msg.sender, payout);

This second part of the play method is where the most interesting stuff happens. The first three lines just exit quickly when the countdown value remains greater than zero. Following that, we reset the counter, pay the house and the msg.sender, and emit our “Payout” event so that listeners can react if desired (e.g., by popping up a congratulations dialog box in the browser).

The “payable()” conversion casts variables of type “address” into a form that can receive ETH. The “transfer” method atomically transfers value between accounts. Either of these may cause an exception, in which case the transaction will be reverted and all value/state will be reset.

function destroy() houseOnly public {
    selfdestruct(payable(house));
}

This method calls the built-in method “selfdestruct” to destroy the contract, sending any remaining ETH balance to the house. The methods of destroyed contracts cannot be called, and the contract’s state is removed from the EVM. Of course all transaction and state history remains as part of the blockchain.

This function is marked public, but also with the nonstandard modifier “houseOnly”, described next:

modifier houseOnly {
    require(msg.sender == house);
    _;
}

Modifiers are commonly used like this to enforce prerequisites in a readable and reusable way. The method code is “wrapped” with the modifier code — the “_” marker indicates where in this process the method code should be inserted (so both pre- and post- method code can be written).

Compiling the contract

Before we can deploy our contract, we need to compile it into EVM opcodes. I love love love that the Solidity compiler is a single executable. You can install it with your package manager or whatever, but you can also just download one file and be good to go. Maybe it doesn’t take much to make me happy, but in a minute we’ll be using nodejs and it’s just the freaking worst by comparison. Anyways, go here and install solc for your system: https://docs.soliditylang.org/en/v0.8.10/installing-solidity.html.

Once that’s done, the compile is even simpler: solc --bin rrlotto.sol. Assuming you don’t hit any build errors, you’ll get a big binary string representing the compiled contract. Super cool!

Getting ready to deploy

The next step is to deploy the binary contract somewhere so that we can run it. This is the part in most Solidity tutorials where they tell you to use Remix, which is really very cool, but has a ton of under-the-covers magic built in. To help us really understand what is happening where, let’s take a closer-to-the-metal approach. First a few prerequisites — hang in there, it’ll be worth it!

Get some test ether

1 ETH on the actual Ethereum Mainnet goes for about $4,000 USD as of this writing, so we’ll be steering clear of that world. Instead we’re going to deploy on the Ropsten Test Network, where the ETH is free and the living is easy. The first step is to get some of that sweet fake ETH into an account under your control. There are tons of ways to do this; this was my approach:

  1. Add MetaMask to your browser. Set up your wallet, choose the Ropsten network and copy your account number. Important: While you can use the same MetaMask account on both Testnets and Mainnets, I don’t recommend it. I use a different wallet for “real stuff” and my MetaMask account only for testing and development.
  2. Visit a “faucet” site and request some free Ropsten ETH. I like this one; it drops 5 ETH per request which is way more than you’ll need for this exercise. It can take an hour or two for your request to float to the top of their queue; be patient!
  3. In MetaMask, under the “three dot” menu choose Account details / Export Private Key. Enter your password and copy the key; you’ll need this and your account number later.

Get access to a Ropsten node

Our code will work with any node. Running one yourself is semi-complex and deserves an article in its own right, so I suggest that you skip that for now and sign up for a free developer account at https://infura.io/. Once you’re signed in there, create a “project” to get direct access to their JSON-RPC endpoints. Whichever node you use, make sure you’re talking to the Ropsten network there as well.

Add some tools to your environment

We’re going to deploy and test our contract by interacting directly with the Ethereum JSON-RPC interface. I’ve used a few different tools to wrap this up in a set of bash scripts which require access to the following:

  • A bash environment (native on Linux or the Mac, WSL on Windows).
  • curl for making HTTP requests, installed with your package manager or from https://curl.se/download.html.
  • jq for working with JSON, installed from https://stedolan.github.io/jq/download/ (BTW jq is awesome and should be in your toolchest anyways)
  • nodejs and npm installed with your package manager or from https://nodejs.org/en/. Hopefully your installation is less wonky than mine.
  • web3.js, installed once you have node with some variant of “npm install -g web3”. I know, global is bad, blah blah blah.
  • The Ethereum bash scripts themselves from the shutdownhook github; clone the repo or just download the files.

The node and web3.js stuff is needed to support cryptographic signatures — getting signatures right is a finicky business and beyond what I wanted to attempt by hand in bash. Other than this, everything we do will be pretty straightforward and obvious.

Setup environment variables

Our deployment and test scripts rely on three environment variables. You can set these by hand at runtime, or add them to ~/.bashrc so that they’re always available. Note if we were using account details with value on a real production blockchain, I’d be recommending much tighter control over your account’s private key. With great power comes great responsibility.

export ETH_ACCOUNT=0x11111111111111111111111111111111
export ETH_PK=0x2222222222222222222222222222222222222222222222222222222222222222
export ETH_ENDPOINT=https://ropsten.infura.io/v3/00000000000000000000000000000000

The endpoint example above assumes you’re using the https://infura.io/ nodes; the zeros will be replaced by your project identifier shown on their dashboard. ETH_ACCOUNT and ETH_PK are as copied out of MetaMask or your chosen test ETH wallet.

The Ethereum JSON-RPC interface

Most of the blockchain stuff you read about is what happens “on-chain” — how blocks are assembled and mined, how transactions move value around, the EVM operations we detailed earlier, etc. But none of that happens in a vacuum; it's all triggered by real-world (“off-chain”) actions using an external API that bridges the two worlds. For Ethereum, this is the JSON-RPC interface exposed by every node on the network.

Nodes expose JSON-RPC over HTTP or WebSockets, typically tied to localhost to prevent unwanted access. “Unwanted” is the operative word here, because for the most part security over the interface isn't an issue. Transactions are signed before being sent to the node, so private keys never need to leave the client. And all data in the chain is public by definition, so what is there really to protect? Three things to consider: (1) DoS or other attacks at the network level could impact your node's performance; (2) accepting transactions does use some network and compute resource that you're presumably paying for; (3) many node implementations DO allow you to configure private keys locally so that you can use functions like eth_sendTransaction for specific accounts without signing on the client side. This last is the source of much confusion and, while I get the convenience factor, it just seems like a bad idea.

HTTP requests to the JSON-RPC interface consist of a POST with a JSON body that identifies the method to call and parameters to send. For example, the following curl command will fetch the current network “gas price”:

$ curl -s -X POST -H "Content-Type: application/json" --data '{"jsonrpc":"2.0","method":"eth_gasPrice","params":[],"id":1}' $ETH_ENDPOINT
{"jsonrpc":"2.0","id":1,"result":"0x5968313b"}

The format of the “result” field depends on the method called; in the case of eth_gasPrice the return value is the current price of gas in wei, expressed as a hexadecimal number. This request is packaged up in the eth-gasprice script with a slightly more useful output format:

$ ./eth-gasprice
WEI:  1500006255
GWEI: 1.500006255
ETH:  .000000001

Methods like eth_gasPrice are easy because the data package doesn’t need to be signed. In similar fashion, eth-nonce will return the transaction count for your account (probably zero at this point) and eth-version will just return some info about the software running on your node.
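To give you a sense of how thin these unsigned wrappers are, here's roughly what a script like eth-nonce boils down to (my sketch, not necessarily the repo's exact code):

# fetch the transaction count (nonce) for our account and print it as a decimal
HEX=$(curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_getTransactionCount","params":["'$ETH_ACCOUNT'","latest"],"id":1}' \
  $ETH_ENDPOINT | jq -r .result)
printf "NONCE: %d\n" $HEX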

Submitting Transactions via JSON-RPC

Transactions are a little more complicated, in two ways. First, the data package needs to be signed, which we accomplish with the eth-signtx script. This script is a bit of a cheat; we use nodejs to load up the web3.js library and just call its internal method rather than doing it ourselves. Before you give me too hard of a time here, go poke around and try to make it work in bash alone. 😉 This is the forever story of crypto development: the math to compute hashes and signatures is complex but really not a big deal; the “setup” to get all of the input bytes in exactly the right format is always finicky black magic. A single bit in the wrong place renders your output useless to the rest of the network. So except in some really simple or really ubiquitous situations, it's better to just rely on an existing implementation.

The second issue comes from the asynchronous nature of transaction execution. A successful transaction submission returns its “transaction hash”, a handle that you can use to query its status. It can take anywhere from a few seconds to a few minutes for a miner to actually pick up the transaction and get it into a block, and even longer for that block to get enough “confirmations” to be confident that it’s golden.
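In practice that means polling. Here's a sketch of the kind of “waiting...” loop the scripts use, built on the eth_getTransactionReceipt method, which returns a null result until the transaction has been mined (again, my sketch rather than the repo's exact code):

TX_HASH=$1   # the transaction hash returned when we submitted

while true; do
  RECEIPT=$(curl -s -X POST -H "Content-Type: application/json" \
    --data '{"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["'$TX_HASH'"],"id":1}' \
    $ETH_ENDPOINT | jq .result)
  [ "$RECEIPT" != "null" ] && break
  echo "waiting..."
  sleep 5
done

echo "$RECEIPT"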

The eth-sendwei script shows how this works for a simple transaction that just sends ether from one account to another (no smart contracts involved). There’s no law against sending your own ether to yourself, so you can try it out like this:

$ ./eth-sendwei $ETH_ACCOUNT 1000000000000000
Transaction Hash is: 0x84015099d0c6cf6edbe0902257d7b95b51fa47f296b46ad2a8c6f83a470fdf2b
waiting...
waiting...
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "blockHash": "0xfc47ee3dd2b233996309cfb7b38dac02b793d1117a9731041acbe0efe7a19846",
    "blockNumber": "0xb186a8",
    "contractAddress": null,
    "cumulativeGasUsed": "0x1593e7",
    "effectiveGasPrice": "0x5b3a1690",
    "from": "0x5de0613c745f856e1b1a4db1c635395aabed82c8",
    "gasUsed": "0x5208",
    "logs": [],
    "logsBloom": "0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000",
    "status": "0x1",
    "to": "0x5de0613c745f856e1b1a4db1c635395aabed82c8",
    "transactionHash": "0x84015099d0c6cf6edbe0902257d7b95b51fa47f296b46ad2a8c6f83a470fdf2b",
    "transactionIndex": "0xd",
    "type": "0x0"
  }
}

The script returns as soon as the block has been mined. “Confirmations” are the count of blocks mined after the one containing your transaction; the eth-confirmations script reports this number, which will continue to grow as more blocks are mined:

$ ./eth-confirmations 0x84015099d0c6cf6edbe0902257d7b95b51fa47f296b46ad2a8c6f83a470fdf2b
2

$ ./eth-confirmations 0x84015099d0c6cf6edbe0902257d7b95b51fa47f296b46ad2a8c6f83a470fdf2b
6
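Under the covers this is simple arithmetic: the latest block number minus the block number in the transaction's receipt. I won't swear the eth-confirmations script does exactly this, but the idea is (bash handles the hex for us):

TX_HASH=$1

LATEST=$(curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' $ETH_ENDPOINT | jq -r .result)

MINED=$(curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_getTransactionReceipt","params":["'$TX_HASH'"],"id":1}' \
  $ETH_ENDPOINT | jq -r .result.blockNumber)

echo $(( LATEST - MINED ))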

Wait, what’s that “gas” business?

If you look at the Etherscan transaction details for the transaction above, you’ll note a transaction fee of 0.00003214120392 ether — sending .001 ether from ourselves to ourselves cost us some (fake) money! This amount is paid as a “gas fee” to the node that mines the block our transaction lives in. “Gas” is the resource that makes the Ethereum blockchain work, and it can add up quickly, so it’s important to understand.

First the nuts and bolts. Every Ethereum action is assigned a cost in “units of gas” — e.g., sending ether from one account to another costs 21,000 gas. It costs a certain amount of gas to run each EVM opcode and to store state data in EVM memory. The more code and the more memory, the more gas is consumed. This enables the network to assess the resource cost of running a transaction, which is very important given the Turing-complete nature of the EVM. I could write a method in a smart contract that runs for hours or days — obviously there has to be a way to recoup those costs and prevent bad code from taking over the whole blockchain.

Gas is paid by the submitter of a transaction to the miner that performs the work involved in it. At any given point in time, gas has a price in ether — this number is pure supply and demand, dependent on how many miners are working and how many transactions are running. Actually, even this price is kind of a fiction — when a user submits a transaction, they just say what price they are willing to pay per unit of gas, and miners decide if that price is worth their time. A user can offer zero and maybe some miner will feel charitable, or they can offer a ton of ETH and have miners jump at the opportunity. The “current” gas price is just what the collective market considers reasonable at a point in time.
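You can sanity-check the fee in the receipt above yourself: it's just gasUsed times effectiveGasPrice, converted from wei to ether. Bash and bc handle the hex-to-decimal heavy lifting:

$ printf "%d\n" 0x5208        # gasUsed from the receipt
21000
$ printf "%d\n" 0x5b3a1690    # effectiveGasPrice, in wei
1530533520
$ echo "21000 * 1530533520 / 10^18" | bc -l
.00003214120392000000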

Transactions are also submitted with a maximum number of gas units the submitter is willing to spend. If the transaction “runs out of gas” before it is completed, an exception is thrown and the transaction is reverted. This is obviously suboptimal; but since any unused gas is returned to the submitter, people generally set the max much higher than they expect to actually use.

As noted above, value transfers always cost 21,000 gas — but how do you even begin to make an estimate for smart contract code? This really is still a bit of an art — but there are two tools that can help. The solidity compiler can perform static analysis to estimate gas use:

$ solc --gas rrlotto.sol
======= rrlotto.sol:RoundRobinLotto =======
Gas estimation:
construction:
   infinite + 260600 = infinite
external:
   destroy():   32022
   play():      infinite
internal:
   resetCountdown():    infinite

There are a couple of interesting things here. First notice the “infinite” (really should be “unknown”) values — solc is extremely conservative about making estimates. In the resetCountdown method, we perform calculations based on the current block difficulty and timestamp. Since those can’t be known ahead of time, solc just punts, and that bubbles up to the other methods that call resetCountdown. Some of these “punts” are unavoidable in static analysis — others I think just reflect the fact that nobody has really worked too hard on this particular feature yet.

The other thing is that the cost for construction is presented as two numbers. The first one is the cost of execution (infinite as far as solc is concerned), and the second is the cost of state storage in the EVM. Our two values (house and countDown) will cost 260,600 gas to store. Keeping state can get really expensive in Ethereum; it’s definitely in your best interest to use as little as possible.

You can also use the eth_estimateGas JSON-RPC method to estimate the gas needed by a transaction (you can see this call in the eth-estimate-tx script). In this case the transaction is actually “dry-run” in an isolated EVM on the node, without impacting blockchain state, and the actual amount of gas consumed is returned. On the surface this seems like the obvious winner — it's an exact calculation after all. But not so fast! Depending on the state of the EVM, costs can change significantly. Take for example the play() method in RRLotto … most of the time it decrements a counter and exits quickly. But once in a while, it executes transactions, emits an event and computes new hash values. In order to be safe, you'd need to call eth_estimateGas with the “worst case” inputs and starting state. That's not always a simple thing to figure out … so gas estimation remains a fuzzy art.
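Here's what a raw eth_estimateGas call looks like over curl; for a plain value transfer (this example just sends 0.001 ether back to ourselves, 0x38d7ea4c68000 wei) it comes back with the familiar 21,000 (0x5208):

$ curl -s -X POST -H "Content-Type: application/json" \
    --data '{"jsonrpc":"2.0","method":"eth_estimateGas","params":[{"from":"'$ETH_ACCOUNT'","to":"'$ETH_ACCOUNT'","value":"0x38d7ea4c68000"}],"id":1}' \
    $ETH_ENDPOINT
{"jsonrpc":"2.0","id":1,"result":"0x5208"}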

Finally! Let’s deploy some code

We finally have all the pieces we need to deploy RRLotto to the blockchain. We just need to construct a transaction with:

  1. The “to” field set to null — a missing recipient is the special case that means “please deploy this smart contract”.
  2. The “data” field set to the binary version of our compiled contract, as output by solc. If your constructor has parameters, this gets a little more complicated. This article does a remarkable job of explaining the details, but I haven’t included that in my scripts yet.

We also need a gas estimate — between solc and eth_estimateGas it looks like we'll use about 360,000 gas for our constructor and storage. We'll set the max to 500,000 just to be safe.
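Concretely, the prep looks something like this — my sketch of the pieces rather than the repo's exact code. 0x7a120 is just 500,000 in hex, and the bytecode comes straight out of solc:

# compile and grab the deployment bytecode (solc prints it under a "Binary:" header)
BYTECODE=$(solc --bin ../rrlotto/rrlotto.sol | grep -A1 "Binary:" | tail -1)

# grab the current gas price to include in the transaction
GAS_PRICE=$(curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_gasPrice","params":[],"id":1}' $ETH_ENDPOINT | jq -r .result)

# the unsigned transaction that gets signed (eth-signtx) and submitted; note there is no "to" field
cat <<EOF
{
  "from":     "$ETH_ACCOUNT",
  "data":     "0x$BYTECODE",
  "gas":      "0x7a120",
  "gasPrice": "$GAS_PRICE"
}
EOF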

The eth-deploy script puts all of this together:

$ ./eth-deploy ../rrlotto/rrlotto.sol 500000
Transaction Hash is: 0x246ba2c89621a63aa49e9a6f3b5de75e60edea6d1132ed9fa187760cc73f9a1d
waiting...
waiting...
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "blockHash": "0x66715659b4672729f59710b7b5db81d37ad342d26a98fc364176ec328ed5b742",
    "blockNumber": "0xb18a63",
    "contractAddress": "0xe588f20df3c5dad47d66722c2d6c744d3a41593c",
    "cumulativeGasUsed": "0x42e496",
    "effectiveGasPrice": "0x59682f07",
    "from": "0x5de0613c745f856e1b1a4db1c635395aabed82c8",
    "gasUsed": "0x5e701",
    "logs": [],
    "logsBloom":
0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000",
    "status": "0x1",
    "to": null,
    "transactionHash": "0x246ba2c89621a63aa49e9a6f3b5de75e60edea6d1132ed9fa187760cc73f9a1d",
    "transactionIndex": "0xb",
    "type": "0x0"
  }
}

We did it! Our new contract is deployed at the “contractAddress” 0xe588f20df3c5dad47d66722c2d6c744d3a41593c — use this link to see details of what happened:

  • We paid .00058 ether as a transaction fee.
  • The miner at 0x68268… received that transaction fee.
  • The new contract at 0xe588… was created with a 0 ETH balance.

Click the little “down arrow” icon next to the contract address to see the state of storage in the contract. Storage address 0x0 holds the house address and you can see it equals my account. Address 0x1 holds the countDown value and in this case was initialized to 16 (0x10).
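You can also read those storage slots directly over JSON-RPC using the eth_getStorageAt method. Here's slot 0x1 (countDown) as I'd expect it to look right after deployment: a 32-byte value, 0x10 padded out with zeros:

$ curl -s -X POST -H "Content-Type: application/json" \
    --data '{"jsonrpc":"2.0","method":"eth_getStorageAt","params":["0xe588f20df3c5dad47d66722c2d6c744d3a41593c","0x1","latest"],"id":1}' \
    $ETH_ENDPOINT | jq -r .result
0x0000000000000000000000000000000000000000000000000000000000000010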

It took a while to get here, but this is a pretty neat milestone. Our smart contract is deployed and active on an actual blockchain. Sure, it's the test network, but the only thing standing between us and Mainnet is about $97 in gas fees (386,817 gas x 63 gwei/gas x $3,984.07 USD/ETH). Have I mentioned that gas is kind of expensive?

That bears repeating — there is nothing stopping us from launching that contract on the Ethereum mainnet other than $97. We don’t need anybody to approve our account, or set up a monthly hosting contract, or anything — it’s just there, live, and completely anonymous managing real money. This is both super-cool and a little unsettling at the same time … no training wheels!

Calling contract methods

With the contract deployed, we can actually play the lotto. Of course, we could do that by submitting a transaction using the JSON-RPC interface — that code is in the play-rrlotto script. But if this web3 thing is going to go anywhere, I’m pretty sure that bash scripts aren’t going to be the ux that makes it happen. Instead, let’s build an actual web site that lets us play. The code for this is in rrlotto.html; hosted here if you want to give it a try live. As always, be prepared to have your mind blown by my web design skills.

The bridge that enables normal web pages to use smart contracts is the humble browser plugin — in our case MetaMask, the wallet application we used earlier to set up our test account. MetaMask and other browser-based wallets “inject” a javascript object (window.ethereum) into every web page you visit. Web developers can access this object from code on their pages, calling smart contract methods, initiating value transactions, and so on. It’s a pretty smooth trick actually.

The window.ethereum object is typically wrapped up in another javascript library to make it easier to use. Web3.js, which we saw earlier, is the granddaddy of these. For the lotto page I've chosen to use ethers just to give you a look at a second (but mostly equivalent) approach. It's important to be clear on the difference between all of these things:

  • window.ethereum is provided by a browser plugin according to the EIP-1193 standard. It basically makes the JSON-RPC methods accessible to javascript on a web page (it turns out that MetaMask by default passes them through to the same infura.io nodes that we’ve been hitting directly).
  • The software that implements this standard is almost always (and certainly in the case of MetaMask) also a wallet, which has the job of holding account private keys — it isn’t essential that they be the same thing, but it makes web development easier.
  • A third-party javascript library like web3.js or ethers is usually used to make accessing window.ethereum methods simpler. This is purely for developer convenience — while not pleasant, it would be 100% possible to call smart contracts without it.

OK, let's get down to it. Our page consists of a button (#playButton) and a div to display messages (#output). When the user clicks the button, the first thing we do is “wire up” all the Ethereum pieces in connectEthereum, starting at line 38:

  1. If we’ve already gone through all of this, just bail out — we’re ready to go.
  2. If the window.ethereum object doesn’t exist, it means the user hasn’t installed an Ethereum plugin — nothing more we can do.
  3. Call the eth_requestAccounts method. MetaMask prompts the user to allow the page access before returning an “unlocked” account number.
  4. Set up the ethers objects we'll use to call the contract later on. Notice the “abi” parameter we pass when creating the Contract — ethers needs this metadata to be able to format method call transactions properly. We generated the abi structures using solc with the --abi parameter (solc --abi rrlotto.sol).
  5. Attach an event processor to the contract that displays a message to the user when a payout occurs. Remember that method calls made by an EOA (an “externally owned account,” i.e., a normal user account) can't see return values directly, so events like this are the primary way that on-chain data comes back our way.

Next at line 66 we do a quick check to verify that MetaMask is configured to use the Ropsten network where our contract is deployed (a list of these identifiers can be found at https://chainlist.org/), and then finally at line 74 we attach our signing provider (with our private key ready to sign transactions) to the contract, call the play() method and wait for it to complete:

// attach our signer so calls can submit transactions on our behalf
var superContract = contract.connect(signer);

// call play(), sending along the required wei
var tx = await superContract.play({ value: weiToPlay });
log("waiting...<br/>transaction hash is: " + tx.hash);

// wait for the transaction to be mined
await tx.wait();
log("transaction complete! <a target='_blank' href='https://ropsten.etherscan.io/tx/" + tx.hash + "'>view on etherscan</a>");

Before calling the method (and sending ETH to the contract!), MetaMask prompts the user for confirmation, providing some guidance on how much gas to send along. On the one hand this flow is amazing and cool — on the other it’s a ux disaster. I’m sure it’ll get worked out over time, but for now grandma ain’t gonna be playing our lotto.

The transaction proceeds asynchronously on the blockchain — on our page we disable the “play” button and wait for confirmation, but the browser overall is ready to do other stuff. When the transaction completes, MetaMask shows an alert dialog and our page comes back to life, ready to play again. Anytime a payout occurs, the event fires and our handler displays a message. Woo hoo!

We made it!

Whew — that was a lot. We wrote and compiled a real smart contract in Solidity, figured out enough about gas fees to know how much it would cost to deploy, called the JSON-RPC interface to deploy the contract, and finally called it from a regular old web page using MetaMask and its Ethereum provider. We wrote in a bunch of languages across a ton of distributed services. I hope it all made sense. If you get stuck please let me know, I’d be happy to help out if I can.

The question I’m left with is … in a world where I’m pretty sure the chains fall over at some point — beyond being an awesome nerdfest, does any of this matter? Will “web3” become a meaningful part of our world, or will it fade away after making a few folks richer and a lot of folks poorer? Still not sure, but I must say I’m enjoying the ride. Until next time!

Rummikub on the Glowforge, a Journey

This summer our neighbors introduced Lara and me to Rummikub, which despite the weird name turns out to be a super-fun game played with a set of 106 tiles numbered 1-13 in four colors/suits plus two jokers. Tile games like this should be as nice to look at as they are to play — but our friends’ set is this awful 1990s sickly off-white plastic pile of junk, with jokers that looked like evil scary clowns. Ew. I set out to design something better on the Glowforge, sure I’d have it done in a few hours.

A few months later and I finally have something I’m mostly happy with. It’s not an incredible heirloom treasure — but it is nice and we’ll have a good time playing it. I learned a bunch of new techniques, and I suppose it’s healthy for the universe to put me in my place every once in awhile. Hopefully you’ll learn from my tribulations and make a set for your family, or at least enjoy a few laughs at my expense as you read along.

The Design

I chose to stick with the original tile size: 38mm high by 26.5mm wide. I could easily fit 6 rows of 13 tiles onto the bed of the Glowforge (about 280mm x 457mm working area), overflowing two rows of 13 and the two jokers onto a second piece, leaving a reasonable amount left over to recut any goofs. Measurements in hand, I set about creating the files in Inkscape. I eyeballed the radius for rounded corners and after looking at dozens and dozens of fonts went with … Arial Bold. Don’t judge.

The tiles are grouped into four colors, but in the original set each one just has the same circle under the number. I decided to represent each group with a symbol as well: circle, square, triangle and starburst (I can play endlessly with the corner count and spoke ratio of Inkscape’s “star and polygon” tool). Because I’m not a monster, I replaced the scary clown joker with a happy face (Wingdings character code 0x4A). Lastly, I added a Script MT Bold “N” for the back of the tiles. The “N” is for Nolan — I considered adding some form of our family crest but couldn’t find one I liked for the space. (New project: figure out how to draw a simple vector crest!)

For material, I used quarter-inch thick MDF board clad with cherry veneer on both sides. On my Glowforge Basic, I used 800 / 80 for engraving and 120 / Full for cutting. Feel free to use and alter these SVG files however you like:

Double-Sided Engraving

Engraving tiles on both sides presents an interesting challenge on the Glowforge; it’s tough to align designs precisely on a physical piece. Within a design, alignment is no problem — I can easily cut the tiles and center the numbers and shapes inside them. But getting the “N” engraved on the reverse side of already-cut pieces is a different story. The smaller the target piece, the harder it is to get exactly right; 106 individual tiles would be a nightmare!

This approach is the simplest and most effective I’ve found for the general alignment problem. But lucky for us, there is a super-cool trick that works for aligning front and back engravings for any shape that is symmetrical along its North-South axis — like our tiles! Details are in the link, but in brief it works like this:

  1. Create your design with the elements for both sides aligned as desired. In the rummikub SVGs, every tile has an “N” stacked right on top of the number and shape. Choose different colors for the front and back elements so that Glowforge groups them into distinct engraving steps.
  2. Fix the material to the Glowforge bed with tape or clips or whatever. This doesn’t have to be super-strong, but it should keep you from accidentally bumping the material once you’ve started. It’s really important that the material not move between steps!
  3. Upload the design and configure settings for the cuts and front-side engravings. Set the back-side elements to “Ignore” and run the print.
  4. Open the lid, flip each cut piece to expose the back side, then place it back into its hole in the base material. It can be a bit tricky to pull the pieces out; I use a large hat pin to get a grip inside the cut.
  5. Now configure the settings to engrave the elements on the back and ignore the others (including the cuts), and run the print again. Note that when you close the lid and the Glowforge rescans the bed, it may show the elements out of alignment. This is a lie! As long as you don’t move the physical material, your second print run will cut aligned with the first.
  6. Tada! Perfect alignment on every tile.

Adding Color

Each tile “shape” has its own color. I chose red for the star, blue for the triangle, yellow for the square and green for the circle — plus black for the two jokers. I used this super-cool technique to apply color to the tiles neatly and absolutely love the result.

Pretty much everyone uses some kind of masking tape to cover their material and protect it from scorch marks that the laser can otherwise leave behind. When you buy “proofgrade” material direct from Glowforge it comes with a mask already in place; something like this item at Amazon does the trick for other stuff. It turns out that you can take advantage of this mask to fill in engravings with color as well.

With the masks still in place, I first cleaned ash and residue from each tile using baby wipes. This is a messy job and requires some care so that you don’t get the masks too wet; they need to stay adhered to the tiles for the next step. Then I just painted over the front side of each tile using acrylic paint from this set. Because the engraving is dark, it took a few coats to cover — four for most colors and five for the yellow. I let it dry for about ten minutes between coats, and then an hour before peeling off the masks. This was incredibly satisfying — the edges came away clean and crisp, and the color is bright and bold.

The last step was to cover the tiles on all sides with a few coats of clear enamel spray. The final result is a durable, nice looking set. Woo hoo!

Wait, this all seems fine?

“Seems” is the operative word here. The above is a textbook example of social media whitewashing — it’s all true, but skips over all of the not-so-pretty goofs and gotchas along the way. To wit:

Ghosts in the design

106 tiles, each with one cut and three engravings, is a lot of Glowforging and takes quite a bit of time (about two hours). The first hiccup came about two thirds of the way into this session as, seemingly at random, some of the engrave elements were just skipped. I couldn’t figure out a pattern to this so just wrote it off as “some bug,” maybe because of the large number of elements, and built new files to remake the ones that had been messed up.

Unfortunately, the “bug” kept showing up on some (not all) tiles every print, which was increasingly annoying as I quickly burned through my extra material trying to complete the set. Finally, as I was moving tiles around in the Glowforge interface for yet another run, I happened to notice that some of the engraving elements were showing up just a little darker than the others. WTF?

It turns out that, at some point during the dozens of copy/pastes involved in this design, I double-pasted some of the engraving elements — the elements that were showing up darker were actually two identical copies, one atop the other. And when a design has overlapping filled vector shapes, Glowforge just ignores the overlap area. I can’t find this documented anywhere, but it does show up in the community support forums with some regularity. Anyways, when I finally got the files fixed up, everything started working as intended.

Burn marks

A few of the tiles ended up with a weird pattern of burn marks along the edges on the back. I think there must have been something about the position of the tiles on the honeycomb tray that the material sits on in the Glowforge bed — some kind of reflection during the cutting process. It’s possible that I could have dialed down the power and still cut through the material without that side-effect. Anyways, while the aesthetics weren’t that horrible, part of the game is selecting tiles from a face-down draw pile. Unique patterns on the backs would be the equivalent of using a marked deck of cards … whoops! So I remade them.

Paint bleed under the mask

I actually expected this to be much more of a problem than it was. On a few of the pieces, the mask detached just a bit from the tile, causing the paint to bleed outside of the engraved area. I initially thought I might be able to scrape or sand off the extra, but that didn’t work very well — bits of the paint sort of smeared across the tile and it just looked bad. This only happened on a handful of tiles, so I just remade them instead of continuing to fight it.

Light and dark

The material I used has cherry veneer on both sides, but if you look closely one side is notably darker than the other (this was consistent across all the sheets of this stuff). I used light for the fronts, dark for the backs. Once I applied the masking tape, though, it was no longer clear which side was which. And after I got all the way through and was setting up to apply the enamel, I realized that I had flipped the sheet for a whole group of the tiles. And of course, the same thing about marked cards applies here — if you know that half of the green tiles appear lighter in the draw pile, it doesn’t make for a very fair game. I was just able to squeeze enough space out of my extra material to remake these too … whew.

Not the enamel too!

After this seemingly endless process of remaking and repainting problem tiles, I finally had a complete set. Home stretch! All that was left to do was to apply a few coats of clear enamel spray to protect the tiles during play. I’ve had trouble in the past with the enamel sticking the pieces to whatever they were sitting on (in this case a big piece of pressboard). My wife suggested that I put a penny under each tile to hold it just off the surface. A great idea, but because it was only 40° outside where I was painting, I was going to have to move the tiles inside to dry in a warmer environment — the pennies were just too slippery to stay in place during this move. Instead, I put little one-inch pieces of non-stick drawer liner under each one. Two coats on the fronts, wait for them to dry, flip them over, two coats on the backs, shuttling back and forth inside and outside, Bob’s your uncle.

Or not. Apparently tripling the dry time for the enamel (and touch-testing the tiles of course) was not sufficient. When I picked them up (thinking the project was completely finished, mind you) I discovered that the drawer liner was sticking to the tile fronts, leaving grey bits embedded in the enamel as they pulled away. Nooooooooooo! Through my tears of frustration, I used the flat side of a sharp knife to scrape off as much as I could — carefully touched up the paint where it had been damaged — and resprayed the fronts with two more coats of the enamel. The end result actually was fine — the scraping left notable marks at first, but the enamel coats fused together and left the final products looking ok.

Twist the knife

This last is just funny. As I was writing all of this up, I realized that the game wants tiles numbered 1-13 and I had created 1-14. Certainly better too many than too few, but come on.

I don’t remember any project where everything went exactly to plan. But this one takes the cake for sheer number of own-goals. Ah well … they will be fun to play with, and I will use both of those new techniques again for sure, and I can’t help but think of the old saw that a bad day fishing beats a good day working. Amen to that.

An Old Guy Looks at Crypto

The first time I co-founded a company was back in 1996 with a group of four other guys (and yes we were certainly “guys” in the most utterly-stereotypical tech startup sense of the word). One of these was T, who even today literally glows with Chicago-style marketing/finance enthusiasm — I’d challenge anyone to take a coffee-and-Howard-Stern-fueled “perception is reality” road trip with T and not end up a lifelong fan. When I tell fellow nerds that they need a great business person to start their company with, I’m thinking about him.

So when T called a few weeks ago and asked what I knew about Ethereum and Smart Contracts, I figured I should take a look. I’ve spent the last decade dismissing blockchain and crypto without really spending any brain cycles on them … just has never smelled right to me. But obviously there are a ton of people who think it’s the future, and it’s survived more than a few existential crises already, so ok, let’s see what’s going on in there.

Fair warning — the innards of these technologies are quite complex; clearly the vast majority of people “explaining” them online don’t have a clue what is really going on under the covers. I’m going to try to share what I’ve learned in relatively simple terms, but almost certainly will get some of it wrong. I’d love to be corrected where I’ve messed up so please let me know. And if you actually invest your dollars based on any of this, well, that’s 100% on you my friend.

What problem are you trying to solve?

The root of all of this stuff is technology that enables two strangers to exchange value without a trusted third party mediating the transaction. A distributed network of computers (not majority-owned by any single entity) coordinates to ensure that transactions are permanently recorded, that value is owned by exactly one entity at a time, and that it cannot be counterfeited. For sure it takes resources to run the network, but the costs are diffused across a ton of folks and transaction fees can at least theoretically be kept pretty low (more on this later).

The appeal is pretty obvious. Right now we count on trusted third parties for all of our transactions, from banks to mortgage escrow companies to Visa and Paypal and Square. Not only do these folks eat into our assets via transaction fees and float; they also exert an incredible amount of power on the market overall. Remember “too big to fail?” Oh yeah, and they also keep (and profit from) our personal information. A fair, anonymous trading platform that avoids these issues seems pretty cool.

Part 1: Hashes, keys and signatures

First, a few core concepts. None of these are new; we've been using them forever to do things like keep websites secure. But they're the building blocks for all that comes later, so it's worth setting up a bit of a glossary.

A one-way hash is an algorithm for representing any arbitrarily-sized chunk of data with a small, opaque, unique label. That's a lot of words. “Small” is important because you can represent huge files, like an entire video, in a small, easy-to-manipulate string (typically 256 bits these days). Note this isn't some magic compression technology; “one-way” or “opaque” means that you cannot reasonably get back to the original bytes using only the hash, and you can't predict what a hash will look like from the original bytes. Last, a hash is “unique” because you will never (in practice) get the same hash for two pieces of original data, even if they differ by only a single bit.
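You can get a feel for this on any Linux or Mac box (on the Mac, shasum -a 256 stands in for sha256sum):

$ echo -n "pay Bob 5 BTC" | sha256sum
$ echo -n "pay Bob 6 BTC" | sha256sum

The two outputs are full-length 256-bit values with nothing recognizable in common, and there's no way to work backwards from either one to the text that produced it.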

A key is a secret used to encrypt or decrypt data. In stories we usually see symmetric keys, where two spies exchange a password that is used to both encrypt and decrypt a message. The problem with symmetric keys is that they need to be shared, which leaves them at risk of being stolen. Stolen keys not only mean that the wrong folks can read messages, but also that the wrong people can encrypt them — all kinds of potential for nastiness there.

Public/private key pairs work differently. A person’s private key is something that is never, ever shared — but the corresponding public key can be shared openly (it’s “public” after all). A message encrypted with my public key can only be decrypted using the corresponding private key. Because the private key never leaves my control, it’s a much more secure means of communication.

A digital signature is another way to use public/private key pairs — to prove that something is genuine and unaltered. A signature is typically computed by first generating a one-way hash of the data, and then using a private key to encrypt the hash. The resulting signature is sent along with the original content to the recipient, who computes their own one-way hash of the data, decrypts the signature using the corresponding public key, and compares the two results. If they match, the recipient can be confident that the data (a) did in fact come from the owner of the key and (b) was not altered in transit.
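Here's the whole sign-and-verify dance using plain old openssl and a secp256k1 key (the same curve Bitcoin and Ethereum use). The chains layer their own hashing and serialization quirks on top, so this isn't byte-for-byte what a wallet does, but the idea is identical:

# generate a private key and its corresponding public key
$ openssl ecparam -name secp256k1 -genkey -noout -out priv.pem
$ openssl ec -in priv.pem -pubout -out pub.pem

# sign a message with the private key, then verify it using only the public key
$ echo "pay Bob 5 BTC" > msg.txt
$ openssl dgst -sha256 -sign priv.pem -out msg.sig msg.txt
$ openssl dgst -sha256 -verify pub.pem -signature msg.sig msg.txt
Verified OK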

Part 2: Blockchains

A blockchain is just a ledger — an ordered list of all transactions that have ever been processed in a particular system. Transactions are collected into blocks, and each block is hashed using input data from those transactions and the previous block. In this way the integrity of each block is reinforced by all the blocks that have come before it. Only one single, exact, ordered sequence of transaction and header data can culminate in the unique hash at the end of the chain; by following the hashes from start to finish, anyone can easily (albeit laboriously) verify that no transactions have been lost or tampered with along the way.
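A toy version makes the chaining obvious; each “block” hash covers its own data plus the previous block's hash, so changing anything anywhere changes every hash after it:

$ B1=$(echo -n "genesis" | sha256sum | cut -d' ' -f1)
$ B2=$(echo -n "${B1}alice pays bob 5" | sha256sum | cut -d' ' -f1)
$ B3=$(echo -n "${B2}bob pays carol 2" | sha256sum | cut -d' ' -f1)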

Part 3: Consensus

The blockchain data structure itself is actually pretty simple; the process of creating one in a distributed, “trustless” way is anything but. Thousands of independent nodes work together to create a “consensus” about what the blockchain should look like, hopefully making it really hard for a bad guy to screw things up.

Consensus is another long-standing problem in computer science, and many ways of dealing with it have been developed over the years. The approach used by Bitcoin is popularly called proof of work, and it’s useful to start there because it is pretty much the gold standard (ha ha, get it?) as far as crypto goes, and once you understand it the others make much more sense. Actually, PoW is really just one part of the complete consensus protocol, but that really only matters to the pedants out there. Here’s how it works (buckle up):

1. The Bitcoin peer-to-peer network is made up of thousands of nodes, where each node is effectively a computer running the Bitcoin client software. Nodes are configured as miners, full nodes or light nodes. (More on this later. There is also a subclass of full nodes called super nodes that I’m ignoring, sorry.)

2. Transactions are submitted directly to an arbitrary node through the Bitcoin RPC interface, most commonly using a wallet. Transaction data is signed prior to submission so that private keys need never leave the user’s chosen wallet or other personal storage.

3. Nodes relay transactions around the network, where miners pick them up and bundle them together into blocks. Once enough transactions are collected, the miner verifies signatures, accounts and balances, hashes the transaction data (this is called the “merkle root”) and starts hunting for a valid block hash.

This is where things get a little complicated. A “valid block hash” is one where the numerical representation is less than the current network “difficulty” threshold. Difficulty ratchets up and down according to a global algorithm that attempts to keep the rate of block creation constant, based on how much miner capacity is running across the network.

Remember that the block hash uses the transaction data (merkle root) and previous block hash as input. It also uses a nonce value, which is just a random number picked by the miner. The nonce is the only input to the block hash that can change. And remember that nobody can predict what the hash value will be based on the input — so miners just randomly pick a nonce and compare the resulting hash against the difficulty threshold until they find a valid one. This brute-force hunting process is what “proof of work” means … you cannot create a valid block hash unless you do a bunch of computing work, which costs real-world resources — making it infeasible to game in a way that makes economic sense.

Once a valid block hash is found, the miner tacks it onto the end of the chain and broadcasts their accomplishment across the network. Block creation includes a “block reward” for the miner (currently 6.25 BTC per block) — this is how new Bitcoin comes to exist, and why the term “miner” makes sense.

4. The story isn’t finished yet! Thousands of miners are all doing this work at the same time, using the same transactions1 and trying to find valid block hashes so they can get their reward. This is where the consensus part comes into play. All nodes on the network (including miners but also full and light nodes) work to validate the blocks created by miners. The obvious part of this is just making sure all the math lines up — the signatures and hashes check out, the block hash is below the target difficulty, and so on.

Nodes also have to decide which block “wins” when more than one miner finds a valid block hash for the same transactions. This part is pretty cool — nodes just always prefer longer blockchains. Whenever the node validates a block in a chain that is longer than their current view of the world, they accept that one. The effect here is that all nodes converge on the same chain / view of the world — but it takes time for that to happen. Most bitcoin clients consider a transaction “final” when they see six “confirmations” — meaning that there are six blocks in the chain after the one including the transaction. Six is an arbitrary number but seems to work well as a conservative threshold.

A side note: The value “six” here is just one of a bunch of constants and other parameters that are part of the Bitcoin system … the maximum number of coins in the system, the blockchain reward, the difficulty threshold, and so on. Whoever figured all this out up front, mostly before the algorithms were ever really live in the real world, is a freaking genius. The mystery around that genius is great drama in and of itself. Who is Satoshi?
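To make the nonce hunt in step 3 concrete, here's a toy proof-of-work loop in bash. Real difficulty targets are vastly harder than “starts with three zeros,” but the brute-force shape is the same:

PREV=$(echo -n "previous block" | sha256sum | cut -d' ' -f1)
MERKLE=$(echo -n "dave pays erin 1" | sha256sum | cut -d' ' -f1)

NONCE=0
HASH=""
while [[ $HASH != 000* ]]; do
  NONCE=$((NONCE + 1))
  HASH=$(echo -n "${PREV}${MERKLE}${NONCE}" | sha256sum | cut -d' ' -f1)
done
echo "mined! nonce=$NONCE hash=$HASH"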

Still with me? Hard to believe. But if you are, you now have a better understanding of how Bitcoin and the other PoW blockchains work than most folks spending money on them. Thumbs up!

Part 4: Ethereum and Smart Contracts

One thing I glossed over in previous sections is the specific data in each “transaction”. At one level it’s useful to understand that this just doesn’t matter — the blockchain structure and consensus protocols are largely agnostic to the claims made in the transaction itself. But of course for any specific blockchain implementation, those transactions are the whole point of the exercise and matter a lot.

Bitcoin (BTC) was the first mainstream cryptocurrency, and its blockchain is hyper-focused on financial transactions. Each transaction just describes movement of Bitcoin value between “input” and “output” accounts. Actually, that's a little bit of a lie: Bitcoin transactions are built on some rudimentary scripting primitives — but in practice they just end up being value exchange.

The second-best-known blockchain is Ethereum (ETH). Naïve traders view it as just another cryptocurrency, but that’s super-wrong. ETH transactions certainly support value exchange just like BTC, but their real purpose is to manage smart contracts. A smart contract is a full-on piece of software that lives on the ETH blockchain with its own address that can hold funds and arbitrary state. Smart contracts can also expose an API which can be called by submitting (you guessed it) transactions on the chain. Folks sometimes refer to Ethereum as the “world’s computer.”2 ETH the currency is used to pay for processing and storage time on the chain, so theoretically there is a little more “oomph” behind its value vs. something purely abstract like BTC, maybe.

Anyways, this is cool because it makes the mechanics of trustless, distributed transactions available in a bunch of new contexts. A few popular use cases you probably have already heard of:

  • Fungible tokens based on the ERC-20 standard that enable arbitrary “currencies” — these could be purely abstract like airline miles, or tied to physical stuff like shares of a company or even fiat currencies like US Dollars. Of course, ties to the physical world are based entirely on contract law; ETH just provides a great platform for representing and exchanging them in a robust way. Token behavior could be implemented by any smart contract, but adherence to the ERC-20 standard means that most Ethereum wallets will be able to hold and trade the token.
  • Non-fungible tokens based on the ERC-721 standard that enable collectibles and ownership of other unique assets. These differ from fungible tokens in that each one is unique — e.g., there is only one NFT representing a particular painting and only one account can own it at a time (modulo fractional ownership). Obviously the same caveats about physical objects apply, but even for digital NFTs (e.g., an NBA Top Shot video) it’s a little weird — block data is too limited to store large files directly, so NFTs typically hold a hash of the actual asset, which is stored somewhere else (“off-chain”, i.e., basically unprotected). And of course a digital asset can be bit-for-bit copied with no loss of fidelity, so …. I am generally trying to reserve opinions for later, but while this is a neat idea, it really just seems kind of silly to me.
  • Decentralized autonomous organizations like the one that tried to buy a copy of the US Constitution. Member accounts (defined by ownership of a particular token or some other rule) call smart contract APIs to vote on issues, with the winning votes automatically triggering actions defined in the code of the contract. For example, members might contribute ETH to a shared giving fund and vote on which account(s) should receive donations.
  • Decentralized finance (“DeFi”) applications like MakerDAO that provide financial services like loans and interest-bearing accounts without a central bank.
  • Lots and lots of games and lotteries and other crap.

Building all of this on the smart contracts framework provides some neat benefits. For example, an NFT can be coded so that when it changes hands, a royalty is sent to the original creator. Tokens might be used as an access pass to an online or real world event. DeFi currencies can be coded to automatically increase or decrease supply to reduce volatility. DAOs might include a poison pill that liquidates the organization's assets if the market reaches a certain level. And because all of the smart contract code is visible on and verified by the blockchain, there is an unprecedented level of transparency as to what is going on.

On the flip side, since smart contracts are just code, they can have bugs, and those bugs can be disastrous. It’s also very cumbersome to include “off-chain” information into smart contract algorithms … for example, a DAO might want to automatically send money to relief organizations when natural disasters occur — knowing that a disaster has occurred in a trustable, automated way is a challenge that lives outside of what Ethereum can easily do today.

In a follow-up post, I’ll walk through the concrete stuff required to deploy and run a smart contract in some detail — but this screed is long and boring enough already.

Part 5: Why are these things valuable?

The question rational people always ask about cryptocurrencies (after “WTF?”) is “why are they worth anything?” Honestly, this is a philosophical question more than anything else, and if there is anyone on the planet less well-equipped to plumb the depths of philosophy than this guy, well, I haven’t met them yet. But I think at least for me, the best answer is just:

BTC and ETH and other cryptocurrencies are valuable because people think they are.

That’s really it. Enough people believe enough in BTC and ETH that they are willing to accept them in exchange for other things they believe to have value, like US Dollars or Euros. This really isn’t so different from the reason that “everyone” attributes value to the US Dollar. And as Hamilton reminds us, even that wasn’t always the case:

Local merchants deny us equipment, assistance
They only take British money, so sing a song of sixpence

This conversation always seems to devolve into people shrinking into the fetal position and questioning the nature of reality, so I’m just going to walk away now. Decide for yourself.

Part 6: What could go wrong?

Much more interesting to me is the technical viability of blockchain-based systems. The more I learn, the more I get convinced that they are fatally flawed and likely to blow up. I’m not smart enough to guess when, but probably pretty soon. Which is not to say that blockchains are inherently bad, or that the core value proposition they offer will not survive and change the world. Just that the current implementations — and all the value wrapped up within them — are probably not up to the task.

It feels very, very much like the dot-com days to me. Some amazing, world-changing things came out of that time, but unfortunately “dot-com” became a religion, and like all religions, grew increasingly resistant to rational criticism. With real money involved, that’s some risky business. Blockchain technical issues feel the same way to me right now. Or more precisely, one specific technical issue. But before I go there, let’s look at a couple of the other technical challenges that I think are probably fine.

Energy Use

High on the list of blockchain objections is energy use. Proof of work is by design an incredibly redundant protocol — thousands of nodes are all basically doing exactly the same work as fast as they can, most of which is thrown away. There are all kinds of statistics on this and how ridiculous it has become; my personal favorite is that the energy used by Bitcoin miners is roughly equivalent to that of my home state of Washington. Does it feel a few degrees hotter to you?

The good news is, this is pretty fixable and already being demonstrated by other chains. One alternative to proof of work is proof of stake. This model does away with miners. Instead, nodes participate in block creation by putting a “stake” into escrow on the chain. The opportunity to mine each new block is granted to one node, chosen pseudo-randomly and typically weighted by the relative size of each node’s stake. That node and only that node creates the block and receives the reward, which is then validated using the normal mechanisms. Nodes that propose invalid blocks are penalized by loss of their stake.

Because it eliminates virtually all of the redundant and expensive computation, PoS is dramatically more energy-efficient than PoW. It is also considered to be helpful in maintaining decentralization of the network, which is a little counter-intuitive because nodes basically are buying influence. But maximum weighting can be easily capped to avoid this, and the approach negates any advantage from funding huge mining data centers as happens today.
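The selection mechanics are worth a quick toy illustration. Real chains use verifiable randomness and a lot more machinery, but at heart the proposer is just chosen by stake-weighted chance; this sketch assumes three nodes with made-up stakes:

# node A staked 50, node B staked 30, node C staked 20 (total 100)
NODES=(A B C)
STAKES=(50 30 20)

PICK=$(( RANDOM % 100 ))
SUM=0
for i in 0 1 2; do
  SUM=$(( SUM + STAKES[i] ))
  if [ $PICK -lt $SUM ]; then
    echo "node ${NODES[i]} proposes the next block"
    break
  fi
done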

Ethereum has been planning to move to PoS for some time now, but it keeps slipping — currently planned for sometime in 2022. Vitalik, I feel your pain, man.

Transaction throughput

Bitcoin can handle about four or five transactions every second, and individual transactions take around an hour to confirm. Ethereum is faster but still in the range of 20 TPS with maybe eight minute confirmations. These numbers are frankly hilarious compared to centralized processors like Visa that have TPS rates in the thousands, and confirm credit card purchases reliably in seconds.

Folks have played around the edges of this by tweaking parameters like the number of transactions that can go into a single block. Bitcoin Cash is one of these and sees rates of about 300 TPS. Layer 2 chains are another approach, basically accepting transactions quickly in separate infrastructure and bundling them up into fewer meta-transactions on the real chain. There are security/speed tradeoffs here that will take some experimentation to really get right.

More radically, Solana has demonstrated real progress by mixing proof of stake with (sorry, more jargon) proof of history, which uses a distributed, synchronized clock to reduce the back and forth typically required to keep transactions in an agreed-upon order.

All up, the problems here seem eminently solvable if not already solved. I’m not worried.

Part 7: Infinite Growth

Finally we come to the one bit I just can’t get over — I’d love to hear a non-religious explanation of why I’m wrong, but:

Blockchains by definition grow forever.

At its core, a chain works because it stores a record of everything that has ever happened ever. When I stand up an Ethereum “full” node, the first thing it does is start downloading the entirety of that history to my local machine. Ethereum's chain grew from about 480GB in September 2020 to 952GB a year later, and recently crossed the terabyte mark. Ethereum is only six years old, and those early years were pretty quiet! If that pace simply continues (roughly doubling every year), the chain hits something like half a petabyte by the end of this decade (a terabyte doubled nine more times is 512TB) — and all the work underway to accelerate throughput means it ain't going to slow down. Bitcoin is better off, with a linear growth that hits a couple of terabytes by the end of the decade, but that's just a result of its low TPS rate — I'm not sure that “we survive by doing less” is a winning strategy.

Both chains have the concept of a “light” node that only downloads block headers rather than full transaction histories. Light nodes verify the structure of the chain itself, but trust that the transactions inside are OK. This is a great alternative that allows low-powered devices and wallets to participate, but isn’t sufficient on its own to keep the network running.

Various data compression techniques have been proposed or are in use — this may delay the time of death, but doesn’t address the fundamental problem.

The closest thing I’ve seen to a solution is Ethereum’s proposed “sharding” solution, in which multiple small chains operate independently and periodically roll their transactions up (layer 2 style) to a master (“beacon”) chain. This presumes that most transactions occur locally within a single shard — cross-shard transactions would be painfully slow and expensive, so they need to be rare. There are a bunch of new security and integrity issues here too. Maybe it’ll work, but it’s always “just around the corner” and I have yet to see much of a concrete proposal, and it still doesn’t fundamentally solve the growth problem — smells like religion.

At the end of the day, things that grow infinitely are infinitely bad. Sure disks keep getting bigger, but not nearly as quickly as the chains are. At best this means that soon (like, SOON) only very well-capitalized entities will be able to afford the hardware required to run nodes, in which case we’re right back to the centralized system and power structure we started with. More likely, one day investors will finally realize that the end is near, and a lot of folks will lose a lot of money.

Unfortunately, facing this question has become heresy to true believers. You don't “get it” or “see the big picture.” They talk about solutions that don't exist as if they did. I've seen this movie before, and it was called dot-com. No thanks.

So where does this leave us?

Blockchains and the systems built atop them are super-cool. More than that, they are addressing real problems in novel and revolutionary ways. But right now there is so much quick money to be made that most folks are ignoring the existential question that arises from their infinite (and accelerating) growth. I am absolutely ready to look at data that shows we can keep increasing adoption of the amazing features of these new systems without falling off of a cliff — but I’ve looked and looked hard and haven’t found it yet. Until I do, I’ll remain a grumpy old guy yelling at DApp developers to get off my lawn.

But that doesn’t mean I won’t keep learning and experimenting — there is just too much amazing stuff going on to ignore. And I’ll keep listening to T because he may well see the magic before I do. Next time we’ll write and deploy a smart contract.

1 Omer points out on LinkedIn that it’s not really the same transactions. Each miner is constructing a block from some subset of all pending transactions in the system (plus a unique “coinbase” transaction representing their hoped-for reward). So while there is great redundancy/rework in the system, it’s not true that every miner is literally racing to do exactly the same thing.

2 Also from Omer is a note that the EVM (Ethereum Virtual Machine) that runs smart contracts is Turing complete. This means that it’s a for-real general-purpose computing engine, unlike Bitcoin’s simple script. Managing this additional complexity (and of course power) is the purpose behind the “gas fees” that are part of each execution. Super cool stuff worthy of its own post!