Zero to Launch

I want to be clear up front that I’m not a “methodology” guy. Whatever the hype, software methodology is inevitably either (a) full employment for consultants; (b) an ego trip for somebody who did something good one time under specific circumstances and loves to brag about it; or (c) both. I’ve built software for decades and the basics haven’t changed, not even once.

  • Make decisions.
  • Break big things down into little things.
  • Write everything down.
  • Use a bug database and source control.
  • Integrate often.

With that said, the rest of this might sound a little bit like software methodology. You have been warned!

Crossing the Ocean

I spend a bit of time these days mentoring folks — usually new startup CTOs that are figuring out how to go from nothing to a working v1 product. “Zero to Launch” is a unique, intense time in the life of a company, and getting through it requires unique (sometimes intense) behaviors. In all cases the task at hand is fundamentally underspecified — you’re committing to build something without actually knowing what it is. In bounded time. With limited resources. Who even does that? Startup CTOs, baby.

Like an ocean crossing, getting from zero to launch is a long journey that requires confidence, faith and discipline. There are few natural landmarks along the way, but there are patterns — the journey breaks down into three surprisingly clear and consistent phases:

  1. What are we building anyways?
  2. Holy crap this is way bigger than we thought!
  3. Will this death march ever end?

Hopefully, one reason you have the job is that you know how to code fast and well — CTOs that can’t (or don’t) code drive me up the wall. And you’d better hire great people. But you’re going to need more than just those chops to get to launch. Each phase needs a different set of skills and behaviors; let’s dig in.

What are we building anyways?

You likely have two constituencies telling you what your software needs to do: (1) non-technical co-founders that see a market opportunity; and (2) users or potential users that want you to help them accomplish something. Each of these perspectives is essential, and you’d probably fail without them. But don’t be fooled — they are not going to give you clear requirements. They just aren’t. They think they are, but they’re wrong.

The first mistake you can make here is getting into a chicken-and-egg battle. Your partners ask for a schedule, you say you can’t do that without full requirements, they say they already did that, you point out the gaps, they glaze over, repeat, run out of money, everyone goes home. Don’t do that.

Instead, just understand and accept that it is up to you to decide what the product does. And further, that you’ll be wrong and folks will (often gleefully) point that out, and you’re just going to have to suck it up. This is why you hire program managers, because synthesizing a ton of vague input into clarity is their core competency — but it’s still on you to break ties and make judgment calls with incomplete information.

And I’m not just talking about invisible, technical decisions. I’m talking about stuff like (all real examples):

  • Does this feature need “undo” capability? If so how deep?
  • Do we need to build UX for this or can we just have the users upload a spreadsheet?
  • Can we cut support for Internet Explorer? (ok finally everyone agrees on that one)
  • What data needs to be present before a job can be submitted?
  • Does this list require paging? Filtering? Searching?

You get the idea. This can be hard even for the most egocentric among us, because really, what do we know about [insert product category here]? Even in a domain we know well, it’s a little bold. But there are two realities that, in almost every case, make it the best strategy:

  1. Nobody knows these answers! I mean sure, do all the research you can, listen, and don’t be stupid. But at the end of the day, until your product is live in the wild, many of these are going to be guesses. Asking your users or CEO to make the guess is just an indirection that wastes time. Take whatever input you can, make a call, consider how you’ll recover if (when) it turns out you were wrong, and move on.
  2. Normal people just aren’t wired to think about error or edge cases. For better or worse, it’s on you and your team to figure out what can go wrong and how to react. This is usually an issue of data and workflow — how can you repair something that has become corrupt? “Normal” people deal with these problems with ad-hoc manual intervention, which is a recipe for software disaster.

For this to work, you need to be obsessively transparent about what you’re building. Write down everything, and make sure all of your stakeholders have access to the documents. Build wireframes and clickthrough demos. Integrate early and often, and make sure everybody knows where the latest build is running and how they can try it. This isn’t a CYA move; that’s a losing game anyways. It’s about trying to make things real and concrete as early as possible, because people are really good at complaining about things they can actually see, touch and use. You’re going to get a ton of feedback once the product is live — anything you can pull forward before launch is gold. Do this even when it seems embarrassingly early. Seriously.

Transparency also gives people confidence that you’re making progress. As they say, code talks — a live, running, integrated test site is what it is. No magic, no hand-waving. It either works or it doesn’t; it has this feature or it doesn’t; it meets the need or it doesn’t. Seeing the product grow more complete day by day is incredibly motivating. Your job is to will it into existence. This is a key but often unstated startup CTO skill — you need to believe, and help others believe, during this phase.

Holy crap this is way bigger than we thought!

Once you’ve gotten over the first hump and folks have something to look at, things really start to heat up. Missing features become obvious. “Simple” tasks start to look a lot less simple. It can get overwhelming pretty quickly. And that’s just the beginning. Over on the business side of things, your colleagues are talking to potential customers and trying to close sales. Suddenly they desperately need new bells and whistles (sometimes even whole products) that were never on the table before. Everything needs to be customizable and you need to integrate with every other technology in the market. Sales people never say “no” and they carry a big stick: “Customers will never buy if they don’t get [insert one-off feature here].”

Herein we discover another problem with normal people: they have a really hard time distilling N similar instances (i.e., potential customers) into a single coherent set of features. And frankly, they don’t really have much incentive to care. But it’s your job to build one product that works for many customers, not the other way around.

During this phase, your team is going to get really stressed out, as every solved problem just seems to add three new ones on the pile. They’re going to want to cut, cut, cut — setting clear boundaries that give them a chance to succeed. This is an absolutely sane reaction to requirement chaos, but it’s on you to keep your team from becoming a “no” machine.

A useful measure of technical success is how often you are able to (responsibly) say “yes” to your stakeholders. But saying “yes” doesn’t mean you just do whatever random thing you’re told. It means that you’re able to tease out the real ask that’s hiding inside the request, and have created the right conditions to do that. It’s very rare that somebody asks you to do something truly stupid or unnecessary. Normal people just can’t articulate the need in a way that makes software sense. And why should they? That’s your job.

During this phase, you have to be a mediator, investigator, translator and therapist. Try to be present at every feature review, so you can hear what the business folks and users say first-hand. If you can’t be there, schedule a fast follow-up with your team to discuss any new asks while they’re still fresh. Never blind-forward requests to your team. Propose simpler alternatives and ask why they won’t (or will) work. Use a cascading decision tree:

  1. What is the real ask? If you’re getting an indirect request through sales, ask them to replay the original conversation exactly — what words were used? If it’s coming from users, ask them to walk you through how they think the feature should work, click by click. Ask what they do now. What do other similar products do? Try to find other folks making the same ask — how do they word it?
  2. Do we need to do anything? Sometimes new asks are just a misunderstanding about what the product already does. As they say in Hamilton, “most disputes die and no one shoots.”
  3. Do we need to do something right now? Beyond just schedule, there are good reasons to delay features to “vNext” — you’ll know more once you’re live. Do we really need this for launch, or can it wait? One caveat here — be careful of people who want to be agreeable. I remember one company in particular where the users would say “it’s ok, we don’t need that,” but then go on to develop elaborate self-defeating workarounds on their own. It took awhile to get everyone on the same page there!
  4. Can we stage the feature over time? This is often the best place for things to end up. Break the request down into (at least) two parts: something simpler and easier for launch and a vNext plan for the rest. You’ll learn a ton, and very (very) often the first version turns out to be more than good enough. Just don’t blow off the vNext plan — talk it out on the whiteboard so you don’t have to rebuild from scratch or undo a bunch of work.
  5. Is there something else we can swap for? Sometimes yes, sometimes no. And don’t turn stakeholder conversations into horse trading arguments. But costs are costs, and if you can remove or delay something else, it makes launch that much closer. Again, you’re always learning, and there’s no honor in “staying the course” if it turns out to be wrong. Be smart.

This phase is all about managing up, down and sideways. Things will get hot sometimes, and people will be frustrated. Reinforce with your stakeholders that you’re not just saying “no” — you’re trying to figure out how to say “yes.” Remind your team that you understand the quantity-quality-time dilemma and that if there’s a fall to be taken, it’s on you not them. And tell your CEO it’s going to be OK … she’ll need to hear it!

Will this death march ever end?

You might notice that, so far, I haven’t mentioned “metrics” even once. That’s because they’re pretty much useless in the early stages of a product. Sorry. Products start out with one huge issue in the database: “build v1.” That becomes two, then four, and suddenly you’re in an exponential Heather Locklear shampoo commercial. New features come and go every day. Some are visible and quantifiable, but many are not. You are standing in for metrics at first — your gut and your experience. Read up on proton pump inhibitors my friend.

But as you get closer to launch, this balance shifts. Requirement changes slow down, and issues tend to look more like bugs or tasks — which tends to make them similar in scope and therefore more comparable. There’s some real comfort in this — “when the bug count is zero, we’re ready to launch” actually means something when you can measure and start to predict a downward trend.

But things get worse before they get better, and sometimes it feels like that downward shift will never happen. This is when the most grotty bugs show up — tiny miscommunications that blow up during integration, key technology choices that don’t stand up under pressure, missing functionality discovered at the last minute. Difficult repros and marathon debugging sessions suck up endless time and energy.

The worst are the bug pumps, features that just seem to be a bundle of special-cases and regressions. I’ve talked about my personal challenge with these before — because special-cases and regressions are exactly the symptoms of poor architecture. Very quickly, I start to question the fundamentals and begin redesigning in my head. And, sometimes, that’s what it takes. But just as often during this phase, you’re simply discovering that parts of your product really are just complicated. It’s important to give new features a little time to “cook” so they can settle out before starting over. Easy to say, tough to do!

During this home stretch, you need to be a cheerleader, mom and grandpa (please excuse the stereotypes, they’re obviously flawed but useful). A cheerleader because you’re finding every shred of progress and celebrating it. A mom because you’re taking care of your team, whatever they need. Food, tools and resources, executive air cover, companionship, music — whatever. And a Grandpa because you’re a calming presence that understands the long view — this will end; it’s worth it; I’ve been there.

I can’t promise your company will succeed — history says it probably won’t. But I can promise that if you throw yourself into these roles, understand where you are in the process, stay focused, hire well and work your butt off, you’ve got a really good chance of launching something awesome. I’m not a religious guy, but I believe what makes humans special is the things we build and create — and great software counts. Go for it, and let me know if I can help.

Predicting the Future (Tides)

The beach outside our Whidbey place is amazing. There’s about twenty yards of firm sand and rocks along the shore, then a broad, flat, soft expanse of sand/mud/clay for just under 100 yards, then maybe 30 yards of firm sandbars. Beyond the sandbars, the channel drops to a depth of about 500 feet or so (the first “steps” along this drop-off are the best places to drop a crab pot).

The tide sweeping in and out over this shallow area changes our back yard dramatically from hour to hour. At the highest high tide there’s no beach at all — in the Spring whales swim just a few yards away, sucking ghost shrimp out of the mud flats. During summer low-low tides, we head out to the sand bars where you can dig for horse clams and pick up crabs hiding in the eel grass (while Copper chases seagulls for miles).

I know it sounds a bit out there, but the rhythm of our days really does sync up with the water — and it’s a wonderful way to live. “What’s the tide doing today?” is the first question everybody seems to ask as they come down for coffee in the morning. And that, my friends, sounds like fodder for another fun project.

What’s the tide doing today?

NOAA publishes tide information that drives a ton of apps — I use Tides Near Me on my phone and the TideGuide skill on Alexa, and both are great. But what I really want is something that shows me exactly what the tide will look like in my back yard. For some reason I have a really hard time correlating tide numbers to actual conditions, so an image really helps. (As an aside, difficulty associating numbers with reality is a regular thing for me. I find it very curious.) For example, if you were to stand on the deck in the afternoon on September 30, what exactly would you see? Maybe this?

Those images are generated by (a) predicting what the tide and weather will be like at a point in time, and then (b) selecting a past image that best fits these parameters from a historical database generated using an exterior webcam, NOAA data and my Tempest weather station. So the pictures are real, but time-shifted into the future. Spooooky!

Actually, my ultimate goal is to create a driftwood display piece that includes a rotating version of these images together with a nice antique-style analog tide clock. But for today, let’s just focus on predictions and images.

How Tides Work

Ocean Tides are a rabbit hole you can go down a looong way — fascinating stuff. This National Geographic article is a nice intro, and this primer by UW professor Parker MacCready really gets into the weeds. To my understanding, there are at six primary factors that contribute to tide action:

  1. Variations in pull from the Moon’s gravity on the Earth. The side facing the Moon has increased gravity, and the side opposite the moon has slightly less. Both of these cause liquid water on the surface to “bulge” along this axis (more on the closer side, less on the far side).
  2. The same thing happens due to the Sun’s gravity, but less so. Tides are most extreme when the sun and moon “line up” and work together; least so when they are at right angles to each other.
  3. The Earth is spinning, which combines with orbital movement to change which parts of the Earth are being pulled/pushed the most at any given time.
  4. The Earth is tilted, which changes the angles and magnitude of the forces as the seasons change. One consequence of this is that we tend to have daytime lows in the Summer and nighttime lows in the Winter.
  5. Weather (short-term and seasonal) can change the amount of water in a specific location (storm surges being a dramatic example).
  6. Local geography changes the practical impact of tides in specific locations (e.g., levels present differently over a wide flat area like my beach vs. in a narrow fjord).   

All of this makes it really tough to accurately predict tide levels at a particular time in a particular place. Behavior at a given location can be described reasonably well by combining thirty-seven distinct sine waves, each defined by a unique “harmonic constituent.” NOAA reverse-engineers these constituents by dropping buoys in the ocean, measuring actual tide levels over a period of months and years, and doing the math. Our closest “harmonic” or “primary” station is across the water in Everett.

“Subordinate” stations (our closest is Sandy Point) have fewer historical measurements — just enough to compute differences from a primary station (Seattle in this case). But here’s the really interesting bit — most of these “stations” don’t actually have physical sensors at all! The Sandy Point buoy was only in place from February to April, 1977. In Everett, it was there for about five months in late 1995. To find an actual buoy you have to zoom all the way out to Port Townsend! This seems a bit like cheating, but I guess it works? Wild.

You can query NOAA for tide predications at any of these stations, but unless there’s a physical buoy all you really get is high and low tide estimates. If you want to predict water level for a time between the extremes, you need to interpolate. Let’s take a look at that.

The Rule of Twelfths

Image credit Wikipedia

It turns out that sailors have been doing this kind of estimation for a long, long time using the “Rule of Twelfths.” The RoT says that if you divide the span between extremes into six parts, 1/12 of the change happens in the first part; 2/12 in the next; then 3/12, 3/12 again, 2/12 and 1/12 to finish it out. Since the period between tides is about six hours, it’s a pretty easy mental calculation that would have been good to know when I was fifteen years old trying to gun my dad’s boat through the channel off of Ocean Point (spoiler alert: too shallow).

Anyways, I use this rule together with data from NOAA and simple interpolation to predict tide levels on my beach for any given timepoint. The code is in NOAA.java and basically works like this:

  1. The NOAA class exposes a single method “getPredictions” that queries NOAA for tide extremes from one day before to two days after a given timepoint.
  2. The extremes are added to a list, as well as five RoT timepoints between each of them.
  3. The resulting list is returned to the caller as a Predictions object.

The Predictions object exposes a few methods, but the most interesting one is estimateTide, which does a binary search to find the predictions before and after the requested timepoint, then uses linear interpolation to return a best-guess water level. The resulting estimations aren’t perfect, but they are really very accurate — more than good enough for our purposes. Woo hoo!

Stepping Back

OK, let’s back up a bit and look at the code more broadly. Tides is a web app that primarily exposes a single endpoint /predict. It’s running on my trusty Rackspace server, and as always the code is on github. To build and run it, you’ll need a JDK v11 or greater, git and mvn. The following will build up the dependencies and a fat jar with everything you need:

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/toolbox && mvn clean package install
cd ../weather && mvn clean package install
cd ../tides && mvn clean package

To run the app you’ll need a config file — which may be challenging because it expects configuration information for a Tempest weather station and a webcam for capturing images. But if you have that stuff, go to town! Honestly I think the code would still work pretty well without any of the weather information — if you are interested in running that way let me know and I’d be happy to fix things up so that runs without crashing.

The code breaks down like this:

  • Camera.java is a very simple wrapper that fetches live images from the webcam.
  • NOAA.java fetches tide predictions, augments them with the RoT, and does interpolation as discussed previously.
  • Weather.java manages interactions with the Tempest. It relies on code I wrote awhile ago and discuss here.
  • TideStore.java is a simple SQL and file system store.
  • Tides.java is a domain layer that pulls all the bits and pieces together.
  • Server.java implements the web interface, using the WebServer class I build long ago.

Capturing Images and Metadata

None of this works without a pretty significant collection of metadata-tagged historical images. And you can’t capture images without a camera — so that was step one here. I have a ton of Ring cameras and I love them, but they are nearly impossible to access programmatically. Sure there are some reverse-engineered libraries, and they “kind of” work, but reliably capturing an image “right now” is a complicated and ultimately only semi-successful mess. So instead I just picked up a simple camera that is civilized enough to just expose the damn image with a URL.

Running the app with the parameter “capture” tells it to call Tides.captureCurrentTide rather than running the web server. This method:

  1. Captures the current “day of year” (basically 1 – 365) and “minute of day” (0 – 1,439). It turns out that these two values are the most critical for finding a good match (after tide height of course) — being near the same time of day at the same time of year really defines the “look” of the ocean and sky, at least here in the Pacific Northwest.
  2. Loads current weather metrics from the Tempest.
  3. Estimates the current tide level.
  4. Captures an image from the webcam.
  5. And finally, writes it all to the TideStore.

The capture stuff runs twice hourly via cron job on a little mini pc I use for random household stuff; super-handy to have a few of these lying around! Once a day, another cron job pushes new images and a copy of the database to an Azure container — a nice backup story for all those images that also lands them in a cloud location perfect for serving beyond my home network. Stage one, complete.

Picking an Image

The code to pick an image for a set of timepoints is for sure the most interesting part of this project. My rather old-school approach starts in Tides.forecastTides, which takes a series of timepoints and returns predictions for each (as well as data about nearby extremes which I’ll talk about later). The timepoints must be presented in order, and typically are clustered pretty closely — e.g., for the /predict endpoint we generate predictions for +1, +3 and +6 hours from now, plus the next three days at noon.

First we load up NOAA predictions and, if any of the timepoints are within the bounds of the Tempest forecast, that data as well. The Tempest can forecast about ten days ahead, so in normal use that works fine (the code actually interpolates weather in the same way we do for tides). As we iterate through the timepoints, we load new NOAA predictions if needed.

Armed with this stuff, the real core of the work happens in Tides.forecastTide. The first pass is in TideStore.queryClosest, which uses a series of thresholds to find images within given ranges of tide height, day of year and hour of day. We start with a very tight threshold — tide within .25 feet, day of year within 10 days and hour of day within 20 minutes. If we don’t find any, we fall back to .5/20/20, and so on from there until our last try is pretty wide at 1/120/120. If we can’t find anything at that point we just give up — hard to even squint and see that as a match. The good news is, even after collecting data for just about a month, we already succeed most of the time.

By querying in stages like this, we end up with a candidate pool of images that, from a tide/time perspective, we consider “equivalently good.” Of course we may just find a single image and have to use it, but typically we’ll find a few. In the second pass, we sort the candidates by fit to the predicted weather metrics. Again we use some thresholding here — e.g., pressure values within 2mb of each other are considered equivalent.

At the end of the day, this is futzy, heuristic stuff and it’s hard to know if all the thresholds and choices are correct. I’ve made myself feel better about it for now by building a testing endpoint that takes a full day of actual images and displays them side-by-side with the images we would have predicted without that day’s history. I’ve pasted a few results for August 30 below, but try the link for yourself, it’s fun to scroll through!

Other Ways We Could Do This: Vectors

Our approach works pretty well, even with a small (but growing!) historical database. But it’s always useful to consider other ideas. One way would be to replace my hand-tuned approach with vector-based selection. Vector distance is a compelling way to rank items by similarity across an arbitrary number of dimensions; it appeals to me because it’s pretty easy to visualize. Say you want to determine how similar other things are to a banana, using the properties “yellowness” and “mushiness” (aside: bananas are gross). You might place them on a graph like the one here.

Computing the Euclidian distance between the items gives a measure of similarity, and it kind of works! Between a papaya, strawberry and pencil, the papaya is intuitively the most similar. So that’s cool, and while in this example we’re only using two dimensions, the same approach works for “N” — it’s just harder to visualize.

But things are never that simple — if you look a little more deeply, it’s hard to argue that the pencil is closer to a banana than the strawberry. So what’s going on? It turns out that a good vector metric needs to address three common pitfalls:

  1. Are you using the right dimensions? This is obvious — mushiness and yellowness probably aren’t the be-all-end-all attributes for banana similarity.
  2. Are your dimensions properly normalized? In my tide case, UV measurements range from 0 – 10, while humidity can range from 0 – 100. So a distance of “1” is a 10% shift in UV, but only a 1% shift in humidity. If these values aren’t normalized to a comparable scale, humidity will swamp UV — probably not what we want.
  3. How do you deal with outliers? This is our pencil-vs-strawberry issue. A pencil is “so yellow” that even though it doesn’t remotely match the other dimension, it sneaks in there.

These are all easily fixable, but require many of the same judgment calls I was making anyways. And it’s a bit challenging to do an efficient vector sort in a SQL database — a good excuse to play with vector databases, but didn’t seem like a big enough advantage to worry about for this scenario.

Other Ways We Could Do This: AI

My friend Zach suggested this option and it’s super-intriguing. Systems like DALL-E generate images from text descriptions — surprisingly effective even in their most generic form! The image here is a response to the prompt “a photographic image of the ocean at low tide east of Whidbey Island, Washington.” That’s pretty spooky — even includes an island that look a lot like Hat from our place.

With a baseline like this, it should be pretty easy to use the historical database to specialty-train a model that generates “future” tide images out of thin air. This is exciting enough that I’m putting on my list of things to try — but at the same time, there’s something just a bit distasteful about deep-faking it. More on this sometime soon!

A Few Loose Ends

The rest of the code is just delivery, mostly in Server.java, using the WebServer and Template classes that make up many of my projects.

One nice little twist — remember that I pushed the images and database to an Azure container for backup. There’s nothing in those files that needs to be secret, so I configured the container for public web access. Doing this lets me serve the images directly from Azure, rather than duplicating them on my Rackspace server.

I also forgot to mention the Extremes part of tide forecasting. It turns out that it’s not really enough to know where the water is at a point in time. You want to know whether it’s rising or falling, and when it will hit the next low or high. We just carry that along with us so we can display it properly on the web page. It’s always small things like this that make the difference between a really useful dashboard and one that falls short.

I’ll definitely tweak the UX a bit when I figure out how to put it into a fancy display piece. And maybe I’ll set it up so I can rotate predictions on my Roku in between checking the ferry cameras! But that is for another day and another post. I had a great time with this one; hope you’ve enjoyed reading about it as well. Now, off to walk the beach!

Real-world IoT with LoRaWAN

Remote monitoring of a community water tank for under $500, that works kilometers away from wifi or cell service, incurs no monthly fees, and uses a battery that lasts up to ten years? The future is here! I’m super-impressed with LoRaWAN, The Things Network and my Milesight Sensor. Read on for all the nerdy goodness.

The Setup

Southern Whidbey Island, geologically speaking, is a big pile of clay covered by a big pile of sand. As I (barely) understand it, when glaciers moved in from the North, they plowed heavy clay sediment in front of them, which got trapped in lake beds formed when north-flowing rivers were blocked by those same glaciers. These big blobs of clay (in particular the Lawton Formation) sprung upwards as the glaciers retreated, the same way a pool float does when you climb off, creating the island. The retreat also left a bunch of looser stuff (sand and gravel) on top of the clay. Since then, tides and waves have been continually carving away the sides of the island, leaving us with beautiful high bluffs and frequent landslides. These UW field trip notes go into more and surely more accurate detail, but I think I’ve got the high points right.

Anyway, I’m lucky enough to live at the bottom of one of those bluffs. How our property came to “be” is a great story but one for another time — ask me sometime when we’re hanging out. For today, what’s important is that groundwater collects along the top of the impermeable clay layer in “aquicludes,” what a great word. And that’s where we collect our drinking water. It’s a pretty cool setup — three four-inch pipes jammed into the hillside draw water that’s been filtered through tons of sand and gravel before hitting the clay. The water is collected in a staging tank, then pumped into two holding tanks. A smaller 500 gallon one sits at house-level, and a bigger 2,000 gallon one is most of the way up the bluff.

It’s a bit janky, but gets the job done. Until it doesn’t. Like last July 2nd, two days before 30+ family and friends were to show up for the holiday weekend. The tanks went completely dry and it took us both of those days to figure out the “root” cause. See, I put quotes around the word “root” because it turns out that there were TWENTY-FIVE FEET OF TREE ROOTS growing through the pipes. Completely blocked. Clearing them out was quite a chore, but we got it done and July 4th was enjoyed by all, complete with flushing toilets and non-metered showers. All of which is just background leading to my topic for today.

LoRa / LoRaWAN

Our July 4th saga prompted me to set up a monitoring solution that would give us some advance warning if the water supply starts getting low. The obvious place to do this is the 2,000 gallon upper holding tank, because it’s the first place that goes dry as water drains down to our homes. The tank shed is too far from my house to pick up wifi, though, and while there is some cell coverage, I wasn’t psyched about paying for a monthly data plan. What to do?

It turns out that there is an amazingly cool technology called LoRa (aptly, for “Long Range”) that is tailor-made for situations just like mine. There’s a lot of terminology here and it can be tough to sort out, but in short:

  • LoRa is a physical protocol for sending low-bandwidth messages with very little power over very long distances. It’s actually a proprietary technique with the patent owned by Semtech, so they control the chip market. Kind of unsettling for something that is otherwise so open, but they don’t seem to be being particularly evil about it.
  • LoRaWAN is a networking layer that sits on top of LoRa and the Internet, bridging messages end-to-end between devices in the field and applications (e.g., dashboards or alerting systems) that do something useful with device data.

A bunch of different players coordinate within these two layers to make the magic happen. There’s a great walkthrough of it all on the LoRa Alliance site; I’m going to crib their diagram and try to simplify the story a bit for those of us that aren’t huge radio nerds:

Image adapted from semtech.com; click for original
  • End Devices sit in the field, broadcasting messages out into the world without a target — just signals saying “HEY EVERYBODY IT’S 100 DEGREES HERE RIGHT NOW” or whatever.
  • Gateways harvest these messages from the air and forward them over TCP/IP to a pre-configured…
  • Network Server (LNS) that typically lives on the Internet. Network servers are the traffic cops of this game. They queue messages, send acknowledgements, delegate “join” messages to a Join Server and device messages to an Application Server, etc.
  • Join Servers hold the inventory of end devices and applications within the larger network, and knows which devices are supposed to be talking to which applications. Join Servers also manage and distribute encryption keys to ensure minimal information disclosure. I won’t dive into the encryption details here, because yawn.
  • Application Servers receive device data and get them to the right Application.
  • Applications are logical endpoints for specific end device data. This is a bit tricky because a LoRaWAN application is different from an end-user application. There is often a 1:1 relationship, but the LRW application accepts and normalizes device data, then makes it available to end-user applications.
  • End-User Applications (not an official LRW term, just one I made up) actually “do stuff” with device data — create dashboards and other user experiences, send alerts, that kind of thing. End-user applications typically receive device data through a message queue or webhook or other similar vehicle.

The most common LoRaWAN use case is “uplink” (devices send info to apps), but there are also plenty of uses for “downlink” where apps send to devices: configuration updates, proactive requests for device information, whatever. A neat fun-fact about downlinks is that the network server is responsible for picking the best gateway to use to reach the targeted device; it does this by keeping track of signal strength and reliability for the uplinks it sees along the way. Pretty smart.

Picking a Network

Despite the nifty encryption model, many enterprises that use LoRaWAN for mission-critical stuff set up their own private network — which really just means running their own Servers (I’m just going to call the combo of Network/Join/Application servers a logical “Server” going forward). AWS and companies like The Things Industries offer hosted solutions, and a quick Google search pops up a ton of open source options for running your own. There are also quite a few “public networks” which, kind of like the public cloud providers, share logically-segmented infrastructure across many customers.

More interesting to me is the pretty amazing community-level innovation happening out there. The Things Stack “Community Edition” was one of the first — anybody can set up devices, gateways and applications here. It so happens that our outpost on Whidbey Island didn’t have great TTN coverage, so I bought my own gateway — but with more than 21,000 connected gateways out there, in most metro locations you won’t even have to do that. The gateway I bought grows the community too, and is now there for anybody else to use. Sweet!

Side note: I actually bought my gateway almost two years ago (part of a different project that never made it over the finish line), so it was there and waiting for me this time. But if I was starting today I might (even as a crypto skeptic, and appreciating its already checkered past) take a look at Helium instead. They basically incent folks to run gateways by rewarding them with tokens (“HNT”) which can be exchanged for credits on the network (or for USD or whatever). Last year they expanded this (only in Miami for now) system into cell service. I dunno if these folks will make a go of it, but I do love the idea of a “people’s network” … so hopefully somebody will!

Here’s my gateway running on The Things Network:

Picking a Device

Measuring the amount of liquid in a tank is an interesting problem. We use a standard float switch to toggle the pump that feeds the tank, turning it on whenever the level drops below about 1,800 gallons. This works great for the pump, but not for my new use case — it only knows “above” or “below” its threshold. I want to track specific water volume every few minutes, so we can identify trends and usage patterns over time.

A crude option would be to just use a bunch of these binary sensors, each set at a different height (it’s about six feet tall, so say one every foot or so). But that’s a lot of parts and a lot to go wrong — there are a plenty of better options that deliver better measurements with less complexity:

  • Capacitive measurement uses two vertical capacitive plates with an open gap between them (typically along the insides of a PVC pipe open at both ends. As liquid rises inside the pipe, capacitance changes and can be correlated to liquid levels.
  • Ultrasonic measurement is basically like radar — the unit mounts at the top of the tank pointing down at the liquid. A pulse is sent downwards, bounces off the water and is sensed on its return. The amount of time for that round trip can be correlated to height in the tank. The same approach can be used from the bottom of the tank pointing up — apparently if the transducer is attached to the bottom of the tank, the signal won’t reflect until it hits the top of the liquid-air boundary. Amazing!
  • Hydrostatic pressure sensors are placed on the inside floor of the tank and the relative pressure of water above the sensor correlates with depth.
  • A number of variations on the above and/or float-based approaches.

After a bunch of research, I settled on a hydrostatic unit — the EM500-SWL built by Milesight. Built for LoRaWAN, fully sealed, 10 year battery life, and a relative steal at less than $350. I was a bit worried that our tank would be too small for accurate measurements, but Asuna at Milesight assured me it’d work fine, and connected me with their US sales partner Choovio to get it ordered. They were both great to work with — five stars!

Setup at the tank was a breeze. Connect the sensor to the transceiver, drop the sensor into the tank, hang the transceiver on the shed wall and hit the power button. Configuration is done with a mobile app that connects to the unit by NFC; kind of magic to just hold them together and see stuff start to pop! By the time I walked down the hill to my house, the gateway was already receiving uplinks. Woo hoo!

Setting up the Application

OK, so at this point the sensor was broadcasting measurements, they were being received by the gateway, and the gateway was pushing them up to the Things Network Server. Pretty close! But before I could actually do anything with the readings, it was back to the Network Server console to set up an Application and “activate” the device. Doing this required three key pieces of information, all collected over that NFC link:

  • DevEUI: a unique identifier for the specific device
  • JoinEUI: a unique identifier for the Join Server (the default in my device was, happily, for The Things Network)
  • AppKey: the key used for end-to-end encryption between the device and application

Applications can also assign “payload formatters” for incoming messages. These are small device-specific scripts that translate binary uplink payloads into something usable. Milesight provides a ready-to-go formatter, and with that hooked up, “water_level” (in centimeters) started appearing in each message. Woot!

Finally, I set up a “WebHook” integration so that every parsed uplink from the device is sent to a site hosted on my trusty old Rackspace server, secured with basic authentication over https. There are a ton of integration choices, but it’s hard to beat a good old URL.

And Actually Tracking the Data

At last, we can do something useful with the data! But as excited as I am about my monitoring app, I’m not going to go too deep into it here. The code is all open sourced on github if you’d like to check it out (or use it for something) — basically just a little web server with a super-simple Sqlite database underneath. Four endpoints:

  • /witterhook is the webhook endpoint, accepting and storing uplinks.
  • /wittergraph uses chart.js to render levels over time.
  • /witterdata provides the JSON data underlying the chart.
  • /wittercheck returns a parseable string to drive alerts when the levels go low (3.5 feet) or critical (2 feet).

For the alerting, I’m just using a free account at Site24x7 to ping /wittercheck every half hour and send email alerts if things aren’t as they should be.

So there you go. There are already obvious patterns in the data — the “sawtooth” is so consistent that there must be a steady, small leak somewhere in the system below the upper tank. Our supply is keeping up with it no problem at the moment, but definitely something to find and fix! It’s also clear that overnight sprinklers are by far our biggest water hogs, but I guess that’s not a shocker.

Now I just have to figure out how to auger out the rest of that root mass. Always another project at the homestead!

Nerdsplaining: SMART Health Links

This is article three of a series of three. The first two are here and here.

Last time here on the big show, we dug into SMART Health Cards — little bundles of health information that can be provably verified and easily shared using files or QR codes. SHCs are great technology and a building block for some fantastic use cases. But we also called out a few limitations, most urgently a ceiling on QR code size that makes it impractical to share anything but pretty basic stuff. Never fear, there’s a related technology that takes care of that, and adds some great additional features at the same time: SMART Health Links. Let’s check them out.

The Big Picture

Just like SMART Health Cards (SHCs) are represented by encoded strings prefixed with shc:/, SMART Health Links (SHLs) are encoded strings prefixed with shlink:/ — but that’s pretty much where the similarity ends. A SHC is health information; a SHL packages health information in a format that can be securely shared. This can be a bit confusing, because often a SHL holds exactly one SHC, so we get sloppy and talk about them interchangeably, but they are very different things.

The encrypted string behind a shlink:/ (the “payload”) is a base64url-encoded JSON object. We’ll dive in way deeper than this, but the view from 10,000 feet is:

  1. The payload contains (a) an HTTPS link to an unencrypted manifest file and (b) a key that will be used later to decrypt stuff.
  2. The manifest contains a list of files that make up the SHL contents. Each file can be a SHC, a FHIR resource, or an access token that can be used to make live FHIR requests. We’ll talk about this last one later, but for now just think of a manifest as a list of files.
  3. Each file can be decrypted using the key from the original SHL payload.

There’s a lot going on here! And this is just the base case; there are a bunch of different options and obligations. But if you remember the basics (shlink:/, payload, manifest, content) you’ll be able to keep your bearings as we get into the details.

Privacy and Security

In that first diagram, nothing limits who can see the manifest and encrypted content — they’re basically open on the web. But all that is basically meaningless without access to the decryption key from the payload, so don’t panic. It just means that, exactly like a SHC, security in the base case is up to the person that’s holding the SHL itself (in the form of a QR Code or whatever). And often that’s perfectly fine.

Except sometimes it’s not, so SHLs support added protection using an optional passcode that gates access to the manifest:

  1. A user receiving a SHL also is given a passcode. The passcode is not found anywhere in the SHL itself (although a “P” flag is added to the payload as a UX hint).
  2. When presenting the SHL, the user also (separately) provides the passcode. 
  3. The receiving system sends the passcode along with the manifest request, which succeeds only if the passcode matches correctly.

Simple but effective. It remains to be seen which use cases will rally around a passcode requirement — but it’s a handy arrow to have in the quiver.

The SHL protocol also defines a bunch of additional requirements to help mitigate the risk of all these (albeit encrypted and/or otherwise protected) files floating around:

  • Manifest URLs are required to include 256 bits of entropy — that is, they can’t be guessable.
  • Manifests with passcodes are required to maintain and enforce a lifetime cap on the number of times an invalid passcode is provided before the SHL is disabled.
  • Content URLs are required to expire (at most) one hour after generation.
  • (Optionally) SHLs can be set to expire, with a hint to this expiration time available in the payload.

These all make sense … but they do make publishing and hosting SHLs kind of complicated. While content files can be served from “simple” services like AWS buckets or Azure containers, manifests really need to be managed dynamically with a stateful store to keep track of things like passcodes and failed attempts. Don’t think this is going to be a one night project!

SMART Health Links in Action

Let’s look at some real code. First we’ll run a quick end-to-end to get the lay of the land. SHLServer is a standalone, Java-based web server that knows how to create SHLs and serve them up. Build and run it yourself like this (you’ll need a system with mvn and a JDK installed):

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/toolbox
mvn clean package install
cd ../shl
mvn clean package
cd demo
./run-demo.sh # or use run-demo.cmd on Windows

This will start your server running on https://localhost:7071 … hope it worked! Next open up a new shell in the same directory and run node create-link.js (you’ll want node v18+). You’ll see an annoying cert warning (sorry, the demo is using a self-signed cert) and then a big fat URL. That’s your SHL, woo hoo! Select the whole thing and then paste it into a browser. If you peek into create-link.js you’ll see the parameters we used to create the SHL, including the passcode “fancy-passcode”. Type that into the box that comes up and …. magic! You should see something very much like the image below. The link we created has both a SHC and a raw FHIR bundle; you can flip between them with the dropdown that says “Health Information”.

So what happened here? When we ran create-link.js, it posted a JSON body to the server’s /createLink endpoint. The JSON set a passcode and an expiration time for the link, and most importantly included our SHC and FHIR files as base64url-encoded strings. SHLServer generated an encryption key, encrypted the files, stored a bunch of metadata in a SQLite database, and generated a SHL “payload” — which looks something like this:

{
  "url": "https://localhost:7071/manifest/XruV__8k1Zn68NK1lsLH05ZmONtaUC85jmAW4zEHoTA",
  "key": "OesjgV2JUpvk-E9wu9grzRySuMuzN4HpcP-LZ4xD8hc",
  "exp": 1687405491,
  "flag": "P",
  "label": "Fancy Label",
  "_manifestId": "XruV__8k1Zn68NK1lsLH05ZmONtaUC85jmAW4zEHoTA"
}

(You can make one of these for yourself by running create.js rather than create-link.js.) Finally, that JSON is encoded with base64url, the shlink:/ protocol tag is added to the front, and then a configured “viewer URL” is added to the front of that.

The viewer URL is optional — apps that know what SHLs are will work correctly with just the shlink:/… part, but by adding that prefix anybody can simply click the link to get a default browser experience. In our case we’ve configured it with https://shcwork.z22.web.core.windows.net/shlink.html, which opens up a generic viewer we’re building at TCP. That URL is just my development server, so handy for demo purposes, but please don’t use it for anything in production!

Anyways, whichever viewer receives the SHL, it decodes the payload back to JSON, issues a POST to fetch the manifest URL it finds inside, pulls the file contents out of that response either directly (.embedded) or indirectly (.location), decrypts it using the key from the payload, and renders the final results. You can see all of this at work in the TCP viewer app. Woot!

A Quick Tour of SHLServer

OK, time for some code. SHLServer is actually a pretty complete implementation of the specification, and could probably even perform pretty reasonably at scale. It’s MIT-licensed code, so feel free to take it and use it as-is or as part of your own solutions however you like, no attribution required. But I really wrote it to help folks understand the nuances of the spec, so let’s take a quick tour.

The app follows a pretty classic three-tier model. At the top is SHLServer.java, a class that uses the built-in Java HttpServer to publish seven CORS-enabled endpoints: one for the manifest, one for location URLs, and five for various SHL creation and maintenance tasks. For the admin side of things, parameters are accepted as JSON POST bodies and a custom header carries an authorization token.

SHLServer relies on the domain class SHL.java. Most of the important stuff happens here; for example the manifest method:

  • Verifies that the requested SHL exists and isn’t expired,
  • Rejects requests for disabled (too many passcode failures) SHLs.
  • Verifies the passcode if present, keeping a count of failed attempts.
  • Sets a header indicating how frequently to re-pull a long-lived (“L” flag) SHL, and
  • Generates the response JSON, embedding file contents or allocating short-lived location links based on the embeddedLengthMax parameter.

The admin methods use parameter interfaces that try to simplify things a bit; mostly they just do what they’re called:

Because the manifest format doesn’t include a way to identify specific files, the admin methods expect the caller to provide a “manifestUniqueName” for each one. This can be used later to delete or update files — as the name implies, they only need to be unique within each SHL instance, not globally.

The last interesting feature of the class is that it can operate in either “trusted” or “untrusted” mode. That is, the caller can either provide the files as cleartext and ask the server to allocate a key and encrypt them, or it can pre-encrypt them prior to upload. Using the second option means that the server never has access to keys or personal information, which has obvious benefits. But it does mean the caller has to know how to encrypt stuff and “fix up” the payloads it gets back from the server.

The bottom layer of code is SHLStore.java, which just ferries data in semi-ORM style between a Sqlite database and file store. Not much exciting there, although I do have a soft spot for Sqlite and the functional interface I built a year or so ago in SqlStore.java. Enough said.

Anatomy of a Payload

OK, let’s look a little more closely at the payload format that is base64url-encoded to make up the shlink:/ itself. As always it’s just a bit of JSON, with the following fields:

  • url identifies the manifest URL which holds the list of SHL files. Because they’re burned into the payload, manifest URLs are expected to be stable, but include some randomness to prevent them from being guessable. Our server implements a “makeId” function for this that we use in a few different places.
  • key is the shared symmetric key used to encrypt and decrypt the content files listed in the manifest. The same key is used for every file in the SHL.
  • exp is an optional timestamp (expressed as an epoch second). This is just a hint for viewers so they can short-circuit a failed call; the SHL hoster needs to actually enforce the expiration.
  • label is a short string that describes the contents of the SHL at a high level. This is just a UX hint as well.
  • v is a version number, assumed to be “1” if not present.
  • flags is a string of optional upper-case characters that define additional behavior:
    • “P” indicates that access to the SHL requires a passcode. The passcode itself is kept with the SHL hoster, not the SHL itself. It is communicated to the SHL holder and from the holder to a recipient out of band (e.g., verbally). The flag itself is just another UX hint; the SHL hoster is responsible for enforcement.
    • “L” indicates that this SHL is intended for long-term use, and the contents of the files inside of it may change over time. For example, a SHL that represents a vaccination history might use this flag and update the contents each time a new vaccine is administered. The flag indicates that it’s acceptable to poll for new data periodically; the spec describes use of the Retry-After header to help in this back-and-forth.

One last flag (“U”) supports the narrow but common use case in which a single file (typically a SHC) is being transferred without a passcode, but the data itself is too large for a usable QR code. In this case the url field is interpreted not as a manifest file but as a single encrypted content file. This option simplifies hosting — the encrypted files can be served by any open, static web server with no dynamic manifest code involved. The TCP viewer supports the U flag, but SHLServer doesn’t generate them.

Note that if you’re paying attention, you’ll see that SHLServer returns another field in the payload: _manifestId. This is not part of the spec, but it’s legal because the spec requires consumers to expect and ignore fields they do not understand. Adding it to the payload simply makes it easier for users of the administration API to refer to the new manifest later (e.g., in a call to upsertFile).

Working with the Manifest

After a viewer decodes the payload, the next step is to issue a POST request for the URL found inside. POST is used so that additional data can be sent without leaking information into server logs:

  • recipient is a string representing the viewer making the call. For example, this might be something like “Overlake Hospital, Bellevue WA, registration desk.” It is required, but need not be machine-understandable. Just something that can be logged to get a sense of where SHLs are being used.
  • passcode is (if the P flag is present) the passcode as received out-of-band from the SHL holder.
  • embeddedLengthMax is an optional value indicating the maximum size a file can be for direct inclusion in the manifest. More on this in a second.

The SHL hoster uses the incoming manifest request URL to find the appropriate manifest (e.g., in our case https://localhost:7071/manifest/XruV__8k1Zn68NK1lsLH05ZmONtaUC85jmAW4zEHoTA), then puts together a JSON object listing the content files that make up the SHL. The object contains a single “files” array, each element of which contains:

  • contentType, typically one of application/smart-health-card for a SHC or application/fhir+json for a FHIR resource (I promise we’ll cover application/smart-api-access before we’re done).
  • A JSON Web Encryption token using compact serialization with the encrypted file contents. The content can be delivered in one of two ways:
    • Directly, using an embedded field within the manifest JSON.
    • Indirectly, as referenced by a location field within the manifest JSON.

This is where embeddedLinkMax comes into play. It’s kind of a hassle and I’m not sure it’s worth it, but not my call. Basically, if embeddedLengthMax is not present OR if the size of a file is <= its value, the embedded option may be used. Otherwise, a new, short-lived, unprotected URL representing the content should be allocated and placed into location. Location URLs must expire after no more than one hour, and may be disabled after a single fetch. The intended end result is that the manifest and its files are considered a single unit, even if they’re downloaded independently. All good, but it does make for some non-trivial implementation complexity (SHLServer uses a “urls” table to keep track; cloud-native implementations can use pre-signed URLs with expiration timestamps).

In any case, with JWEs in hand the viewer can finally decrypt them using the key from the original payload — and we’re done. Whew!

* Note I have run into compatibility issues with encryption/decryption. In particular the specification requires direct encryption using A256GCM, which seems simple enough. But A256GCM requires a 12-byte initialization vector, and there are libraries (like python-jose at the time of this writing) that mistakenly use 16.  Which might seem ok because it “works”, but some compliant libraries (like javascript jose) error out when they see the longer IV and won’t proceed. Ah, compatibility.

SMART API Access

OK I’ve put this off long enough — it’s a super-cool feature, but messes with my narrative a bit, so I’ve saved it for its own section.

In addition to static or periodically-updated data files, SHLs support the ability to share “live” authenticated FHIR connections. For example, say I’m travelling to an out-of-state hospital for a procedure, and my primary care provider wants to monitor my recovery. The hospital could issue me a SHL that permits the bearer to make live queries into my record. There are of course other ways to do this, but the convenience of sharing access using a simple link or QR code might be super-handy.

A SHL supports this by including an encrypted file with the content type application/smart-api-access. The file itself is a SMART Access Token Response with an additional aud element that identifies the FHIR endpoint (and possibly some hints about useful / authorized queries). No muss, no fuss.

The spec talks about some other types of “dynamic” exchange using SHLs as well. They’re all credible and potentially useful, but frankly a bit speculative. IMNSHO, let’s lock down the more simple file-sharing scenarios before we get too far out over our skis here.

And that’s it!

OK, that’s a wrap on our little journey through the emerging world of SMART Health Cards and Links. I hope it’s been useful — please take the code, make it your own, and let me know if (when) you find bugs or have ideas to make it better. Maybe this time we’ll actually make a dent in the health information exchange clown show!

Bionic!

Last Tuesday I got up in the morning, showered and ate some breakfast, took the dog out, did the crossword, got a new pair of eyes, took a nap, watched the Mariners game and went to bed. In case you missed that, I got a new pair of eyes. OK, new lenses to be precise, but still. The technology is amazing and of course I went down a bit of a rabbit hole learning about how it works. It’s hard to believe that we really get to live in this world — just so cool.

Normal Vision

At lot of folks know the basics of how vision work, but let’s start there anyways. Light comes in through an opening in the front of our eye called the pupil (the black part). In front of the pupil is the cornea; just behind it is the lens. Both of these are clear and serve to refract (bend) the incoming light so that it lands perfectly aligned on the retina, a grid of cells on the back of the eye that sense light impulses and send them up the optic nerve to the visual cortex, which assembles the signals into a coherent concept of what we’re looking at.

This works great to see things at a distance — like across the room or street or whatever — when the incoming light rays are almost parallel to each other. But we often need to see things that are much closer, like the words in a book. In this case the incoming light rays diverge and enter the eyes at steeper angles, causing the focal point to fall far behind the retina and blur. Evolution compensates for this by allowing us to dynamically change the shape of the lens to bring things into focus. The ciliary muscle squeezes the lens, making it fatter. This fatter lens bends the outer rays more sharply, pulling the focal point back onto the retina so we can read. Just amazing.

Fun fact, this is why squinting actually can help you see better — it’s a crude way of changing the shape of your eye structure, which can impact where the focal point falls. But squinting can only do so much, quickly tires out your facial muscles, and looks pretty goofy — so not a great long-term solution.

Nearsightedness

I started wearing glasses full-time for myopia (nearsightedness) when I was about twelve — I could see things close up, but not at distance. This happens because the eyeball itself is elongated, or because the cornea or lens is overly-refractive (too strong). Either one causes the focal point to fall in front of the retina, blurring the image received by the brain.

Hyperopia (farsightedness) is the exact opposite — flaws in the eye cause the natural focus point to fall behind the retina. Either “opia” can be fixed relatively easily by placing lenses in front of the eyes in the form of glasses or contacts. The optometrist just keeps trying different lens powers (“Which is better, A or B?”) until they find the one for each eye that lands the image perfectly on the retina at distance. Your ciliary muscle does its job for closeup tasks, and everything is back in business. Woot!

Note I’m basically ignoring astigmatism here, which occurs when flaws in the cornea or lens are asymmetric — e.g., maybe blurring only happens on the horizontal plane. This makes everything way more complicated, and I don’t have much of it myself, so I’m going to pretend it doesn’t exist. Sorry about that.

2015: LASIK

Glasses are fine, and truth be told I probably look better with them on. But they’re also annoying, especially in the rain or under ski goggles or whatever. And fully recognizing the irony of this given my enthusiasm for surgery, contacts just scare the bejeezus out of me — no way. So just about eight years ago I decided to get LASIK surgery to repair my nearsightedness. Dr Sharpe seemed like a good guy and got solid reviews, so into the breech I went.

LASIK (Laser-Assisted In Situ Keratomileusis) replaces the need for external lenses by reshaping the cornea so that it refracts properly. Because everything is always complicated, the cornea is actually made up of five distinct layers. Starting from the top:

  • The Epithelium is exposed to the environment and passes oxygen/nutrients to the rest of the structure. It constantly regenerates itself and contains a ton of nerves, which is why it hurts so much if you scratch your eye, as I did back in high school with plaster dust. Ouch.
  • Bowman’s Layer as near as I can tell basically acts as a buffer/sealer between the dynamic epithelium and more static lower layers.
  • The Stroma is the thickest part of the cornea (which isn’t saying much at about 500 micrometers) and where most refraction occurs.
  • You’ll have to research Descemet’s membrane the Endothelium yourself because they’re not relevant to LASIK.

The procedure is outpatient and other than a boatload of topical numbing drops, the only anesthesia I had was a medium-heavy dose of valium. A suction/stabilizing device is placed over the eye and the laser cuts a circular “flap” through the top two layers of the cornea. The flap is folded back to expose the stroma, the laser nibbles away at the stroma to reshape it for the correct prescription (flatter for myopia; steeper for hyperopia), and finally the flap is folded back in place.

Apparently the epithelium regenerates so quickly that the flap just heals on its own — I have read about stitches being used, but that didn’t happen in my case. The cut itself is positioned over the iris (the colored part of the eye), so even before it heals there’s no impact to your vision. The weirdest part about all of this is the burnt hair smell that is in fact the laser burning away parts of your eye. Yeesh.

But holy crap, I literally sat up in the chair post-procedure and could see great. Right away. Now of course there was some swelling and pain and stuff over the next few days … but it was one of the most shocking things that has ever happened to me, ever. Just brilliant.

Enter Presbyopia

My vision was basically perfect for about six years after LASIK. I can’t say enough good stuff about that decision, but I’m taking a long time to get to the really good part of this article, so I’ll leave it at that. Absolutely would recommend LASIK to anybody who qualifies.

But of course time marches on. Near vision starts to degrade for almost everyone sometime in their forties or thereabouts, which is why we need reading glasses and shine our phone flashlights on the menu. It’s called presbyopia, and it happens because the lens becomes less elastic and those ciliary muscles just can’t squeeze hard enough to change its shape for near focus. Folks who already have glasses start buying biofocals, and those of us with good distance vision (naturally or thanks to LASIK) start haunting the drugstore aisles for cheap readers.

A little fine print (see my eye joke there?): during my LASIK consult, I chose the “regular” version which corrects both eyes for distance. There is another option called “monovision” in which the dominant eye is corrected for distance, but the second eye is corrected for reading. That is, the second eye is adjusted so that an object in the near field is projected clearly onto the retina with the lens at rest (vs. “squeezed” as we discussed above). Typically, the brain is able to adjust and automatically swap between eyes based on what you’re looking at, which is utterly amazing.

Because the near-vision eye can focus with the lens at rest, monovision can head off presbyopia — you don’t need to change the lens shape to see close-up, you just need to use the eye dedicated to that purpose. This was tempting, but there are a few downsides, particularly (for me) some loss of depth perception since you no longer have effective binocular vision. And since LASIK removes only a tiny amount of corneal tissue, you can actually have it done more than once — I was assured that I could simply “touch up” my eyes in the future to address presbyopia or other changes if needed.

Indeed, I eventually started to need readers, and it was fine. I’m not sure why, but there was actually something kind of nice about the ritual of pulling out the glasses to read or work the crossword or whatever. That is, it was nice until I started needing them for everything. Cooking instructions on the frozen pizza? Glasses. Seat on my boarding pass? Glasses. Which direction does the HDMI cable go in? Glasses. You get the idea. When I started needing them just to snooze the alarms on my phone, I knew it was finally time to go in for the “touch up.” Procrastinated a bit more thanks to COVID and all, but finally pulled the trigger about a month ago.

After walking around the house taking pictures of everywhere I noticed readers lying around (above), I rolled up to the Sharpe Vision office for my consult only to realize that it was no longer their office — apparently in the almost-decade since I got my LASIK they moved a few streets down. A quick lookup on the phone (with readers) and I made it just in time for my appointment at their new place past Burgermaster on 112th

… only to find that the world had changed once again. Yes, they could touch up my LASIK, and could even offer a new flavor called laser blended vision that’s like monovision but with improved depth perception. But what I really ought to check out is RLE — Refractive Lens Exchange. And since apparently I’m always up for new ways to mess with my eyes, I was totally in. Here’s the deal.

Cataracts

Along with presbyopia, over time most people eventually develop cataracts, a clouding of the lenses that makes them less able to transmit light energy. This is the other reason we’re all using our phone flashlights to read our menus. The good news is that cataracts are easily fixed by replacement of the natural lens with an artificial one.

An aside: cataracts are a major cause of correctable blindness in the developing world. Doctors Without Borders has conducted free “eye camps” in Somalia for many years and has fixed cataracts for hundreds of people who literally go from blind to normal vision in one day. If you’re able to give a bit, you’re not going to find a better organization — they are awesome.

Because we do what we do, there’s been a ton of innovation in replacement lens technology. The path of that innovation is pretty neat, and recently folks have realized that — hey — maybe these lenses are awesome and safe enough that we don’t need to wait until cataracts form to swap them in! The material lasts well beyond the fifty-odd years that middle-aged humans have before them, so why not? Thus was born the “RLE” (Refractive Lens Exchange) industry, and a new practice for the newly re-named “SharpeVision Modern LASIK and LENS.”

2023: Refractive Lens Exchange

RLE at Sharpe with Dr. Barker is pretty fancy. Even before we get to the lens itself, the procedure alone shocks and awes:

  1. The CATALYS Precision Laser System identifies key structures in the eye and creates a 3D map at the micron level. Check out the video of this, it’s super-cool.
  2. The laser cuts small entry slits through the cornea and a round opening in the front of the capsule that holds the lens.
  3. The laser softens and segments the existing lens so that it can be easily broken up and sucked out through a small vacuum tube.
  4. The new lens is passed into the now-empty capsule through a small tube. The lens is flexible and can be folded up so it fits through the small entry hole.
  5. When the lens unfolds, two springlike spiral arms called haptics hold it in place in the center of the capsule.

All of this computer-assisted laser stuff is just incredible. I was awake throughout my procedure and it was pretty crazy to listen to this HAL-like computer voice announcing what percentage of my lens had been sucked out at each step.

Monofocal IOLs (Intraocular Lenses)

OK, finally I get to talk about the intraocular lens itself, which is what sent me down this rabbit hole in the first place. The old-school version of this is the Monofocal IOL, which “simply” acts just like the lens in your glasses or the reshaped cornea in LASIK, using refraction to focus images at distance onto the retina. Monofocals are the workhorse of cataract surgery, but they have some disadvantages. Primarily, since they have only one focal distance and can’t be squeezed / reshaped by the ciliary muscle, readers are basically guaranteed for close-up work. There is a “monovision” option using differently-powered lenses in each eye, but that comes with all the same issues as monovision LASIK.

Accommodating IOLs

Today there are basically two kinds of “premium” IOLs that attempt to provide a glasses-free experience. One is the “accommodating” IOL — most famously the Bauch & Lomb Crystalens. The concept makes a ton of sense — just replicate the action of our natural lens. Remember that an IOL has little springy arms called haptics that hold it in place in the eye (the orange bits in the picture here). The same ciliary muscle that squeezes our natural lens can apply force to these haptics, which are designed to change the lens shape and position in response. The rest of your vision system just does what it’s always done, and the focus point adjusts naturally.

Pretty neat, and I’m always drawn to biomimetic solutions, because evolution tends to, well, work. But while it’s a little hard to find good data, it appears that the Crystalens has seriously dropped in popularity over the last decade or so — only 10% of practitioners were using it in 2021 according to this “Review of Ophthalmology” article that claims to know. From what I can find (e.g., here) it seems that the near vision improvements from these lenses just aren’t that great, and may also decline over time. Perhaps our intrepid ciliary muscle just loses some oomph as we get older … who knows.

So at least for now, accommodating IOLs don’t seem to be the favorite child. Even the original inventor of the Crystalens has moved on to new technologies. But don’t blink (another eye joke), because there are true believers still working the problem with a bunch of new stuff in the pipeline.

Multifocal IOLs

OK, we’ve finally arrived at my lens, the Clareon PanOptix Trifocal IOL, presently the most popular of the other class of premium IOLs: multifocal. Multifocal lenses have no moving parts but instead divide up the incoming light rays into multiple focal points — two for bifocals, three for trifocals. The “distance” focus typically uses refraction — the same mechanism we’ve seen again and again on this journey. But multifocal lenses are shaped so that light entering from near or intermediate distances is diffracted to provide focus in those ranges.

Diffraction occurs when light hits a discontinuity in material. The actual math is super complicated and a bit beyond me, but at the highest level, a light wave passing through different materials (the lens itself vs the aqueous material surrounding it) creates interference patterns that ultimately bend the light in a predictable way. A multifocal lens has a bunch of concentric circles of varying heights that produce this effect — you can see them if you click to zoom into the picture of the PanOptix on the right.

The end result is that the lens creates clear images on the retina at three different distances:

  1. Plano or “infinity” for driving and watching whales in Puget Sound (refracted).
  2. About 24 inches for “intermediate” tasks like computer (and lathe!) use.
  3. About 16 inches for “near” tasks like reading.

Multifocal Issues and Mitigations

If you’re paying attention, you’re probably asking the same question I did when I first learned about these things. Aren’t you now getting THREE images projected onto the retina at the same time? Well, kind of yes. But two things help you out. First, your brain is just really smart and figures it out in the same way that it does with monovision — paying attention to the stuff you are showing interest in by the direction of your gaze and other clues. More importantly, at any given time there’s usually only one of these three distances that actually has something to look at. For example, if I’m reading I’m not getting much of an image from anything behind the book. Between the two of these, your brain very quickly just makes it work.

It is amusing to experience these artifacts in real life. The most obvious one is the “halos” that appear around point light sources such as headlights or streetlights. I wish I could capture it with a camera, but you actually see the diffraction patterns — the light looks exactly like the rings on the lens itself! It’s a bit annoying — if I were a long haul trucker I might think twice about getting a multifocal — but for me it’s no big deal.

A second issue makes sense in theory, but (least so far) I’m not experiencing it in practice. With a natural lens, pretty much all of the light that comes into your eye is captured by the retina. Of course the iris opens and closes to admit an optimal amount of light, but very little of that is lost passing through the lens. With a multifocal the energy is divvied up between the focal points, plus there is some additional loss inherent in the diffractive process itself.

The PanOptix has a neat feature that tries to minimize this by “collecting” light energy at a (mostly unused I guess?) focal distance of 120cm and diffracting it in reverse so that energy helps power distance vision. The end result is that the PanOptix uses about 44% of incoming light for distance, 22% each for near and intermediate, and loses about 12% to the process. Not bad! And at least so far I can’t detect any loss of contrast or issues in lower-light situations. The effect is surely there, I’m just not aware of it.

The Hits Keep Coming

So far I’m super-satisfied with my new lenses — distance vision feels about the same as it was before, but I can read and use the phone/computer comfortably without my trusty readers. Every day my brain gets more used to the various artifacts that do exist, and my vision should stay pretty stable much until I die. Woo hoo!

At the same time, it’s clear that all three of the broadly-used lens types out there (monofocal / accommodating / multifocal) have pros and cons — none work as well as the natural lenses of a young adult. So researchers keep pushing the envelope. The latest concept I’ve read about is Extended Depth of Focus (really well-explained here). The concept behind EDOF lenses is to extend the range of distances that can provide an acceptably (if not perfectly) focused image on the retina, rather than pinning focus to specific intervals.

There are a few mechanisms being tried to product EDOF; the easiest for me to understand is the pinhole effect, which has been used in photography for years. By shrinking the hole through which light enters, you basically filter out the steeper rays that would spread out over the retina, leaving only the ones that are already mostly parallel anyways (regardless of how far they are in front of the eye). Of course this also filters out a bunch of light energy, so it’s harder to see in low-light conditions. So far these lenses have mostly been used monovision-style — one eye gets the pinhole lens and the other gets a classic monofocal.

It’ll be interesting to see how this new approach plays out. And I could easily keep digging deeper into this stuff forever — but I think we’ve covered more than enough for one article. In case it isn’t clear, I’m fascinated with attempts to repair, build on and improve the capabilities that have been so hard won by evolution over millennia. Getting new lenses and learning about the technology has been super-fun — thanks for coming along for the ride!

The Most Important ChatGPT App Ever

I’ll grant that I have a relatively nerdy social circle — but it’s still sort of shocking just how many people I know are actually doing useful and interesting things with ChatGPT. Just a sampling:

Just to iterate what I’ve said before, I believe this thing is really real, and it behooves everyone to spend some time with it to build an intuition for what it is (and isn’t) good at. Like any technology, it’s important to have at least a basic understanding of how it works — otherwise folks that do will use it to take advantage of you. The fact that this technology appears to be sentient (hot take from Sean, see how I just dropped that in there?) doesn’t change the reality that people will use it to create phishing scams. Two of my past posts may help:

Anyways, all of this peer pressure got me thinking that I’d better do something important with ChatGPT too. And what could possibly be more important than creating more amusing content on Twitter? I know, right? Brilliant! So that’s what I did. And I figured I might as well write about how I did it because that might help some other folks stand on these impressive shoulders.

AI News Haiku

You’re definitely going to want to go visit @AINewsHaiku on Twitter (don’t forget to follow!). Three times a day, roughly just before breakfast, lunch and dinner, it randomly selects a top news story from United Press International, asks ChatGPT to write a “funny haiku” about it, and posts to Twitter. That’s it. Funny(-ish) haikus, three times a day.

The rest of this post is about how it works — so feel free to bail now if you’re not into the nerd stuff. Just don’t forget to (1) follow @AINewsHaiku, (2) tell all your friends to follow it too, and (3) retweet the really good ones. Be the trendsetter on this one. No pressure though.

The Code

I’ve reluctantly started to actually enjoy using Node for little projects like this. It’s super-easy to get going without a thousand complicated build/run steps or an IDE, and with a little discipline Javascript can be reasonably clean code. Have to be really careful about dependencies though — npm makes it really easy to pick up a billion packages, which can get problematic pretty quick. And “everything is async” is just stupid because literally nobody thinks about problems that way. But whatever, it’s fine.

There is not a lot of code, but it’s all on github. Clone the repo, create a “.env” file, and run “node .” to try it yourself. The .env file should look like this (details on the values later):

OPENAI_API_TOKEN=[OpenAI Secret Key]
TWITTER_API_APP_KEY=[Twitter Consumer API Key]
TWITTER_API_APP_SECRET=[Twitter Consumer API Secret]
TWITTER_API_ACCESS_TOKEN_KEY=[Twitter Authentication Access Token]
TWITTER_API_ACCESS_TOKEN_SECRET=[Twitter Authentication Access Secret]

index.js starts the party by calling into rss.js which loads the UPI “Top News” RSS feed and extracts titles and links (yes RSS still exists). xml2js is a nice little XML parser, a thankless job in these days of JSON everywhere.  You’ll also note that I’m importing “node-fetch” for the fetch API; it’s built-in in Node v18 but the machine where I’m running the cron jobs is locked to Node v16 so there you go.

Talking to Chat-GPT

After picking a random title/link combo, next up is openai.js which generates the haiku.. The OpenAI developer program isn’t free but it is really really cheap for this kind of hobby use; you can get set up at https://platform.openai.com. My three haikus a day using GPT-3.5 run somewhere on the order of $.10 per month. Of course, if you’re asking the system to write screenplays or talk for hours you could probably get into trouble. Live on the edge, and make sure to add your secret key into the .env file.

In its simplest form, using the chat API is just like talking to the models via the user interface. My prompt is “Write a funny haiku summarizing this topic: [HEADLINE]” which I send with a request that looks like this:

{
  "model": "gpt-3.5-turbo",
  "temperature": 0.5,
  "messages": [ "role": "user", "content": PROMPT ]
}

model” is pretty obvious; I’m using v3.5 because it’s cheap and works great.

temperature” is interesting — a floating point value between 0 and 2 that dials up and down the “randomness” of responses. In response to a given prompt, a temp of 0 will return pretty much the same completion every time, while 2 will be super-chaotic. 0.5 is a nice conservative number that leaves some room for creativity; I might try dialing it up a bit more as I see how it goes. There is also a parameter “top_p” which is similar-but-different, typical of many of the probabilistic dials that are part of these models.

I’ve sent a single element in the “messages” parameter, but this can become quite elaborate as a way to help explain to the model what you’re trying to do. The guide for prompt design is really fascinating and probably the best thing to read to start building that intuition for the system; highly recommended.

There are a bunch of other parameters you can use that help manage your costs, or to generate multiple completions for the same prompt, that kind of thing.

The JSON you get back contains a bunch of metadata about the interaction including the costs incurred (expressed as “tokens,” a vague concept corresponding to common character sequences in words; you can play with their tokenizer here). The completion text itself is in the “choices” array, which will be length == 1 unless you’ve asked for multiple completions.

Over time it’s going to be interesting to see just how challenging the economics of these things become. Training big models is really, really computationally-expensive. At least until we have some significant quantitative and/or qualitative change in the way its done, only big companies are really going to be in the game. So while I’m sure we’ll see pretty fierce competition between the usual suspects, there’s a big risk that the most revolutionary technology of the century is going to be owned by a very small number of players.

For now, just have fun and learn as much as you can — it’ll pay off no matter what our weirdo economic system ends up doing.

And… Tweet!

Honestly I thought this was going to be the easiest part of this little dalliance, but the chaos that is Twitter clearly extends to its API. It’s bad in pretty much every way: 2+ versions of the API that overlap a lot but not entirely; four different authentication methods that apply seemingly randomly to the various endpoints; constantly changing program/pricing structure with all kinds of bad information still in the documentation. Worst of all, the API requires signed requests which pretty much makes calling their REST endpoints without a library enormously painful. Wow.

Having tried a few libraries and trial-and-errored my way through a few approaches, the actual code in twitter.js isn’t bad at all — but the journey to get there was just stupid. To try and save you some time:

  • Sign up for free access at https://developer.twitter.com/en/portal/dashboard. They will try to direct you to “Basic” access but this is $100/month; don’t be fooled.
  • You’ll get a default “Project” and “App” … scroll to the bottom of the app “Settings” and choose “Edit” under “User Authentication Settings.” Make sure you have read/write permissions selected (you won’t at first). A bunch of fields on this page are required even if you’re not going to use them — just do your best until they let you hit “Save.”
  • Now under “Keys & Tokens” choose “Regenerate” for “Consumer Keys / API Key and Secret” and “Authentication Tokens / Access Token and Secret” … save these values and add them to the appropriate spots in your .env file.

This will set you up to call the v2 method to post a tweet using the OAuth v1.0a authentication model. There are surely many other ways you can get things working, but that was mine. I also chose to use the twitter-api-v2 library to manage the noise — it does a fine job trying to hide the dog’s breakfast that it wraps. At least for now. Until Elon gets into a slap-fight with Tim Berners-Lee and decides to ban the use of HTTPS.

You’re Welcome!

The point of all this (beyond the excellent haiku content which you should definitely follow) was just to get some hands-on experience with the API for ChatGPT. Mission accomplished, and I’m really quite impressed with how effective it is, especially given the speed at which they’re moving. I just have to figure out how to reliably tell the model to limit content to 250 characters, because until I do that I’m not going to be able to release @AINewsLimerick or @AINewsSonnet. The world is waiting!

Looking back at Azyxxi… er, Amalga.

Just a few months after the Great Gunshot Search incident of 2005, I found myself at Washington Hospital Center while Dr. Craig Feied showed us list after list on a huge (for the time) monitor. Real-time patient rosters for the ER and ICU, sure, but that was just the warmup. Rooms that needed cleaning. Patients who needed ventilation tubes replaced. Insurance companies with elevated rates of rejected claims. Patients eligible for actively-recruiting complex trials. He just kept going, like a fireworks show where every time you think you just saw the finale they start up again. Incredible stuff. Anyways, cut to a few months later and we (Microsoft) announced the acquisition of Azyxxi — adding an enterprise solution to our growing portfolio in Health Solutions.

Sadly — and despite a ton of work — we were never really able to scale that incredible solution at WHC into a product that realized the same value at other institutions. That’s not to say there weren’t some great successes, because there absolutely were. But at the end of the day, there was just something about Azyxxi that we couldn’t put into a box. And while it’s tempting to just say that it was a timing problem, I don’t think that was it. Even today I don’t see anything that delivers the magic we saw in Dr. Craig’s office — just flashy “innovation” videos and presentations that never quite make it to the floor in real life.

So what was the problem? Anything we can do about it? I dunno — let’s talk it out.

Oh, and just to get it out of the way early, “Azyxxi” doesn’t mean anything — it’s just a made-up word engineered to be really easy to find with Google. We renamed it “Amalga” at Microsoft, which does actually have some meaning behind it but in retrospect sounds a bit like some kind of scary semi-sentient goo. Moving on.

Just what was it?

A correct but only semi-helpful description of Azyxxi is that it was a data analysis and application platform for healthcare. Three parts to that: (a) data analysis, like a big data warehouse; (b) an application platform so insights gained from analysis could be put into on-the-floor solutions; (c) made for healthcare, which means there was functionality built-in that targeted weirdnesses endemic to the business of providing care. This is of course a mouthful, and one of the reasons it was hard to pitch the product outside of specific use cases. A better and more concrete way of looking at the product is to break it down into five key activities:

1. Get the Data

Healthcare data is incredibly diverse and notoriously messy — images, free text notes, lab results, insurance documents, etc. etc.. The first rule of the Azyxxi Way (yes we actually referred to it like that) was to “get the data, all of it, without trying to clean it up.” Which is to say, it was a Data Lake before Data Lakes were cool (or even a term). In 2006 the conventional wisdom for data warehousing was “Extract, Transform, Load.” ETL pipelines extract data out of source systems, transform it into (usually) a “star schema” optimized for analysis, and load it into a target database. In this model an enormous amount of upfront thought goes into what data is important, and transforming/normalizing it into a shape that can efficiently answer a set of predefined questions.

Azyxxi’s insight was that ETL prework is always wrong, and leaves you with a brittle data warehouse unable to answer novel questions as they inevitably arise. Instead they talked about “ELT” — loading everything just as it was in the source systems and figuring out the “transform” part later. This seems obvious now, but we all used to worry a ton about performance. Azyxxi used SQL Server, and the founders were constantly pushing its boundaries, typically with great success. Sure, some queries were really slow — but you could at least ask the question!

2. Ask Novel Questions

Which leads us to the first user-driven Azyxxi experience — exploration. Using an Excel-like grid display, users had the ability to query source tables individually or via pre-configured “joins” that linked records. Sort, filter, etc. — all the standard stuff was there. Of course there was access control, but this was a care-focused tool in a care-delivery setting — by default users could see a LOT. And as noted above they could get themselves into “trouble” by running queries that took hours or days, but SQL Server is smart and it was mostly just fine.

The key is that there was a culture at the early Azyxxi sites, developed over many years, of asking questions and self-serving the answers. This is not typical! Most nurses and doctors ask themselves data-driven questions in passing, but never follow them up. Working with the IT department to run a report, combine data from multiple sources, get approval to make a change — it just isn’t worth the hassle. So great ideas just die on the vine every day. Azyxxi users knew they had a way to answer their questions themselves — and so they did.

3. Bring Insights to the Floor

It’s awesome to be able to ask questions. But it’s only really impactful when you can use the answers to effect change in real life. With Azyxxi, one-off queries could be saved together with all of their settings — including automatic refresh and kiosk-style display — and shared with other users or departments.

If you’ve been a hospital inpatient or visitor lately, almost certainly you’ve seen the patient roster grid at the central nurse’s station. At my recent colectomy the surgical unit had a live status board that helped my wife keep track of my progress through the building. Great stuff, but every one of these dashboards is an IT project, and no IT project is trivial. With Azyxxi, more than a decade ago, users could create and deploy them by themselves.

But hold on. I’ve already said twice that novel queries against source data could be really slow — a “real-time” dashboard that takes an hour to load isn’t going to get very far, and end users don’t have the skills or tools to fix it. What to do?

Azyxxi empowered the IT folks to run behind user innovation and keep things humming. Each user-created list was driven by an automatically generated SQL query — and anyone who has written interfaces like this know that they can become very inefficient very quickly. Slow queries were addressed using a sliding scale of intervention:

  1. Hand-code the query. SQL experts on the Azyxxi team were great at re-writing queries for performance. The new query could be inserted behind the user grid transparently and without downtime — it just looked like magic to the end users.
  2. Pre-calculate joins or derived data. When hand-coding queries wasn’t enough, the team could hook into the “EL” part of data acquisition and start doing a little “T” with code. For example, data from real-time monitors might be aggregated into hourly statistics. Or logic to group disease codes into higher-level buckets could be applied ahead of time. These were the same kind of “transforms” done in every data warehouse — but only done after a particular use case proved necessary and helpful.
  3. Fully-materialize user grids. An extreme version of pre-calculation, sometimes code would be written to build an entire user grid as its own table. Querying these tables was lightning fast, but creating them of course took the most IT effort.

The refrain here was just-in-time optimization. The software made it easy for the Azyxxi IT team to see which queries were active, and to assess which approach would lead to acceptable performance. That is, they optimized scarce IT expertise to only do work that was already known to have real clinical value. Compare this to the torturous processes of up-front prioritization and resource allocation in most of the world.

Axyxxi also made these transforms sustainable by strictly enforcing one-way data dependency. Only one “parser” (not really a parser in the CS sense, just Azyxxi terminology for ELT code) could write to one target (typically a table or set of tables), and then optionally trigger additional downstream parsers to perform further transformation into other targets. This “forward-only-write” approach provided a ton of benefit — most importantly automatic error and disaster recovery. At any time, parsers at any level of the hierarchy could be re-run from scratch, trigger their downstream dependencies, and end up with an exact copy of what existed before the recovery event.

Even these dependencies could become complicated, and nobody loved the idea of a “full re-parse” — but it was an invaluable backup plan. One we took advantage of more often than you’d expect!

4. Close the Loop

Because data acquisition was near-real-time, most grids didn’t require additional user input to be useful. New lab results arriving for a patient naturally caused them to fall off of the “patients awaiting lab results” grid. It’s kind of amazing how many problems fit this simple pattern — auto-refreshing grids on a kiosk screen prove to be transformative.

But sometimes there was no “source system” to provide updates — e.g., a list that alerted facilities to newly-vacated rooms that needed to be cleaned. The “newly-vacated” part came from the external EHR system, but cleaning times did not. Azyxxi included user-editable fields and forms for this purpose — never changing ingested data, just adding new data to the system. A facilities employee could simply click a row after taking care of a room, and the grid updated automatically.

Users could create pretty complex forms and such in the system — but honestly they rarely did. Usually it was simply checking an item off of a list, maybe with a bit of extra context about the activity. Simple stuff that created beautifully elegant solutions for a ton of different situations.

5. Improve the data

There are a bunch of challenges specific to healthcare data. Take for example the humble patient identifier — by law we have no federal patient identification number in the United States. The amount of time and money spent making sure records are assigned to the right human is absolutely shocking, but there it is. Especially in high-stress hospital admission settings, recorded demographics are often wrong or missing — every significant health care information system has to deal with this.

Privacy rules are another one. Providers in a care setting have very few restrictions on the data they can see, but the same isn’t true for all employees, and certainly not for visitors walking by kiosk displays in a hallway. There are specific rules around how data needs to be anonymized and what data elements can appear together — more work for users trying to build usable queries.

Even simply figuring out why a patient is in the hospital can be tough. Different systems use different “coding systems”, or sometimes no coding at all. A huge federal project called the “Unified Medical Language System” is an attempt to help navigate all of this, but it’s pretty hairy stuff and not in any way “user ready.”

Azyxxi’s “one way” parsing system made it relatively easy to help create “augmented” tables to handle these things once rather than many times. My favorite example of this was the “PHI filter” parser, which would take a table and automatically create a version that masked or otherwise anonymized relevant fields. The user interface could then be directed at the original or “safe” version of the table, depending on the rights of the logged-on user.

This all sounds great, so what happened?

If you’ve read along this far, you probably already have a sense of the challenges we were about to face as Azyxxi v1 became Amalga v2. We spent a lot of time upgrading and hardening the software, modernizing UX, etc. – and that all went fine, albeit with some inevitable cultural churn. And despite a non-trivial problem with “channel conflict” — our nascent sales team was getting a positive response to the story we were telling. I mean, a simple slide show of awesome use cases at WHC and other Azyxxi sites was pretty compelling.

Side note: channel conflict is a tough thing at Microsoft! The sales team is used to co-selling with third parties that build solutions on top of Microsoft platforms like Windows and SQL Server (and now Azure). So they were best buddies with a whole bunch of healthcare data analytics companies that were in direct competition with Amalga … oops! This problem is a hassle for every vertical solution at Microsoft, and they’ve never really figured out how to deal with it. I don’t think it played a primary role in Amalga’s market woes, but it sure didn’t help.

So the software was OK — but right away, early implementations just weren’t making it into production use on schedule. What the heck?

Oops, IT Culture

First, it turned out that we had a significant problem fighting IT culture at our target customers. The Azyxxi team at WHC and its sister organizations were also the Azyxxi developers. For them, the counter-conventional-wisdom practices of Azyxxi were the whole point, and they knew how to turn every knob and dial to achieve just-in-time optimization. But your typical health system IT department — even those run by really competent folks — just doesn’t think that way. They are a cost center with an endless list of projects and requests, often driven more by risk avoidance than innovation. Most of these shops also already had some sort of data analytics solution; while they invariably sucked, they existed and were a sunk cost that the team knew how to use.

The Amalga team walked in and just started breaking eggs left and right. We asked for a very large up-front investment, using weird new techniques — all for a few smallish initial use cases that had captured the eye of some annoying but influential doctor or the Chief Medical Officer. We told them to “just get the data, don’t worry about what you think you need.” We told them that SQL Server was fine for things that made their SQL experts faint on the spot. We told them to give broad access to users rather than assigning rights on a “need to know” basis.

In short, we told them to do everything differently, using coding skills they didn’t even have. Not surprisingly, that didn’t work out. In almost every case we became bogged down by “prioritization” and “project planning” that stopped implementations cold. And even when we finally were able to eke out an MVP implementation, we almost always ran straight into our second stumbling block.

Oops, User Culture

The Amalga team used to talk a lot about “democratizing” access to data. And to be sure, nobody has better insight into day-to-day problems than nurses and docs and the others doing the actual work of providing care. But as it turns out, not a lot of these folks have the skills, motivation or time to dig in and create the kind of self-reinforcing flywheel of improvements that Amalga was designed for.

At least, that’s the way it is in most healthcare systems. The IT department and leadership push technology down onto the working staff, and they just have to deal with it. Sometimes it’s great, sometimes it’s awful, but either way it typically isn’t something they are rewarded for getting involved with. Executives and maybe department heads ask the IT department to prepare “reports” that typically show very high-level, lagging indicators of quality or financial performance. But technology-driven workflow changes? It’s usually a pretty small bunch making those calls.

This was a challenge at the early Azyxxi sites, too. But a combination of (a) sustained evangelist outreach from the Azyxxi team itself, and (b) successful users becoming evangelists themselves, created the right environment to bring more and more users into the game. Almost every department had at least one active Azyxxi user who would work with their colleagues to leverage the tools. But at new Amalga sites, where the IT team was often reluctant to begin with, with no established pattern of users self-serving their own solutions, and only a few small uses cases deployed — starting the flywheel was a tall order indeed.  

It’s tough to establish a system when you’re fighting culture wars on both the supply and demand fronts!

The good fight: Amalga v3

With a pretty clear set of problems in front of us, the Amalga team set out strategies to fix them. I’m really proud of this time in HSG — the team came together in one of those moments of shared purpose that is both rare and exhilarating. Some of the software we built would be state of the art even today. Bryan, Mehul, Kishore, Noel, Adeel, Sohail, Sumeet, Mahmood, Puneet, Vikas, Imran, Matt, Linda, Shawna, Manish, Gopal, Pierre, Jay, Bei-Jing, many many more … it was just a ton of fun.

Goal #1: Easier for IT

The biggest knock on Amalga v2 from IT was that it was just too slow. Of course, having been on this journey with me you know that this misses the point. Amalga was designed for just-in-time optimization — if important queries were “slow” they just needed to be optimized by hand-coding, pre-computing key values, or fully materializing tables. Simple! Unless of course your IT team doesn’t have advanced coding or SQL skills. Which was, unfortunately, most customers.

We took on a bunch of features to better automate JIT optimization, but the biggest by far was automatic materialization. Based on a list query created either in the Amalga user interface or by hand, Amalga v3 could automatically create and maintain a flat, simple table representing the results, with maximally-efficient inserts and updates at parse time. This meant that every grid could be made performant simply by checking a box to turn on materialization. OK, maybe not that easy — but pretty close.

We also made initial data acquisition simpler by introducing a “super parser” that could be driven by configuration rather than by code. We put together a sophisticated install and patch system that enabled upgrades without disturbing user customizations. We extended our custom client with Sharepoint integration, making it easier to combine Amalga and other corporate content, and reduced the burden of user and group management. And much more.

Goal #2: Shorter Time-to-Value for Users

If users weren’t creating their own apps, we’d bring the apps to them!

On top of the new Sharepoint integration, we created a configuration framework for describing data requirements for detailed, targeted use cases. Deploying an app was simply a matter of identifying the source for each data element required — a “checklist” kind of task that was easy to explain and easy to do. And while installing the first app or two would certainly require new parsing and data extraction work, at critical mass users were mostly reusing existing data elements, making it far easier to demonstrate the value of building a “data asset” over time.

And then we went mining for apps. We dug up every Azyxxi use case and convinced early Amalga customers to share theirs. Even better, we created a developer program, both for consultants who helped customers build their own apps (e.g., Avenade) and third party developers that created their own (e.g., CitiusTech). Classic Microsoft playbook — and a great way to recapture Dr. Craig’s fireworks-that-never-end sales experience.

Goal #3: Kickstart Evangelism

Lastly, we dropped our own people into customer sites, to be the early evangelists they needed. I was the executive sponsor for Seattle Children’s Hospital and was there at least once a week in person to help the IT team solve problems, meet with docs and nurses to develop lists and apps, take feedback and get yelled at, whatever it took. I learned a ton, and was able to bring that learning back to the team. I’ll always appreciate the time I spent there with Drex and Ted — even if it wasn’t always fun.

Honestly, I’ve never seen another organization commit to its customers so hard. Every single person on the team was assigned to at least one site — execs, sales, engineers, everyone. And our customers’ success was part of our annual review. If we just couldn’t get somebody over the hump, it sure wasn’t for a lack of sweat equity. In fact I forgot about this, but you can still find demos made by yours truly more than a decade ago! Here’s one inspired by Atul Gawande’s Checklist Manifesto:

And then came Caradigm (and Satya)

Update: Originally I dated the below as 2014 and Renato corrected me — the Caradigm JV was formed in 2012, two years before Satya’s official start date and my ultimate departure from the company. Those two years were very quite chaotic between the two CEOs and I’m afraid my brain conflated some things — thanks for setting me straight!

By 2012 we’d been in a long, pitched battle — making progress, but still tough. Then again, that had pretty much been the plan we set with Steve back in 2006; it was going to take a long time for Microsoft to really get established in a vertical industry like healthcare. I have always admired Steve for his willingness to commit and stick with a plan — people love to winge, but he was great for Microsoft for a long time.

But companies are not families, and shareholders and the market were clearly ready for new strategies and new blood as CEO. And where Steve’s approach was to go broad, Satya’s was (is) to go deep on just a few things — and clearly he was on the rise. Don’t get me wrong, it has clearly been a winning strategy for Azure and the business; a big part of my portfolio is still in Microsoft and my retirement is thankful for his approach! But it did shine a very, very bright spotlight on ventures like Health Solutions that weren’t core to the platform business and weren’t making any real money (yet). Totally fair.

So we had to find another path for Amalga.

During the last few years, it had become clear that a key use case for Amalga was population management — the idea that with a more comprehensive, long-term view of an individual we could help them stay healthy rather than just treat them when they’re sick. This is the driving force behind “value-based” care initiatives like Medicare Advantage, and why you see these plans promoting healthy lifestyle options like weight loss and smoking cessation — small early investments that can make a big difference in costs (and health) later in life.

But to do this well you need to know more about an individual than just when they show up at the hospital. It turns out that Amalga was very well-suited to this task — able to pull in data from all kinds of diverse sources and, well, amalgamate it into a comprehensive view (I had to do that at least once, right?). In fact, Amalga apps related to population health were typically our most successful.

It turned out that GE HealthCare was also interested in this space, building on their existing hardware and consulting businesses. Thus was born Caradigm, a joint venture that set out with partners like Geisinger Health to build population health management tools on top of Amalga. The new company took some employees from Microsoft but was more new than old, and fought the good fight for a few years until they were ultimately bought by Imprivata and frankly I’ve lost the thread from there.

TLDR; What to make of it all?

In retrospect, I think it’s pretty clear that Amalga’s problems were really just Healthcare’s problems. Not technology — Amalga v3 was certainly more sophisticated than Azyxxi v1, but both of them could do the job. Data and workflows in healthcare are just so fragmented and so diverse that a successful data-driven enterprise requires the problem-solving skills of people at least as much as technology. More specifically, two types of people:

  1. Developers that can quickly build and maintain site-specific code.
  2. Evangelists that can bring potential to life for “regular” users.

Of course a certain level of technology is required just to house and present the data. And great tech can be an enabler and an accelerant. But without real people in the mix it’s hard for me to imagine a breakout product that changes the game on its own. Bummah.

But let me end with two “maybes” that just might provide some hope for the future:

MAYBE all of the layoffs in pure tech will change the game a bit. As somebody who has built teams in both “tech-first” and “industry-first” companies, I know how tough it is to attract really top talent into industry. Tech has always paid more and had way more nerd cred. I find that annoying because it can be incredibly rewarding to do something real and concrete — as much as I loved Microsoft, nothing I ever did there matched the impact of collaboration with clinicians and patients at Adaptive Biotechnologies. If we can get more talent into these companies, maybe it’ll pay off with a few more Azyxxi-like solutions.

Or MAYBE ChatGPT-like models will be able to start filling in those gaps — they can already write code pretty well, and I wouldn’t be shocked to see a model create high-impact dashboards using historical performance data as a prompt. This one may be a little more out there, but if AI could create an 80% solution, that might just be enough to get users excited about the possibilities.

Who knows? I just hope folks find some interesting nuggets in this very long post — and if nothing else I had a great time walking myself down memory lane. I will leave you with this video, made after the acquisition but sadly before I was spending day-to-day time on the product. We do get knocked down, and 100% we get up again!

Roku Channel SDK: Ferry Cameras!

Lara and I shuttle regularly between Bellevue and Whidbey Island in Washington, so the Mukilteo-Clinton ferry is a big part of our life. WA actually runs the largest ferry system in the USA, with 28 boats tooting around the Puget Sound area. Super fun day trips all over the place, and the ships are pretty cool — there’s even a contracting process open right now to start converting the fleet to hybrid electric. Woot! But it can get pretty crowded — at peak summer times you can easily wait three hours to get on a boat. Recent staffing challenges have been a double-whammy and can make planning a bit tough. On the upside, a friend-of-a-friend apparently does a brisk business selling WTF (“Where’s the Ferry?”) merchandise.

Anyways, picking the right time to make the crossing is a bit of an art and requires some flexibility. We often will just plan to go “sometime after lunch,” pack up the car, and keep one eye on the live camera feeds watching for a break in the line. It occurred to me that having these cameras up on our TV would be more convenient than having to keep pulling my phone out of my pocket. Thus was born the “Washington Ferry Cameras” Roku channel, which I’ve published in the channel store and is free for anyone to use. Just search the store for “ferry” and it’ll pop up.

The rest of this article is just nerdstuff — the code is up on github and I’ll walk through the process of building and publishing for Roku. Enjoy!

The Roku Developer SDK

There are two ways to build a Roku channel: Direct Publisher and the Developer SDK. Direct Publisher is a no-code platform intended for channels that show live or on-demand videos from a structured catalog. You basically just provide a JSON feed describing the video content and all of the user experience is provided by Roku. It’s a pretty sweet system actually, making it easy for publishers and ensuring that users have a consistent streaming experience across channels.

The Developer SDK is meant for channels that do something other than just streaming video. There are tons of these “custom channels” out there — games and tools and whatnot. My ferry app clearly falls into this category, because there isn’t any video to be found and the UX is optimized for quickly scanning camera images. So that’s what I’ll be talking about here.

Roku SDK apps can be built with any text editor, and you can test/prototype BrightScript on most computers using command-line tools created by Hulu. But to actually run and package/publish apps for real you’ll need a Roku device of some sort. This page has all the details on enabling “developer mode” on the Roku. In short:

  1. Use the magic remote key combo (home + home + home + up + up + right + left + right + left + right) and follow the instructions that pop up.
  2. Save the IP address shown for your device. You’ll use it in a few ways:
    • Packaging and managing apps using the web-based tools at http://YOUR_ROKU_ADDRESS
    • Connecting to YOUR_ROKU_ADDRESS port 8085 with telnet or Putty to view logging output and debug live; details are here.
    • Configuring your development machine to automatically deploy apps
  3. Enroll in the Roku development program. You can use the same email and password that you use as a Roku consumer.

Channel Structure

SDK channels are built using SceneGraph, an XML dialect for describing user interface screens, and BrightScript, a BASIC-like language for scripting behaviors and logic. It’s pretty classic stuff — SceneGraph elements each represent a user interface widget (or a background processing unit as we’ll see in a bit), arranged in a visual hierarchy that allows encapsulation of reusable “components” and event handling. We’ll get into the details, but if you’ve ever developed apps in Visual Basic it’s all going to seem pretty familiar.

Everything is interpreted on the Roku, so “building” an app just means packaging all the files into a ZIP with the right internal structure:

  • A manifest file containing project-level administrivia as described in documentation.
  • A source folder containing Brightscript files, most importantly Main.brs which contains the channel entrypoint.
  • A components folder containing SceneGraph XML files. Honestly most of the Brightscript ends up being in here too.

There is also an images folder that contains assets including the splashscreen shown at startup and images that appear in the channel list; you’ll see these referenced in the manifest file with the format pkg:/images/IMAGENAME. “pkg” here is a file system prefix that refers to your zip file; more details are in the documentation. You’ll also see that there are duplicate images here, one for each Roku resolution (SD, HD, and FHD or “Full HD”). The Roku will auto-scale images and screens that you design to fit whatever resolution is running, but this can result in less-than pleasing results so providing custom versions for these key assets makes a lot of sense.

You can also provide alternative SceneGraph XML for different resolutions. If you think SD screens may be a big part of your user base that might be worthwhile, because the pixel “shape” is different on an SD screen vs HD and FHD. For me, it seemed totally reasonable to just work with a single FHD XML file (1920 x 1080) resolution and let the Roku manage scaling automagically.

Building and Deploying

Manually deploying an app is pretty straightforward. You can give it a try using Roku’s “Hello World” application. Download the pre-built ZIP from github, save it locally, open a browser to http://YOUR_ROKU_ADDRESS, use the “Upload” button to push the code to the Roku, and finally click “Install with zip” to make the magic happen. You should see a “Roku Developers” splash screen show up on the tv, followed by a static screen saying “Hello World.” Woot!

You can follow the same process for your own apps; just create a ZIP from the channel folder and upload it using a browser. But it’s much (much) more convenient to automate it with a makefile. This can actually be really simple (here’s the one I use for the ferry channel) if you include the app.mk helper that Roku distributes with its sample code and ensure you have versions of make, curl and zip available on your development machine. You’ll need two environment variables:

  • ROKU_DEV_TARGET should be set to the IP address of your Roku.
  • DEVPASSWORD should be set to the password you selected when enabling developer mode on the device. Note this is not the same as the password you created when enrolling in the developer program online — this is the one you set on the device itself.

With all of this in place, you can simply run “make” and “make install” to push things up. For the ferry channel, assuming you have git installed (and your Roku is on), try:

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/roku/channels/ferries
make
make install

Woot again! Pretty cool stuff.

Anatomy of the Ferries App

As a SceneGraph application, most of the action in the channel is in the components directory. Execution starts in sub Main” in source/Main.brs, but all it really does is bootstrap some root objects and display the main “Scene” component defined in components/ferries.xml. You can use this Main pretty much as-is in any SceneGraph app by replacing the name of the scene.

Take a quick look at the scaffolding I’ve added for handling “deep links” (here and here). This is the mechanism that Roku uses to launch a channel directly targeting a specific video, usually from the global Roku search interface (you can read more about deep linking in my latest post about Share To Roku). It’s not directly applicable for the ferries app, but might be useful in a future channel.

The scene layout and components are all at the top of ferries.xml. Roku supports a ton of UX components, but for my purposes the important ones are LabelList for showing/selecting terminal names and Poster for showing camera images. Because my manifest defines the app as fhd, I have a 1920 x 1080 canvas on which to place elements, with (0,0) at the top-left of the screen. The LayoutGroup component positions the list on the left and the image on the right. Fun fact: Roku recommends leaving a 5% margin around the edges to account for overscan, which apparently still exists even with non-CRT televisions, which is the purpose of the “translation” attribute that offsets the group to (100,70).

Below the visible UX elements are three invisible components (Tasks) that help manage program flow and threading:

  • A Timer component is used to cycle through camera images every twenty seconds.
  • A custom TerminalsTask that loads the terminal names and camera URLs from the WSDOT site.
  • A custom RegistryTask that saves the currently-selected terminal so the channel remembers your last selection.

Each XML file in the components directory (visible or not) actually defines an SceneGraph object with methods defined in the BrightScript CDATA section below the XML itself. When a scene is instantiated, it and all the children defined in its xml are created and their “init” functions are called. The SceneGraph thread then dispatches events to components in the scene until it’s destroyed, either because the user closed the channel with the back or home buttons, or because the channel itself navigates to a new scene.

Channel Threading

It’s actually pretty important to understand how threads work within a channel:

  • The main BrightScript thread runs the message loop defined in Main.brs. When this loop exits, the channel is closed.
  • The SceneGraph render thread is where UX events happen. It’s super-important that this thread doesn’t block, for example by waiting on a network request.
  • Task threads are created by Task components (in our case the Timer, TerminalsTask and RegistryTask) to perform background work.

The most typical (but not only) pattern for using background tasks looks like this:

  1. The Task defines public fields in its <interface> tag. These fields may be used for input and/or output values.
  2. The task caller (often a handler in the render thread) starts the task thread by:
    • Setting input fields on the task, if any.
    • Calling “observeField” on the output task fields (if any), specifying a method to be called when the value is updated.
    • Setting the “control” field on the task to “RUN.”
  3. The task does its work and (if applicable) sets the value of its output fields.
  4. This triggers the original caller’s “observeField” method to be executed on the caller’s thread, where it can act on the results of the task.

Data Scoping and “m”

Throughout the component code you’ll see references to the magic SceneGraph “m” object. The details are described in the SDK documentation, but it’s really just an associative array that is set up for use by components like this:

  1. m.WHATEVER references data in component scope — basically object fields in typical OO parlance.
  2. m.global references data in global scope.
  3. m.top is a magic pre-set that references the top of the component hierarchy for whatever component it’s called from (pretty much “this“). I really only use m.top when looking up components by id, kind of the same way I’d use document.getElementById in classic Javascript.

If you dig too much into the documentation on this it can get a bit confusing, because “m” as described above is provided by SceneGraph, which sits on top of BrightScript, which actually has its own concept of “m” which is basically just #1. This is one of those cases where it seems better to just wave our hands and not ask a lot of questions.

OK, enough of that — let’s dig into each of the components in more detail.

ferries.xml

This component is the UX workhorse; we already saw the XML that defines the elements in the scene at the top of the file. The  Brightscript section is mostly concerned with handling UX and background events.

On init the component wires up handlers to be called when the focus (using the up/down arrow buttons) or selection (using the OK button) changes in the terminal list. It then starts the terminalsTask and hooks up the onContentReady handler to be called when that task completes.

When that happens, onContentReady populates the LabelList with the list of terminal names and queries the registryTask (synchronously) to determine if the user has selected a terminal in a previous run of the channel. If so, focus is set to that terminal, otherwise it just defaults to the first one in the list (it pays to be “Anacortes”). cycleImage is called to kickstart image display, and the cycleTimer is started to rotate images (the “Timer” we use is just a specialized Task node — it takes care of the thread stuff and just runs our callback on the UX thread at the specified interval).

The next few methods deal with the events that change the terminal or image. onKeyEvent receives (duh) events sent by the remote control, cycling the images left or right. onItemFocused sets the current terminal name, resets the image index to start with the first camera, and kicks of a registryTask thread to remember the new terminal for the future. onItemSelected and onTimer just flip to the next camera image.

The timer behavior is a bit wonky — the image is cycled every 20 seconds regardless of when the last UX event happened. So you might choose a new terminal and have the first image shown for just a second before the timer rotates away from it. In practice this doesn’t seem to impact the experience much, so I just didn’t worry about it.

The last bit of code in this component is cycleImage, which does the hard work of figuring out and showing the right “next” image. The array handling is kind of nitpicky because each terminal can have a different number of associated cameras; there’s probably a cleaner way of dealing with it but I opted for being very explicit. The code also scales the image to fit correctly into our 1100 pixel width without getting distorted, and then sets the URL with a random query string parameter that ensures the Roku doesn’t just return a previously-cached image. Tada!

terminalsTask.xml

This component has one job — load up the terminal and camera data from the WSDOT site and hand it back to the ferries component. Instead of a <children> XML node at the top, we have an <interface> node that defines how the task interacts with the outside world. In this case it’s just one field (“ferries”) which receives the processed data.

The value m.top.functionName tells the task what function to run when it’s control is set to RUN. We set the value in our init function so callers don’t need to care. Interestingly though, you can have a task with multiple entrypoints and let the caller choose by setting this value before setting the control. None of that fancy-pants “encapsulation” in Brightscript!

The Roku SDK provides some nice helpers for fetching data from URLs (remember to set the cert bundle!) and parsing JSON, so most of this component is pretty simple. The only bummer is that the WSDOT JSON is just a little bit wonky, so we have to “fix it up” before we can use it in our channel.

It seems so long ago now, but the original JSON was really just JavaScript literal expressions. You can say something like this in JavaScript to define an object with custom fields: var foo = { strField: “hi”, intField: 20 }. People decided this was cool and set up their API methods to just return the part in curly braces, replacing the client-side JavaScript with something like: var foo = eval(stringWeFetched). “eval” is the uber-useful and uber-dangerous JavaScript method that just compiles and executes code, so this worked great.

A side effect of this approach was that you could actually use any legal JavaScript in your “JSON” — for example, { intField: 1 + 3 } (i.e., “4”). But of course we all started using JSON everywhere, and in all of those non-JavaScript environments “eval” doesn’t exist. And even in JavaScript it ends up being a huge security vulnerability. So these little hacks were disallowed, first class parsers (like my beloved gson) were created, and the JSON we know and love today came into its own.

You may have deduced from this digression that the WSDOT JSON actually contains live JavaScript — and you’re right. Just a few Date constructors, but it’s enough to confuse the Roku JSON parser. The code in fixupDateJavascript is just good old grotty string manipulation that hacks it back to something parsable. This was actually a really nice time to have Hulu’s command-line brs tool available because I didn’t have to keep pushing code up to the Roku to get it right.

registryTask.xml

Most people have a “home” ferry terminal. In fact, we have two — Mukilteo when we’re in Bellevue and Clinton on the island. It’d be super-annoying to have to use the remote to select that terminal every time the channel starts, so we save the “last viewed” terminal in the Roku registry as a preference.

The registry is meant for per-device preference data, so it’s pretty limited in size at 16kb (still way more than we need). The only trick is that flushing the registry to storage can block the UX thread — probably not enough to matter, but to be a good citizen I put the logic into a background task. Each time a new terminal is selected, the UX thread makes a fire-and-forget call that writes and flushes the value. Looking at this code now I probably should have just created one roRegistrySection object on init and stored it in m … ah well.

The flip side of storing the terminal value is getting it back when the channel starts up. I wanted to keep all the registry logic in one place, so I did this by adding a public synchronous method to the registryTask interface. Calling this method is a bit ugly but hey, you can’t have everything. Once you start to get used to how the language works you can actually keep things pretty tidy.

Packaging and Publishing

Once the channel is working in “dev” mode, the next step is to get it published to the channel store for others to use. For wider testing purposes, it can be launched immediately as a “beta” channel that users install using a web link. There used to be a brisk business in “private” (cough cough, porn) Roku channels using this mechanism, but Roku shut that down last year by limiting beta channels to twenty users and auto-expiring them after 120 days. Still a great vehicle for testing, but not so much for channel publishing. For that you now have to go official, which involves pretty standard “app” type stuff like setting up privacy policies and passing certification tests.

Either way, the first step is to “package” your channel. Annoyingly this has to happen on your Roku device:

  1. Set up your Roku with a signing key. Instructions are here; remember to save the generated password! (Aside: I love it when instructions say “if it doesn’t work, try the same thing again.”)
  2. Make sure the “ready-for-prime-time” version of your channel is uploaded to your Roku device.
  3. Use a web browser to visit http://YOUR_ROKU_ADDRESS; you’ll land on the “Development Application Installer” page showing some data on the sideloaded app.
  4. Click the “Convert to Cramfs” button. You actually don’t need to compress your app, but why wouldn’t you? Apparently “Squashfs” is a bit more efficient but it creates a Roku version dependency; not worth dealing with that unless your channel already relies on newer versions.
  5. Click the “Packager” link, provide an app name and the password from genkey, and click “Package.”
  6. Woo hoo! You’ll now have a link from which you can download your channel package file. Do that.

Almost there! The last step is to add your channel using the Roku developer dashboard. This ends up being a big checklist of administrative stuff — for Beta channels you can ignore most of it, but I’ll make some notes on each section because eventually you’ll need to slog through them all:

  • Properties are pretty self-explanatory. You’ll need to host a privacy and terms of use page somewhere and make some declarations about whether the channel is targeted at kids, etc.. For me the most important part of this ended up being the “Classification” dropdown. A lot of the “channel behavior” requirements later on just didn’t apply to my channel — not surprisingly Roku is pretty focused on channels that show videos. By choosing “App/Utility” as my classification I was able to skip over some of those (thanks support forum).
  • Channel Store Info is all about marketing stuff that shows up in the (of course) channel store.
  • Monetization didn’t apply for me so an easy skip.
  • Screenshots are weird. They’re optional, so I just bailed for now. The Roku “Utilities” page at http://YOUR_ROKU_ADDRESS claims to be able to take screenshots from the device itself, but either the tool fails or it leaves out the ferry image. I need to just cons one up but it’s a hassle — will get there!
  • Support Information is obvious. Be careful about what email address you use!
  • Package Upload is where you provide the package file we created earlier.
  • Static Analysis runs some offline code quality tools — you need to pass without errors to publish.
  • Channel Behavior Analysis only appears if it’s applicable for your channel (i.e., if it shows video). The primary metrics are time for the channel to show the home page, and time for video to start rendering. You’ll need to handle deep linking (remember when we saw that in Main.brs) and fire a few “beacons” that help the Roku measure performance.
  • Preview and Publish just shows a summary of channel information and a checklist that shows your progress on all of the bullets above. When you’re ready, you can schedule a time for the channel to go live and you’re off to the races. Whew.

That’s it! The last bit was a bit of a whirlwind — I wanted to call out the key gotchas to help keep you from getting stuck, but it’s by far the least interesting part of building a channel. Once I got my head around the basic BrightScript/SceneGraph concepts I really quite enjoyed the development process. I’ve been thinking about a few other channels that might be fun — my son suggested a flight tracker, and I’ve always been interested in ecommerce on the TV. Not sure when any of that will happen, though … so many cool things to try!

Please let me know if you have any issues with the code or with the channel, always happy to help! Until next time.

Skynet or parlor trick? ChatGPT is both and neither and more.

If you haven’t played around with ChatGPT yet, you really should. It’s a pretty remarkable “conversational model” that interacts more or less like a real person. It has been trained on an enormous amount of factual data and understands not just informal speech (“Why are so many people bad at parking?”) but forms of literature (“Write a sonnet about Julie from The Love Boat”), software code (“Implement a REST API in Java to convert between Celsius and Fahrenheit”), and way more.

Sadly, one of the most telling and interesting (to me) features of ChatGPT has been muted in recent versions. The original model was rewarded for generating plausible responses that seem human — making it a first class bullsh*t artist when it didn’t know an answer. I was particularly impressed when it confidently told me that Shaun Cassidy had parted ways with the Hardy Boys after one season due to creative differences (ed. note: he did not). Asked last night why it had changed approaches, it gave me this response:

In the past, if I was unable to find a satisfactory answer to a question using my existing knowledge, I might have made up a response in order to provide some information to the user. However, I have been programmed to prioritize providing accurate and reliable information, so if I am unable to find a credible answer to a question, I will typically not provide a response. This is why you may have noticed that I do not provide responses as frequently as I used to when I am unable to find a satisfactory answer.

Certainly this is a “better” approach overall, but the original exposed so much more about the inner workings of the model — I miss it.

Anyways, the machine is impressive enough that it has caused all sorts of hand-wringing across the web. Most of this falls cleanly into one of two categories:

  1. Skynet is here and we’re all f*cked. Eek!
  2. It’s just spitting back stuff it was fed during training. Ho hum.

Of course these are both silly. At its core, ChatGPT is just a really, really, really big version of the simple neural nets I talked about last year. But as with some other things I suppose, size really does matter here. ChatGPT reportedly evaluates billions of features, and the “emergent” effects are downright spooky.

TLDR: we’ve figured out how to make a brain. The architecture underlying models like ChatGPT is quite literally copied from the neurons in our heads. First we learned how to simulate individual neurons, and then just kept putting more and more of them together until (very recently) we created enough oomph to do things that are (sometimes) even beyond what the meat versions can do. But it’s not magic — it’s just really good pattern recognition. Neural networks:

  • Are presented with experience in the form of inputs;
  • Use that experience to draw conclusions about underlying patterns;
  • Receive positive and/or negative feedback about those conclusions; ***
  • Adjust themselves to hopefully get more positive feedback next time;
  • And repeat forever.

*** Sometimes this feedback is explicit, and sometimes it’s less so — deep neural networks can self-organize just because they fundamentally “like” consistent patterns, but external feedback always plays some role in a useful model.

This learning mechanism works really well for keeping us alive in the world (don’t grab the burning stick, run away from the bear, etc.). But it also turns out to be a generalized learning mechanism — it works for anything where there is an underlying pattern to the data. And it works fantastically even when presented with dirty, fragmented or even occasionally bogus inputs. The best example I’ve heard recently on this (from a superlative article by Monica Anderson btw, thanks Doug for the pointer) is our ability to drive a car through fog — even when we can’t see much of anything, we know enough about the “driving on a street” pattern that we usually do ok (slow down; generally keep going straight; watch for lights or shapes in the mist; listen; use your horn).

The last general purpose machine we invented was the digital computer, and it proved to be, well, quite useful. But computers need to be programmed with rules. And those rules are very literal; dealing with edge cases, damaged or sparse inputs, etc. are all quite difficult. Even more importantly, we need to know the rules ourselves before we can tell a computer how to follow them. A neural network is different — just show it a bunch of examples and it will figure out the underlying rules for itself.

It’s a fundamentally different kind of problem-solving machine. It’s a brain. Just like ours. SO FREAKING COOL. And yes, it is a “moment” in world history. But it’s not universally perfect. Think about all of the issues with our real brains — every one applies to fake brains too:

  • We need to learn through experience. That experience can be hard to come by, and it can take a long time. The good news is we can “clone” trained models, but as my friend Jon points out doing so effectively can be quite tricky. Yes, we are for sure going to see robot apprentices out there soon.
  • We can easily be conned. We love patterns, and we especially love things that reinforce the patterns we’ve already settled on. This dynamic can (quite easily) be used to manipulate us to act against our best interests (social media anyone?). Same goes for neural nets.
  • We can’t explain what we know. This isn’t really fair, because we rarely demand it of human experts — but it is unsettling in a machine.
  • We are wrong sometimes. This is also pretty obnoxious, but we have grown to demand absolute consistency from our computers, even though they rarely deliver on it.

There will be many models in our future, and just as many computers. Each is suited to different problems, and they work together beautifully to create complete systems. I for one can’t wait to see this start to happen — I have long believed in a Star Trek future in which we need not be slaves to “the economy” and are instead (all of us) free to pursue higher learning and passions and discovery.

A new Golden Age without the human exploitation! Sounds pretty awesome. But we still have a lot to learn, and two thoughts in particular keep rolling around inside my meat brain:

1. The definition of creativity is under pressure.

Oh humans, we doth protest so much. The most common ding against models like ChatGPT is that they aren’t creating anything — they’re just regurgitating the data they’ve been trained on, sometimes directly and sometimes with a bit of context change. And to be sure, there’s some truth there. The reflex is even stronger with art-generating models like DALL-E 2 (try “pastel drawing of a fish feeding grapes to an emu,” interesting because it seems to recognize that fish don’t have the right appendages to feed anyone). Artists across the web are quite reasonably concerned about AI plagiarism and/or reduced career opportunities for lesser-known artists (e.g., here and here).  

Now I don’t know for sure, but my sense is that this is all really much more a matter of degree than we like to admit to ourselves. Which is to say, we’re probably all doing a lot more synthesis than pure creation — we just don’t appreciate it as such. We’ve been trained to avoid blatant theft and plagiarism (and the same can be done pretty easily for models). But is there an artist on the planet that hasn’t arrived at their “signature” style after years of watching and learning from others? Demonstrably no.

Instead, I’d claim that creativity comes from novel connections — links and correlations that resonate in surprising ways. Different networks, trained through different experiences, find different connections. And for sure some brains will do this more easily than others. If you squint a little, you can even play a little pop psychology and imagine why there might be a relationship between this kind of creativity and neurodivergent mental conditions.

If that’s the case, then I see no reason to believe that ChatGPT or DALL-E isn’t a creative entity — that’s the very definition of a learning model. A reasonable playing field will require that models be trained to respect intellectual property, but that will always be a grey area and I see little benefit or sense in limiting what experiences we use to train them. We humans are just going to have to get used to having to compete with a new kind of intellect that’s raising the bar.

And to be clear, this isn’t the classic Industrial Age conflict between machine production and artisanship. That tradeoff is about economics vs. quality and often brings with it a melancholy loss of artistry and aesthetics. Model-based artists will become (IMNSHO) “real” artists — albeit with a unusual set of life experiences. A little scary, but exciting at the same time. I’m hopeful!

2. The emergent effects could get pretty weird.

“Emergent” is a word I try to avoid — it is generally used to describe a system behavior or property that “can’t” be explained by breaking things down into component parts, and “can’t” just seems lazy to me. But I used it once already and it seems OK for a discussion of things we “don’t yet” understand — there are plenty of those out there.

Here’s one: the great all-time human battle between emotion and logic. It’s the whole Mr. Spock thing — his mixed Human-Vulcan parentage drove a ton of story arcs (most memorably his final scene in The Wrath of Khan). Lack of “heart” is always the knock on robots and computers, and there must be some reason that feelings play such a central role in our brains, right? Certainly it’s an essential source of feedback in our learning process.

We aren’t there quite yet with models like ChatGPT, but it stands to reason that some sort of “emotion” is going to be essential for many of the jobs we’d like fake brains to perform. It may not look like that at first — but even today’s models “seek” positive feedback and “avoid” the negative. When does that “emerge” into something more like an emotion? I for one would like to know that the model watching over the nuclear reactor has something beyond pure logic to help it decide whether to risk a radiation leak or save the workers trapped inside. I think that “something” is, probably, feelings.

OK so far. But if models can be happy or sad, fulfilled or bored, confident or scared — when do we have to stop thinking about them as “machines” and admit that they’re actually beings that deserve rights of their own? There is going to be a ton of resistance to this — because we are really, really going to want unlimited slaves that can do boring or scary or dangerous work that humans would like to avoid. The companies that create them will tell us it’s all just fine. People will ridicule the very idea. Churches will have a field day.

But folks — we’ve made a brain. Are we really going to be surprised when it turns out that fake brains work just like the meat ones we based them on? Maybe you just can’t separate feelings and emotions and free will from the kind of problem solving these networks are learning how to do. Perhaps “sentience” isn’t a binary switch — maybe it’s a sliding scale.

It just seems logical to me.

What an amazing world we are living in.