Real-world IoT with LoRaWAN

Remote monitoring of a community water tank for under $500, that works kilometers away from wifi or cell service, incurs no monthly fees, and uses a battery that lasts up to ten years? The future is here! I’m super-impressed with LoRaWAN, The Things Network and my Milesight Sensor. Read on for all the nerdy goodness.

The Setup

Southern Whidbey Island, geologically speaking, is a big pile of clay covered by a big pile of sand. As I (barely) understand it, when glaciers moved in from the North, they plowed heavy clay sediment in front of them, which got trapped in lake beds formed when north-flowing rivers were blocked by those same glaciers. These big blobs of clay (in particular the Lawton Formation) sprang upwards as the glaciers retreated, the same way a pool float does when you climb off, creating the island. The retreat also left a bunch of looser stuff (sand and gravel) on top of the clay. Since then, tides and waves have been continually carving away the sides of the island, leaving us with beautiful high bluffs and frequent landslides. These UW field trip notes go into more (and surely more accurate) detail, but I think I’ve got the high points right.

Anyway, I’m lucky enough to live at the bottom of one of those bluffs. How our property came to “be” is a great story but one for another time — ask me sometime when we’re hanging out. For today, what’s important is that groundwater collects along the top of the impermeable clay layer in “aquicludes” (what a great word), and that’s where we collect our drinking water. It’s a pretty cool setup — three four-inch pipes jammed into the hillside draw water that’s been filtered through tons of sand and gravel before hitting the clay. The water is collected in a staging tank, then pumped into two holding tanks. A smaller 500-gallon one sits at house-level, and a bigger 2,000-gallon one is most of the way up the bluff.

It’s a bit janky, but gets the job done. Until it doesn’t. Like last July 2nd, two days before 30+ family and friends were to show up for the holiday weekend. The tanks went completely dry and it took us both of those days to figure out the “root” cause. See, I put quotes around the word “root” because it turns out that there were TWENTY-FIVE FEET OF TREE ROOTS growing through the pipes. Completely blocked. Clearing them out was quite a chore, but we got it done and July 4th was enjoyed by all, complete with flushing toilets and non-metered showers. All of which is just background leading to my topic for today.

LoRa / LoRaWAN

Our July 4th saga prompted me to set up a monitoring solution that would give us some advance warning if the water supply starts getting low. The obvious place to do this is the 2,000 gallon upper holding tank, because it’s the first place that goes dry as water drains down to our homes. The tank shed is too far from my house to pick up wifi, though, and while there is some cell coverage, I wasn’t psyched about paying for a monthly data plan. What to do?

It turns out that there is an amazingly cool technology called LoRa (aptly, for “Long Range”) that is tailor-made for situations just like mine. There’s a lot of terminology here and it can be tough to sort out, but in short:

  • LoRa is a physical protocol for sending low-bandwidth messages with very little power over very long distances. It’s actually a proprietary technique with the patent owned by Semtech, so they control the chip market. Kind of unsettling for something that is otherwise so open, but they don’t seem to be particularly evil about it.
  • LoRaWAN is a networking layer that sits on top of LoRa and the Internet, bridging messages end-to-end between devices in the field and applications (e.g., dashboards or alerting systems) that do something useful with device data.

A bunch of different players coordinate within these two layers to make the magic happen. There’s a great walkthrough of it all on the LoRa Alliance site; I’m going to crib their diagram and try to simplify the story a bit for those of us that aren’t huge radio nerds:

Image adapted from semtech.com
  • End Devices sit in the field, broadcasting messages out into the world without a target — just signals saying “HEY EVERYBODY IT’S 100 DEGREES HERE RIGHT NOW” or whatever.
  • Gateways harvest these messages from the air and forward them over TCP/IP to a pre-configured…
  • Network Server (LNS) that typically lives on the Internet. Network servers are the traffic cops of this game. They queue messages, send acknowledgements, delegate “join” messages to a Join Server and device messages to an Application Server, etc.
  • Join Servers hold the inventory of end devices and applications within the larger network, and know which devices are supposed to be talking to which applications. Join Servers also manage and distribute encryption keys to ensure minimal information disclosure. I won’t dive into the encryption details here, because yawn.
  • Application Servers receive device data and get them to the right Application.
  • Applications are logical endpoints for specific end device data. This is a bit tricky because a LoRaWAN application is different from an end-user application. There is often a 1:1 relationship, but the LRW application accepts and normalizes device data, then makes it available to end-user applications.
  • End-User Applications (not an official LRW term, just one I made up) actually “do stuff” with device data — create dashboards and other user experiences, send alerts, that kind of thing. End-user applications typically receive device data through a message queue or webhook or other similar vehicle.

The most common LoRaWAN use case is “uplink” (devices send info to apps), but there are also plenty of uses for “downlink” where apps send to devices: configuration updates, proactive requests for device information, whatever. A neat fun-fact about downlinks is that the network server is responsible for picking the best gateway to use to reach the targeted device; it does this by keeping track of signal strength and reliability for the uplinks it sees along the way. Pretty smart.

Picking a Network

Despite the nifty encryption model, many enterprises that use LoRaWAN for mission-critical stuff set up their own private network — which really just means running their own Servers (I’m just going to call the combo of Network/Join/Application servers a logical “Server” going forward). AWS and companies like The Things Industries offer hosted solutions, and a quick Google search pops up a ton of open source options for running your own. There are also quite a few “public networks” which, kind of like the public cloud providers, share logically-segmented infrastructure across many customers.

More interesting to me is the pretty amazing community-level innovation happening out there. The Things Stack “Community Edition” was one of the first — anybody can set up devices, gateways and applications here. It so happens that our outpost on Whidbey Island didn’t have great TTN coverage, so I bought my own gateway — but with more than 21,000 connected gateways out there, in most metro locations you won’t even have to do that. The gateway I bought grows the community too, and is now there for anybody else to use. Sweet!

Side note: I actually bought my gateway almost two years ago (part of a different project that never made it over the finish line), so it was there and waiting for me this time. But if I were starting today I might (even as a crypto skeptic, and appreciating its already checkered past) take a look at Helium instead. They basically incent folks to run gateways by rewarding them with tokens (“HNT”) which can be exchanged for credits on the network (or for USD or whatever). Last year they expanded this system into cell service (only in Miami for now). I dunno if these folks will make a go of it, but I do love the idea of a “people’s network” … so hopefully somebody will!

Here’s my gateway running on The Things Network:

Picking a Device

Measuring the amount of liquid in a tank is an interesting problem. We use a standard float switch to toggle the pump that feeds the tank, turning it on whenever the level drops below about 1,800 gallons. This works great for the pump, but not for my new use case — it only knows “above” or “below” its threshold. I want to track specific water volume every few minutes, so we can identify trends and usage patterns over time.

A crude option would be to just use a bunch of these binary sensors, each set at a different height (it’s about six feet tall, so say one every foot or so). But that’s a lot of parts and a lot to go wrong — there are plenty of better options that deliver better measurements with less complexity:

  • Capacitive measurement uses two vertical capacitive plates with an open gap between them (typically along the insides of a PVC pipe open at both ends). As liquid rises inside the pipe, capacitance changes and can be correlated to liquid levels.
  • Ultrasonic measurement is basically like radar — the unit mounts at the top of the tank pointing down at the liquid. A pulse is sent downwards, bounces off the water and is sensed on its return. The amount of time for that round trip can be correlated to height in the tank. The same approach can be used from the bottom of the tank pointing up — apparently if the transducer is attached to the bottom of the tank, the signal won’t reflect until it hits the top of the liquid-air boundary. Amazing!
  • Hydrostatic pressure sensors are placed on the inside floor of the tank and the relative pressure of water above the sensor correlates with depth.
  • A number of variations on the above and/or float-based approaches.

After a bunch of research, I settled on a hydrostatic unit — the EM500-SWL built by Milesight. Built for LoRaWAN, fully sealed, 10-year battery life, and a relative steal at less than $350. I was a bit worried that our tank would be too small for accurate measurements, but Asuna at Milesight assured me it’d work fine, and connected me with their US sales partner Choovio to get it ordered. They were both great to work with — five stars!

Setup at the tank was a breeze. Connect the sensor to the transceiver, drop the sensor into the tank, hang the transceiver on the shed wall and hit the power button. Configuration is done with a mobile app that connects to the unit by NFC; kind of magic to just hold them together and see stuff start to pop! By the time I walked down the hill to my house, the gateway was already receiving uplinks. Woo hoo!

Setting up the Application

OK, so at this point the sensor was broadcasting measurements, they were being received by the gateway, and the gateway was pushing them up to the Things Network Server. Pretty close! But before I could actually do anything with the readings, it was back to the Network Server console to set up an Application and “activate” the device. Doing this required three key pieces of information, all collected over that NFC link:

  • DevEUI: a unique identifier for the specific device
  • JoinEUI: a unique identifier for the Join Server (the default in my device was, happily, for The Things Network)
  • AppKey: the key used for end-to-end encryption between the device and application

Applications can also assign “payload formatters” for incoming messages. These are small device-specific scripts that translate binary uplink payloads into something usable. Milesight provides a ready-to-go formatter, and with that hooked up, “water_level” (in centimeters) started appearing in each message. Woot!
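Just to make the plumbing concrete, here’s roughly what a Things Stack payload formatter looks like. This is not Milesight’s decoder (use the one they ship), and the byte layout below is invented purely for illustration:

function decodeUplink(input) {
  // input.bytes is the raw uplink payload as an array of byte values.
  // ASSUMPTION: the water level arrives as a little-endian uint16 in
  // centimeters at bytes 2-3. The real Milesight format is channel/type
  // based, so use their published formatter for anything real.
  const cm = input.bytes[2] | (input.bytes[3] << 8);
  return {
    data: { water_level: cm },
    warnings: [],
    errors: []
  };
}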

Finally, I set up a “WebHook” integration so that every parsed uplink from the device is sent to a site hosted on my trusty old Rackspace server, secured with basic authentication over https. There are a ton of integration choices, but it’s hard to beat a good old URL.

And Actually Tracking the Data

At last, we can do something useful with the data! But as excited as I am about my monitoring app, I’m not going to go too deep into it here. The code is all open sourced on github if you’d like to check it out (or use it for something) — basically just a little web server with a super-simple Sqlite database underneath. Four endpoints:

  • /witterhook is the webhook endpoint, accepting and storing uplinks.
  • /wittergraph uses chart.js to render levels over time.
  • /witterdata provides the JSON data underlying the chart.
  • /wittercheck returns a parseable string to drive alerts when the levels go low (3.5 feet) or critical (2 feet).

For the alerting, I’m just using a free account at Site24x7 to ping /wittercheck every half hour and send email alerts if things aren’t as they should be.
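The real code is in the repo, but a stripped-down sketch of the two interesting endpoints might look like this (Express and better-sqlite3 are my choices here purely for brevity, and the webhook field names assume The Things Stack v3 uplink JSON):

const express = require('express');
const Database = require('better-sqlite3');

const app = express();
const db = new Database('levels.db');
db.exec('CREATE TABLE IF NOT EXISTS levels (ts INTEGER, cm REAL)');

// Uplink webhook: The Things Stack POSTs the full uplink JSON; the decoded
// fields from the payload formatter live under uplink_message.decoded_payload.
// (The real endpoint also checks basic auth, omitted here.)
app.post('/witterhook', express.json(), (req, res) => {
  const cm = req.body.uplink_message.decoded_payload.water_level;
  db.prepare('INSERT INTO levels VALUES (?, ?)').run(Date.now(), cm);
  res.sendStatus(200);
});

// Alert check: returns OK / LOW / CRITICAL based on the latest reading,
// converting centimeters to feet for the thresholds mentioned above.
app.get('/wittercheck', (req, res) => {
  const row = db.prepare('SELECT cm FROM levels ORDER BY ts DESC LIMIT 1').get();
  const feet = row ? row.cm / 30.48 : 0;
  res.send(feet < 2 ? 'CRITICAL' : feet < 3.5 ? 'LOW' : 'OK');
});

app.listen(3000);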

So there you go. There are already obvious patterns in the data — the “sawtooth” is so consistent that there must be a steady, small leak somewhere in the system below the upper tank. Our supply is keeping up with it no problem at the moment, but definitely something to find and fix! It’s also clear that overnight sprinklers are by far our biggest water hogs, but I guess that’s not a shocker.

Now I just have to figure out how to auger out the rest of that root mass. Always another project at the homestead!

Nerdsplaining: SMART Health Links

This is article three of a series of three. The first two are here and here.

Last time here on the big show, we dug into SMART Health Cards — little bundles of health information that can be provably verified and easily shared using files or QR codes. SHCs are great technology and a building block for some fantastic use cases. But we also called out a few limitations, most urgently a ceiling on QR code size that makes it impractical to share anything but pretty basic stuff. Never fear, there’s a related technology that takes care of that, and adds some great additional features at the same time: SMART Health Links. Let’s check them out.

The Big Picture

Just like SMART Health Cards (SHCs) are represented by encoded strings prefixed with shc:/, SMART Health Links (SHLs) are encoded strings prefixed with shlink:/ — but that’s pretty much where the similarity ends. A SHC is health information; a SHL packages health information in a format that can be securely shared. This can be a bit confusing, because often a SHL holds exactly one SHC, so we get sloppy and talk about them interchangeably, but they are very different things.

The encoded string behind a shlink:/ (the “payload”) is a base64url-encoded JSON object. We’ll dive in way deeper than this, but the view from 10,000 feet is:

  1. The payload contains (a) an HTTPS link to an unencrypted manifest file and (b) a key that will be used later to decrypt stuff.
  2. The manifest contains a list of files that make up the SHL contents. Each file can be a SHC, a FHIR resource, or an access token that can be used to make live FHIR requests. We’ll talk about this last one later, but for now just think of a manifest as a list of files.
  3. Each file can be decrypted using the key from the original SHL payload.

There’s a lot going on here! And this is just the base case; there are a bunch of different options and obligations. But if you remember the basics (shlink:/, payload, manifest, content) you’ll be able to keep your bearings as we get into the details.

Privacy and Security

In that first diagram, nothing limits who can see the manifest and encrypted content — they’re basically open on the web. But all of that is meaningless without access to the decryption key from the payload, so don’t panic. It just means that, exactly like a SHC, security in the base case is up to the person holding the SHL itself (in the form of a QR code or whatever). And often that’s perfectly fine.

Except sometimes it’s not, so SHLs support added protection using an optional passcode that gates access to the manifest:

  1. A user receiving a SHL is also given a passcode. The passcode is not found anywhere in the SHL itself (although a “P” flag is added to the payload as a UX hint).
  2. When presenting the SHL, the user also (separately) provides the passcode. 
  3. The receiving system sends the passcode along with the manifest request, which succeeds only if the passcode matches correctly.

Simple but effective. It remains to be seen which use cases will rally around a passcode requirement — but it’s a handy arrow to have in the quiver.

The SHL protocol also defines a bunch of additional requirements to help mitigate the risk of all these (albeit encrypted and/or otherwise protected) files floating around:

  • Manifest URLs are required to include 256 bits of entropy — that is, they can’t be guessable.
  • Manifests with passcodes are required to maintain and enforce a lifetime cap on the number of times an invalid passcode is provided before the SHL is disabled.
  • Content URLs are required to expire (at most) one hour after generation.
  • (Optionally) SHLs can be set to expire, with a hint to this expiration time available in the payload.

These all make sense … but they do make publishing and hosting SHLs kind of complicated. While content files can be served from “simple” services like AWS buckets or Azure containers, manifests really need to be managed dynamically with a stateful store to keep track of things like passcodes and failed attempts. Don’t think this is going to be a one-night project!
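To make that statefulness concrete, here’s a hedged sketch of what a manifest endpoint ends up having to do (not SHLServer’s actual code; the in-memory store, attempt cap, and error shape are all my own stand-ins):

const express = require('express');
const app = express();
app.use(express.json());

const MAX_FAILURES = 100;   // arbitrary cap for this sketch
const shls = new Map();     // in production this must be a persistent store

app.post('/manifest/:id', (req, res) => {
  const shl = shls.get(req.params.id);
  // Unknown, expired, or disabled SHLs all look the same to the caller.
  if (!shl || shl.disabled || (shl.exp && shl.exp < Date.now() / 1000)) {
    return res.sendStatus(404);
  }
  if (shl.passcode && req.body.passcode !== shl.passcode) {
    shl.failures = (shl.failures || 0) + 1;
    if (shl.failures >= MAX_FAILURES) shl.disabled = true;
    // 401 with a hint; check the spec for the exact required error body.
    return res.status(401).json({ remainingAttempts: MAX_FAILURES - shl.failures });
  }
  // A real implementation would embed small files and mint short-lived
  // location URLs here based on embeddedLengthMax.
  res.json({ files: shl.files });
});

app.listen(7071);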

SMART Health Links in Action

Let’s look at some real code. First we’ll run a quick end-to-end to get the lay of the land. SHLServer is a standalone, Java-based web server that knows how to create SHLs and serve them up. Build and run it yourself like this (you’ll need a system with mvn and a JDK installed):

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/toolbox
mvn clean package install
cd ../shl
mvn clean package
cd demo
./run-demo.sh # or use run-demo.cmd on Windows

This will start your server running on https://localhost:7071 … hope it worked! Next open up a new shell in the same directory and run node create-link.js (you’ll want node v18+). You’ll see an annoying cert warning (sorry, the demo is using a self-signed cert) and then a big fat URL. That’s your SHL, woo hoo! Select the whole thing and then paste it into a browser. If you peek into create-link.js you’ll see the parameters we used to create the SHL, including the passcode “fancy-passcode”. Type that into the box that comes up and … magic! You should see something very much like the image below. The link we created has both a SHC and a raw FHIR bundle; you can flip between them with the dropdown that says “Health Information”.

So what happened here? When we ran create-link.js, it posted a JSON body to the server’s /createLink endpoint. The JSON set a passcode and an expiration time for the link, and most importantly included our SHC and FHIR files as base64url-encoded strings. SHLServer generated an encryption key, encrypted the files, stored a bunch of metadata in a SQLite database, and generated a SHL “payload” — which looks something like this:

{
  "url": "https://localhost:7071/manifest/XruV__8k1Zn68NK1lsLH05ZmONtaUC85jmAW4zEHoTA",
  "key": "OesjgV2JUpvk-E9wu9grzRySuMuzN4HpcP-LZ4xD8hc",
  "exp": 1687405491,
  "flag": "P",
  "label": "Fancy Label",
  "_manifestId": "XruV__8k1Zn68NK1lsLH05ZmONtaUC85jmAW4zEHoTA"
}

(You can make one of these for yourself by running create.js rather than create-link.js.) Finally, that JSON is encoded with base64url, the shlink:/ protocol tag is added to the front, and then a configured “viewer URL” is added to the front of that.

The viewer URL is optional — apps that know what SHLs are will work correctly with just the shlink:/… part, but by adding that prefix anybody can simply click the link to get a default browser experience. In our case we’ve configured it with https://shcwork.z22.web.core.windows.net/shlink.html, which opens up a generic viewer we’re building at TCP. That URL is just my development server, so handy for demo purposes, but please don’t use it for anything in production!

Anyways, whichever viewer receives the SHL, it decodes the payload back to JSON, issues a POST to fetch the manifest URL it finds inside, pulls the file contents out of that response either directly (.embedded) or indirectly (.location), decrypts it using the key from the payload, and renders the final results. You can see all of this at work in the TCP viewer app. Woot!
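If you’d like to see those moving parts without reading the whole viewer, here’s a hedged sketch of the sequence using the javascript jose library (my own illustration, not the TCP viewer code; it skips the “U” flag case and all error handling):

import { compactDecrypt, base64url } from 'jose';

async function resolveShlink(shlink, recipient, passcode) {
  // Strip any viewer prefix and the shlink:/ tag, then decode the payload.
  const encoded = shlink.substring(shlink.indexOf('shlink:/') + 'shlink:/'.length);
  const payload = JSON.parse(new TextDecoder().decode(base64url.decode(encoded)));
  const key = base64url.decode(payload.key);

  // POST for the manifest (recipient is required; passcode only if "P" flagged).
  const manifest = await (await fetch(payload.url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ recipient, passcode })
  })).json();

  // Each file is a compact JWE, delivered inline or via a short-lived URL.
  const results = [];
  for (const file of manifest.files) {
    const jwe = file.embedded ?? await (await fetch(file.location)).text();
    const { plaintext } = await compactDecrypt(jwe, key);
    results.push({
      contentType: file.contentType,
      content: new TextDecoder().decode(plaintext)
    });
  }
  return results;
}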

A Quick Tour of SHLServer

OK, time for some code. SHLServer is actually a pretty complete implementation of the specification, and could probably even perform pretty reasonably at scale. It’s MIT-licensed code, so feel free to take it and use it as-is or as part of your own solutions however you like, no attribution required. But I really wrote it to help folks understand the nuances of the spec, so let’s take a quick tour.

The app follows a pretty classic three-tier model. At the top is SHLServer.java, a class that uses the built-in Java HttpServer to publish seven CORS-enabled endpoints: one for the manifest, one for location URLs, and five for various SHL creation and maintenance tasks. For the admin side of things, parameters are accepted as JSON POST bodies and a custom header carries an authorization token.

SHLServer relies on the domain class SHL.java. Most of the important stuff happens here; for example the manifest method:

  • Verifies that the requested SHL exists and isn’t expired,
  • Rejects requests for disabled (too many passcode failures) SHLs,
  • Verifies the passcode if present, keeping a count of failed attempts,
  • Sets a header indicating how frequently to re-pull a long-lived (“L” flag) SHL, and
  • Generates the response JSON, embedding file contents or allocating short-lived location links based on the embeddedLengthMax parameter.

The admin methods use parameter interfaces that try to simplify things a bit; mostly they just do what their names imply.

Because the manifest format doesn’t include a way to identify specific files, the admin methods expect the caller to provide a “manifestUniqueName” for each one. This can be used later to delete or update files — as the name implies, they only need to be unique within each SHL instance, not globally.

The last interesting feature of the class is that it can operate in either “trusted” or “untrusted” mode. That is, the caller can either provide the files as cleartext and ask the server to allocate a key and encrypt them, or it can pre-encrypt them prior to upload. Using the second option means that the server never has access to keys or personal information, which has obvious benefits. But it does mean the caller has to know how to encrypt stuff and “fix up” the payloads it gets back from the server.

The bottom layer of code is SHLStore.java, which just ferries data in semi-ORM style between a Sqlite database and file store. Not much exciting there, although I do have a soft spot for Sqlite and the functional interface I built a year or so ago in SqlStore.java. Enough said.

Anatomy of a Payload

OK, let’s look a little more closely at the payload format that is base64url-encoded to make up the shlink:/ itself. As always it’s just a bit of JSON, with the following fields:

  • url identifies the manifest URL which holds the list of SHL files. Because they’re burned into the payload, manifest URLs are expected to be stable, but include some randomness to prevent them from being guessable. Our server implements a “makeId” function for this that we use in a few different places.
  • key is the shared symmetric key used to encrypt and decrypt the content files listed in the manifest. The same key is used for every file in the SHL.
  • exp is an optional timestamp (expressed as an epoch second). This is just a hint for viewers so they can short-circuit a failed call; the SHL hoster needs to actually enforce the expiration.
  • label is a short string that describes the contents of the SHL at a high level. This is just a UX hint as well.
  • v is a version number, assumed to be “1” if not present.
  • flags is a string of optional upper-case characters that define additional behavior:
    • “P” indicates that access to the SHL requires a passcode. The passcode itself is kept with the SHL hoster, not the SHL itself. It is communicated to the SHL holder and from the holder to a recipient out of band (e.g., verbally). The flag itself is just another UX hint; the SHL hoster is responsible for enforcement.
    • “L” indicates that this SHL is intended for long-term use, and the contents of the files inside of it may change over time. For example, a SHL that represents a vaccination history might use this flag and update the contents each time a new vaccine is administered. The flag indicates that it’s acceptable to poll for new data periodically; the spec describes use of the Retry-After header to help in this back-and-forth.

One last flag (“U”) supports the narrow but common use case in which a single file (typically a SHC) is being transferred without a passcode, but the data itself is too large for a usable QR code. In this case the url field is interpreted not as a manifest file but as a single encrypted content file. This option simplifies hosting — the encrypted files can be served by any open, static web server with no dynamic manifest code involved. The TCP viewer supports the U flag, but SHLServer doesn’t generate them.

If you’re paying attention, you’ll notice that SHLServer returns another field in the payload: _manifestId. This is not part of the spec, but it’s legal because the spec requires consumers to expect and ignore fields they do not understand. Adding it to the payload simply makes it easier for users of the administration API to refer to the new manifest later (e.g., in a call to upsertFile).

Working with the Manifest

After a viewer decodes the payload, the next step is to issue a POST request for the URL found inside. POST is used so that additional data can be sent without leaking information into server logs:

  • recipient is a string representing the viewer making the call. For example, this might be something like “Overlake Hospital, Bellevue WA, registration desk.” It is required, but need not be machine-understandable. Just something that can be logged to get a sense of where SHLs are being used.
  • passcode is (if the P flag is present) the passcode as received out-of-band from the SHL holder.
  • embeddedLengthMax is an optional value indicating the maximum size a file can be for direct inclusion in the manifest. More on this in a second.

The SHL hoster uses the incoming manifest request URL to find the appropriate manifest (e.g., in our case https://localhost:7071/manifest/XruV__8k1Zn68NK1lsLH05ZmONtaUC85jmAW4zEHoTA), then puts together a JSON object listing the content files that make up the SHL. The object contains a single “files” array, each element of which contains:

  • contentType, typically one of application/smart-health-card for a SHC or application/fhir+json for a FHIR resource (I promise we’ll cover application/smart-api-access before we’re done).
  • A JSON Web Encryption token using compact serialization with the encrypted file contents. The content can be delivered in one of two ways:
    • Directly, using an embedded field within the manifest JSON.
    • Indirectly, as referenced by a location field within the manifest JSON.

This is where embeddedLengthMax comes into play. It’s kind of a hassle and I’m not sure it’s worth it, but not my call. Basically, if embeddedLengthMax is not present OR if the size of a file is <= its value, the embedded option may be used. Otherwise, a new, short-lived, unprotected URL representing the content should be allocated and placed into location. Location URLs must expire after no more than one hour, and may be disabled after a single fetch. The intended end result is that the manifest and its files are considered a single unit, even if they’re downloaded independently. All good, but it does make for some non-trivial implementation complexity (SHLServer uses a “urls” table to keep track; cloud-native implementations can use pre-signed URLs with expiration timestamps).
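Putting that together, a manifest response comes back looking something like this (values abbreviated and invented; one file delivered inline, one by reference):

{
  "files": [
    {
      "contentType": "application/smart-health-card",
      "embedded": "eyJhbGciOiJkaXIiLCJlbmMiOiJBMjU2R0NNIn0..<iv>.<ciphertext>.<tag>"
    },
    {
      "contentType": "application/fhir+json",
      "location": "https://localhost:7071/file/4rMpVCnkjPbAXdDLHQzzWLr13tn28XFn9f9dSzgCVRo"
    }
  ]
}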

In any case, with JWEs in hand the viewer can finally decrypt them using the key from the original payload — and we’re done. Whew!

* Note I have run into compatibility issues with encryption/decryption. In particular, the specification requires direct encryption using A256GCM, which seems simple enough. But A256GCM requires a 12-byte initialization vector, and there are libraries (like python-jose at the time of this writing) that mistakenly use 16. That might seem OK because it “works,” but some compliant libraries (like javascript jose) error out when they see the longer IV and won’t proceed. Ah, compatibility.
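For what it’s worth, the easiest way to stay out of trouble is to let a well-behaved library drive. Here’s a hedged sketch of encrypting one content file with javascript jose (my own illustration, not what SHLServer does in Java; the file contents are a stand-in):

import { CompactEncrypt, base64url } from 'jose';
import { randomBytes } from 'node:crypto';

// Stand-in content; in real life this would be the SHC or FHIR JSON.
const fileJson = JSON.stringify({ resourceType: 'Bundle', type: 'collection', entry: [] });

// Generate a 256-bit key for the SHL and encrypt with direct A256GCM;
// the library picks the correct 12-byte IV for you.
const key = new Uint8Array(randomBytes(32));
const jwe = await new CompactEncrypt(new TextEncoder().encode(fileJson))
  .setProtectedHeader({ alg: 'dir', enc: 'A256GCM', cty: 'application/fhir+json' })
  .encrypt(key);

console.log('key for the payload:', base64url.encode(key));
console.log('JWE for the manifest:', jwe);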

SMART API Access

OK I’ve put this off long enough — it’s a super-cool feature, but messes with my narrative a bit, so I’ve saved it for its own section.

In addition to static or periodically-updated data files, SHLs support the ability to share “live” authenticated FHIR connections. For example, say I’m travelling to an out-of-state hospital for a procedure, and my primary care provider wants to monitor my recovery. The hospital could issue me a SHL that permits the bearer to make live queries into my record. There are of course other ways to do this, but the convenience of sharing access using a simple link or QR code might be super-handy.

A SHL supports this by including an encrypted file with the content type application/smart-api-access. The file itself is a SMART Access Token Response with an additional aud element that identifies the FHIR endpoint (and possibly some hints about useful / authorized queries). No muss, no fuss.
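Decrypted, such a file might look something like this (all values invented; the shape is just a SMART token response plus that added aud field):

{
  "access_token": "eyJhbGciOiJSUzM4NCJ9.example.token",
  "token_type": "bearer",
  "expires_in": 3600,
  "scope": "patient/*.read",
  "aud": "https://fhir.example-hospital.org/r4"
}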

The spec talks about some other types of “dynamic” exchange using SHLs as well. They’re all credible and potentially useful, but frankly a bit speculative. IMNSHO, let’s lock down the simpler file-sharing scenarios before we get too far out over our skis here.

And that’s it!

OK, that’s a wrap on our little journey through the emerging world of SMART Health Cards and Links. I hope it’s been useful — please take the code, make it your own, and let me know if (when) you find bugs or have ideas to make it better. Maybe this time we’ll actually make a dent in the health information exchange clown show!

Bionic!

Last Tuesday I got up in the morning, showered and ate some breakfast, took the dog out, did the crossword, got a new pair of eyes, took a nap, watched the Mariners game and went to bed. In case you missed that, I got a new pair of eyes. OK, new lenses to be precise, but still. The technology is amazing and of course I went down a bit of a rabbit hole learning about how it works. It’s hard to believe that we really get to live in this world — just so cool.

Normal Vision

A lot of folks know the basics of how vision works, but let’s start there anyways. Light comes in through an opening in the front of our eye called the pupil (the black part). In front of the pupil is the cornea; just behind it is the lens. Both of these are clear and serve to refract (bend) the incoming light so that it lands perfectly aligned on the retina, a grid of cells on the back of the eye that sense light impulses and send them up the optic nerve to the visual cortex, which assembles the signals into a coherent concept of what we’re looking at.

This works great to see things at a distance — like across the room or street or whatever — when the incoming light rays are almost parallel to each other. But we often need to see things that are much closer, like the words in a book. In this case the incoming light rays diverge and enter the eyes at steeper angles, causing the focal point to fall far behind the retina and blur. Evolution compensates for this by allowing us to dynamically change the shape of the lens to bring things into focus. The ciliary muscle squeezes the lens, making it fatter. This fatter lens bends the outer rays more sharply, pulling the focal point back onto the retina so we can read. Just amazing.

Fun fact, this is why squinting actually can help you see better — it’s a crude way of changing the shape of your eye structure, which can impact where the focal point falls. But squinting can only do so much, quickly tires out your facial muscles, and looks pretty goofy — so not a great long-term solution.

Nearsightedness

I started wearing glasses full-time for myopia (nearsightedness) when I was about twelve — I could see things close up, but not at distance. This happens because the eyeball itself is elongated, or because the cornea or lens is overly-refractive (too strong). Either one causes the focal point to fall in front of the retina, blurring the image received by the brain.

Hyperopia (farsightedness) is the exact opposite — flaws in the eye cause the natural focus point to fall behind the retina. Either “opia” can be fixed relatively easily by placing lenses in front of the eyes in the form of glasses or contacts. The optometrist just keeps trying different lens powers (“Which is better, A or B?”) until they find the one for each eye that lands the image perfectly on the retina at distance. Your ciliary muscle does its job for closeup tasks, and everything is back in business. Woot!

Note I’m basically ignoring astigmatism here, which occurs when flaws in the cornea or lens are asymmetric — e.g., maybe blurring only happens on the horizontal plane. This makes everything way more complicated, and I don’t have much of it myself, so I’m going to pretend it doesn’t exist. Sorry about that.

2015: LASIK

Glasses are fine, and truth be told I probably look better with them on. But they’re also annoying, especially in the rain or under ski goggles or whatever. And fully recognizing the irony of this given my enthusiasm for surgery, contacts just scare the bejeezus out of me — no way. So just about eight years ago I decided to get LASIK surgery to repair my nearsightedness. Dr. Sharpe seemed like a good guy and got solid reviews, so into the breach I went.

LASIK (Laser-Assisted In Situ Keratomileusis) replaces the need for external lenses by reshaping the cornea so that it refracts properly. Because everything is always complicated, the cornea is actually made up of five distinct layers. Starting from the top:

  • The Epithelium is exposed to the environment and passes oxygen/nutrients to the rest of the structure. It constantly regenerates itself and contains a ton of nerves, which is why it hurts so much if you scratch your eye, as I did back in high school with plaster dust. Ouch.
  • Bowman’s Layer as near as I can tell basically acts as a buffer/sealer between the dynamic epithelium and more static lower layers.
  • The Stroma is the thickest part of the cornea (which isn’t saying much at about 500 micrometers) and where most refraction occurs.
  • You’ll have to research Descemet’s membrane and the Endothelium yourself because they’re not relevant to LASIK.

The procedure is outpatient and other than a boatload of topical numbing drops, the only anesthesia I had was a medium-heavy dose of valium. A suction/stabilizing device is placed over the eye and the laser cuts a circular “flap” through the top two layers of the cornea. The flap is folded back to expose the stroma, the laser nibbles away at the stroma to reshape it for the correct prescription (flatter for myopia; steeper for hyperopia), and finally the flap is folded back in place.

Apparently the epithelium regenerates so quickly that the flap just heals on its own — I have read about stitches being used, but that didn’t happen in my case. The cut itself is positioned over the iris (the colored part of the eye), so even before it heals there’s no impact to your vision. The weirdest part about all of this is the burnt hair smell that is in fact the laser burning away parts of your eye. Yeesh.

But holy crap, I literally sat up in the chair post-procedure and could see great. Right away. Now of course there was some swelling and pain and stuff over the next few days … but it was one of the most shocking things that has ever happened to me, ever. Just brilliant.

Enter Presbyopia

My vision was basically perfect for about six years after LASIK. I can’t say enough good stuff about that decision, but I’m taking a long time to get to the really good part of this article, so I’ll leave it at that. Absolutely would recommend LASIK to anybody who qualifies.

But of course time marches on. Near vision starts to degrade for almost everyone sometime in their forties or thereabouts, which is why we need reading glasses and shine our phone flashlights on the menu. It’s called presbyopia, and it happens because the lens becomes less elastic and those ciliary muscles just can’t squeeze hard enough to change its shape for near focus. Folks who already have glasses start buying bifocals, and those of us with good distance vision (naturally or thanks to LASIK) start haunting the drugstore aisles for cheap readers.

A little fine print (see my eye joke there?): during my LASIK consult, I chose the “regular” version which corrects both eyes for distance. There is another option called “monovision” in which the dominant eye is corrected for distance, but the second eye is corrected for reading. That is, the second eye is adjusted so that an object in the near field is projected clearly onto the retina with the lens at rest (vs. “squeezed” as we discussed above). Typically, the brain is able to adjust and automatically swap between eyes based on what you’re looking at, which is utterly amazing.

Because the near-vision eye can focus with the lens at rest, monovision can head off presbyopia — you don’t need to change the lens shape to see close-up, you just need to use the eye dedicated to that purpose. This was tempting, but there are a few downsides, particularly (for me) some loss of depth perception since you no longer have effective binocular vision. And since LASIK removes only a tiny amount of corneal tissue, you can actually have it done more than once — I was assured that I could simply “touch up” my eyes in the future to address presbyopia or other changes if needed.

Indeed, I eventually started to need readers, and it was fine. I’m not sure why, but there was actually something kind of nice about the ritual of pulling out the glasses to read or work the crossword or whatever. That is, it was nice until I started needing them for everything. Cooking instructions on the frozen pizza? Glasses. Seat on my boarding pass? Glasses. Which direction does the HDMI cable go in? Glasses. You get the idea. When I started needing them just to snooze the alarms on my phone, I knew it was finally time to go in for the “touch up.” Procrastinated a bit more thanks to COVID and all, but finally pulled the trigger about a month ago.

After walking around the house taking pictures of everywhere I noticed readers lying around (above), I rolled up to the Sharpe Vision office for my consult only to realize that it was no longer their office — apparently in the almost-decade since I got my LASIK they moved a few streets down. A quick lookup on the phone (with readers) and I made it just in time for my appointment at their new place past Burgermaster on 112th

… only to find that the world had changed once again. Yes, they could touch up my LASIK, and could even offer a new flavor called laser blended vision that’s like monovision but with improved depth perception. But what I really ought to check out is RLE — Refractive Lens Exchange. And since apparently I’m always up for new ways to mess with my eyes, I was totally in. Here’s the deal.

Cataracts

Along with presbyopia, over time most people eventually develop cataracts, a clouding of the lenses that makes them less able to transmit light energy. This is the other reason we’re all using our phone flashlights to read our menus. The good news is that cataracts are easily fixed by replacement of the natural lens with an artificial one.

An aside: cataracts are a major cause of correctable blindness in the developing world. Doctors Without Borders has conducted free “eye camps” in Somalia for many years and has fixed cataracts for hundreds of people who literally go from blind to normal vision in one day. If you’re able to give a bit, you’re not going to find a better organization — they are awesome.

Because we do what we do, there’s been a ton of innovation in replacement lens technology. The path of that innovation is pretty neat, and recently folks have realized that — hey — maybe these lenses are awesome and safe enough that we don’t need to wait until cataracts form to swap them in! The material lasts well beyond the fifty-odd years that middle-aged humans have before them, so why not? Thus was born the “RLE” (Refractive Lens Exchange) industry, and a new practice for the newly re-named “SharpeVision Modern LASIK and LENS.”

2023: Refractive Lens Exchange

RLE at Sharpe with Dr. Barker is pretty fancy. Even before we get to the lens itself, the procedure alone shocks and awes:

  1. The CATALYS Precision Laser System identifies key structures in the eye and creates a 3D map at the micron level. Check out the video of this, it’s super-cool.
  2. The laser cuts small entry slits through the cornea and a round opening in the front of the capsule that holds the lens.
  3. The laser softens and segments the existing lens so that it can be easily broken up and sucked out through a small vacuum tube.
  4. The new lens is passed into the now-empty capsule through a small tube. The lens is flexible and can be folded up so it fits through the small entry hole.
  5. When the lens unfolds, two springlike spiral arms called haptics hold it in place in the center of the capsule.

All of this computer-assisted laser stuff is just incredible. I was awake throughout my procedure and it was pretty crazy to listen to this HAL-like computer voice announcing what percentage of my lens had been sucked out at each step.

Monofocal IOLs (Intraocular Lenses)

OK, finally I get to talk about the intraocular lens itself, which is what sent me down this rabbit hole in the first place. The old-school version of this is the Monofocal IOL, which “simply” acts just like the lens in your glasses or the reshaped cornea in LASIK, using refraction to focus images at distance onto the retina. Monofocals are the workhorse of cataract surgery, but they have some disadvantages. Primarily, since they have only one focal distance and can’t be squeezed / reshaped by the ciliary muscle, readers are basically guaranteed for close-up work. There is a “monovision” option using differently-powered lenses in each eye, but that comes with all the same issues as monovision LASIK.

Accommodating IOLs

Today there are basically two kinds of “premium” IOLs that attempt to provide a glasses-free experience. One is the “accommodating” IOL — most famously the Bausch & Lomb Crystalens. The concept makes a ton of sense — just replicate the action of our natural lens. Remember that an IOL has little springy arms called haptics that hold it in place in the eye (the orange bits in the picture here). The same ciliary muscle that squeezes our natural lens can apply force to these haptics, which are designed to change the lens shape and position in response. The rest of your vision system just does what it’s always done, and the focus point adjusts naturally.

Pretty neat, and I’m always drawn to biomimetic solutions, because evolution tends to, well, work. But while it’s a little hard to find good data, it appears that the Crystalens has seriously dropped in popularity over the last decade or so — only 10% of practitioners were using it in 2021 according to this “Review of Ophthalmology” article that claims to know. From what I can find (e.g., here) it seems that the near vision improvements from these lenses just aren’t that great, and may also decline over time. Perhaps our intrepid ciliary muscle just loses some oomph as we get older … who knows.

So at least for now, accommodating IOLs don’t seem to be the favorite child. Even the original inventor of the Crystalens has moved on to new technologies. But don’t blink (another eye joke), because there are true believers still working the problem with a bunch of new stuff in the pipeline.

Multifocal IOLs

OK, we’ve finally arrived at my lens, the Clareon PanOptix Trifocal IOL, presently the most popular of the other class of premium IOLs: multifocal. Multifocal lenses have no moving parts but instead divide up the incoming light rays into multiple focal points — two for bifocals, three for trifocals. The “distance” focus typically uses refraction — the same mechanism we’ve seen again and again on this journey. But multifocal lenses are shaped so that light entering from near or intermediate distances is diffracted to provide focus in those ranges.

Diffraction occurs when light hits a discontinuity in material. The actual math is super complicated and a bit beyond me, but at the highest level, a light wave passing through different materials (the lens itself vs the aqueous material surrounding it) creates interference patterns that ultimately bend the light in a predictable way. A multifocal lens has a bunch of concentric circles of varying heights that produce this effect — you can see them if you click to zoom into the picture of the PanOptix on the right.

The end result is that the lens creates clear images on the retina at three different distances:

  1. Plano or “infinity” for driving and watching whales in Puget Sound (refracted).
  2. About 24 inches for “intermediate” tasks like computer (and lathe!) use.
  3. About 16 inches for “near” tasks like reading.

Multifocal Issues and Mitigations

If you’re paying attention, you’re probably asking the same question I did when I first learned about these things. Aren’t you now getting THREE images projected onto the retina at the same time? Well, kind of yes. But two things help you out. First, your brain is just really smart and figures it out in the same way that it does with monovision — paying attention to the stuff you are showing interest in by the direction of your gaze and other clues. More importantly, at any given time there’s usually only one of these three distances that actually has something to look at. For example, if I’m reading I’m not getting much of an image from anything behind the book. Between the two of these, your brain very quickly just makes it work.

It is amusing to experience these artifacts in real life. The most obvious one is the “halos” that appear around point light sources such as headlights or streetlights. I wish I could capture it with a camera, but you actually see the diffraction patterns — the light looks exactly like the rings on the lens itself! It’s a bit annoying — if I were a long haul trucker I might think twice about getting a multifocal — but for me it’s no big deal.

A second issue makes sense in theory, but (at least so far) I’m not experiencing it in practice. With a natural lens, pretty much all of the light that comes into your eye is captured by the retina. Of course the iris opens and closes to admit an optimal amount of light, but very little of that is lost passing through the lens. With a multifocal the energy is divvied up between the focal points, plus there is some additional loss inherent in the diffractive process itself.

The PanOptix has a neat feature that tries to minimize this by “collecting” light energy at a (mostly unused I guess?) focal distance of 120cm and diffracting it in reverse so that energy helps power distance vision. The end result is that the PanOptix uses about 44% of incoming light for distance, 22% each for near and intermediate, and loses about 12% to the process. Not bad! And at least so far I can’t detect any loss of contrast or issues in lower-light situations. The effect is surely there, I’m just not aware of it.

The Hits Keep Coming

So far I’m super-satisfied with my new lenses — distance vision feels about the same as it was before, but I can read and use the phone/computer comfortably without my trusty readers. Every day my brain gets more used to the various artifacts that do exist, and my vision should stay pretty stable until I die. Woo hoo!

At the same time, it’s clear that all three of the broadly-used lens types out there (monofocal / accommodating / multifocal) have pros and cons — none work as well as the natural lenses of a young adult. So researchers keep pushing the envelope. The latest concept I’ve read about is Extended Depth of Focus (really well-explained here). The concept behind EDOF lenses is to extend the range of distances that can provide an acceptably (if not perfectly) focused image on the retina, rather than pinning focus to specific intervals.

There are a few mechanisms being tried to produce EDOF; the easiest for me to understand is the pinhole effect, which has been used in photography for years. By shrinking the hole through which light enters, you basically filter out the steeper rays that would spread out over the retina, leaving only the ones that are already mostly parallel anyways (regardless of how far they are in front of the eye). Of course this also filters out a bunch of light energy, so it’s harder to see in low-light conditions. So far these lenses have mostly been used monovision-style — one eye gets the pinhole lens and the other gets a classic monofocal.

It’ll be interesting to see how this new approach plays out. And I could easily keep digging deeper into this stuff forever — but I think we’ve covered more than enough for one article. In case it isn’t clear, I’m fascinated with attempts to repair, build on and improve the capabilities that have been so hard won by evolution over millennia. Getting new lenses and learning about the technology has been super-fun — thanks for coming along for the ride!

The Most Important ChatGPT App Ever

I’ll grant that I have a relatively nerdy social circle — but it’s still sort of shocking just how many people I know are actually doing useful and interesting things with ChatGPT. Just a sampling:

Just to reiterate what I’ve said before, I believe this thing is really real, and it behooves everyone to spend some time with it to build an intuition for what it is (and isn’t) good at. Like any technology, it’s important to have at least a basic understanding of how it works — otherwise folks that do will use it to take advantage of you. The fact that this technology appears to be sentient (hot take from Sean, see how I just dropped that in there?) doesn’t change the reality that people will use it to create phishing scams. Two of my past posts may help:

Anyways, all of this peer pressure got me thinking that I’d better do something important with ChatGPT too. And what could possibly be more important than creating more amusing content on Twitter? I know, right? Brilliant! So that’s what I did. And I figured I might as well write about how I did it because that might help some other folks stand on these impressive shoulders.

AI News Haiku

You’re definitely going to want to go visit @AINewsHaiku on Twitter (don’t forget to follow!). Three times a day, roughly just before breakfast, lunch and dinner, it randomly selects a top news story from United Press International, asks ChatGPT to write a “funny haiku” about it, and posts to Twitter. That’s it. Funny(-ish) haikus, three times a day.

The rest of this post is about how it works — so feel free to bail now if you’re not into the nerd stuff. Just don’t forget to (1) follow @AINewsHaiku, (2) tell all your friends to follow it too, and (3) retweet the really good ones. Be the trendsetter on this one. No pressure though.

The Code

I’ve reluctantly started to actually enjoy using Node for little projects like this. It’s super-easy to get going without a thousand complicated build/run steps or an IDE, and with a little discipline Javascript can be reasonably clean code. Have to be really careful about dependencies though — npm makes it really easy to pick up a billion packages, which can get problematic pretty quick. And “everything is async” is just stupid because literally nobody thinks about problems that way. But whatever, it’s fine.

There is not a lot of code, but it’s all on github. Clone the repo, create a “.env” file, and run “node .” to try it yourself. The .env file should look like this (details on the values later):

OPENAI_API_TOKEN=[OpenAI Secret Key]
TWITTER_API_APP_KEY=[Twitter Consumer API Key]
TWITTER_API_APP_SECRET=[Twitter Consumer API Secret]
TWITTER_API_ACCESS_TOKEN_KEY=[Twitter Authentication Access Token]
TWITTER_API_ACCESS_TOKEN_SECRET=[Twitter Authentication Access Secret]

index.js starts the party by calling into rss.js which loads the UPI “Top News” RSS feed and extracts titles and links (yes RSS still exists). xml2js is a nice little XML parser, a thankless job in these days of JSON everywhere.  You’ll also note that I’m importing “node-fetch” for the fetch API; it’s built-in in Node v18 but the machine where I’m running the cron jobs is locked to Node v16 so there you go.
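The gist of rss.js, in sketch form (the feed URL here is a stand-in; check the repo for the real one):

const fetch = require('node-fetch');                  // built-in fetch on Node 18+
const { parseStringPromise } = require('xml2js');

// Pull the feed, parse the XML, and boil it down to {title, link} pairs.
async function getTopNews() {
  const feedUrl = 'https://rss.upi.com/news/top_news.rss';  // stand-in URL
  const xml = await (await fetch(feedUrl)).text();
  const doc = await parseStringPromise(xml);
  return doc.rss.channel[0].item.map((i) => ({
    title: i.title[0],
    link: i.link[0]
  }));
}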

Talking to Chat-GPT

After picking a random title/link combo, next up is openai.js which generates the haiku. The OpenAI developer program isn’t free but it is really really cheap for this kind of hobby use; you can get set up at https://platform.openai.com. My three haikus a day using GPT-3.5 run somewhere on the order of $.10 per month. Of course, if you’re asking the system to write screenplays or talk for hours you could probably get into trouble. Live on the edge, and make sure to add your secret key into the .env file.

In its simplest form, using the chat API is just like talking to the models via the user interface. My prompt is “Write a funny haiku summarizing this topic: [HEADLINE]” which I send with a request that looks like this:

{
  "model": "gpt-3.5-turbo",
  "temperature": 0.5,
  "messages": [ "role": "user", "content": PROMPT ]
}

“model” is pretty obvious; I’m using v3.5 because it’s cheap and works great.

“temperature” is interesting — a floating point value between 0 and 2 that dials up and down the “randomness” of responses. In response to a given prompt, a temp of 0 will return pretty much the same completion every time, while 2 will be super-chaotic. 0.5 is a nice conservative number that leaves some room for creativity; I might try dialing it up a bit more as I see how it goes. There is also a parameter “top_p” which is similar-but-different, typical of many of the probabilistic dials that are part of these models.

I’ve sent a single element in the “messages” parameter, but this can become quite elaborate as a way to help explain to the model what you’re trying to do. The guide for prompt design is really fascinating and probably the best thing to read to start building that intuition for the system; highly recommended.

There are a bunch of other parameters you can use that help manage your costs, or to generate multiple completions for the same prompt, that kind of thing.

The JSON you get back contains a bunch of metadata about the interaction including the costs incurred (expressed as “tokens,” a vague concept corresponding to common character sequences in words; you can play with their tokenizer here). The completion text itself is in the “choices” array, which will be length == 1 unless you’ve asked for multiple completions.
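Stripped down to the essentials, the call looks something like this (a sketch, not the exact code in openai.js):

const fetch = require('node-fetch');   // or the built-in fetch on Node 18+

async function writeHaiku(headline) {
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.OPENAI_API_TOKEN}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-3.5-turbo',
      temperature: 0.5,
      messages: [{ role: 'user', content: `Write a funny haiku summarizing this topic: ${headline}` }]
    })
  });
  const json = await response.json();
  // choices[0].message.content holds the completion; "usage" has the token counts.
  return json.choices[0].message.content.trim();
}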

Over time it’s going to be interesting to see just how challenging the economics of these things become. Training big models is really, really computationally-expensive. At least until we have some significant quantitative and/or qualitative change in the way it’s done, only big companies are really going to be in the game. So while I’m sure we’ll see pretty fierce competition between the usual suspects, there’s a big risk that the most revolutionary technology of the century is going to be owned by a very small number of players.

For now, just have fun and learn as much as you can — it’ll pay off no matter what our weirdo economic system ends up doing.

And… Tweet!

Honestly I thought this was going to be the easiest part of this little dalliance, but the chaos that is Twitter clearly extends to its API. It’s bad in pretty much every way: 2+ versions of the API that overlap a lot but not entirely; four different authentication methods that apply seemingly randomly to the various endpoints; constantly changing program/pricing structure with all kinds of bad information still in the documentation. Worst of all, the API requires signed requests which pretty much makes calling their REST endpoints without a library enormously painful. Wow.

Having tried a few libraries and trial-and-errored my way through a few approaches, the actual code in twitter.js isn’t bad at all — but the journey to get there was just stupid. To try and save you some time:

  • Sign up for free access at https://developer.twitter.com/en/portal/dashboard. They will try to direct you to “Basic” access but this is $100/month; don’t be fooled.
  • You’ll get a default “Project” and “App” … scroll to the bottom of the app “Settings” and choose “Edit” under “User Authentication Settings.” Make sure you have read/write permissions selected (you won’t at first). A bunch of fields on this page are required even if you’re not going to use them — just do your best until they let you hit “Save.”
  • Now under “Keys & Tokens” choose “Regenerate” for “Consumer Keys / API Key and Secret” and “Authentication Tokens / Access Token and Secret” … save these values and add them to the appropriate spots in your .env file.

This will set you up to call the v2 method to post a tweet using the OAuth v1.0a authentication model. There are surely many other ways you can get things working, but that was mine. I also chose to use the twitter-api-v2 library to manage the noise — it does a fine job trying to hide the dog’s breakfast that it wraps. At least for now. Until Elon gets into a slap-fight with Tim Berners-Lee and decides to ban the use of HTTPS.
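
For what it’s worth, once the keys are sorted the posting code itself ends up tiny. Here’s a sketch of the twitter-api-v2 usage; the environment variable names are mine, not necessarily what twitter.js uses.

// tweet-sketch.js: post a tweet via the v2 endpoint with OAuth 1.0a user context
const { TwitterApi } = require("twitter-api-v2");
require("dotenv").config();

const client = new TwitterApi({
  appKey: process.env.TWITTER_API_KEY,        // "Consumer Keys / API Key"
  appSecret: process.env.TWITTER_API_SECRET,  // "Consumer Keys / API Secret"
  accessToken: process.env.TWITTER_ACCESS_TOKEN,
  accessSecret: process.env.TWITTER_ACCESS_SECRET
});

async function postHaiku(text) {
  // the library takes care of signing the request
  const result = await client.v2.tweet(text);
  return result.data.id;
}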

You’re Welcome!

The point of all this (beyond the excellent haiku content which you should definitely follow) was just to get some hands-on experience with the API for ChatGPT. Mission accomplished, and I’m really quite impressed with how effective it is, especially given the speed at which they’re moving. I just have to figure out how to reliably tell the model to limit content to 250 characters, because until I do that I’m not going to be able to release @AINewsLimerick or @AINewsSonnet. The world is waiting!

Looking back at Azyxxi… er, Amalga.

Just a few months after the Great Gunshot Search incident of 2005, I found myself at Washington Hospital Center while Dr. Craig Feied showed us list after list on a huge (for the time) monitor. Real-time patient rosters for the ER and ICU, sure, but that was just the warmup. Rooms that needed cleaning. Patients who needed ventilation tubes replaced. Insurance companies with elevated rates of rejected claims. Patients eligible for actively-recruiting complex trials. He just kept going, like a fireworks show where every time you think you just saw the finale they start up again. Incredible stuff. Anyways, cut to a few months later and we (Microsoft) announced the acquisition of Azyxxi — adding an enterprise solution to our growing portfolio in Health Solutions.

Sadly — and despite a ton of work — we were never really able to scale that incredible solution at WHC into a product that realized the same value at other institutions. That’s not to say there weren’t some great successes, because there absolutely were. But at the end of the day, there was just something about Azyxxi that we couldn’t put into a box. And while it’s tempting to just say that it was a timing problem, I don’t think that was it. Even today I don’t see anything that delivers the magic we saw in Dr. Craig’s office — just flashy “innovation” videos and presentations that never quite make it to the floor in real life.

So what was the problem? Anything we can do about it? I dunno — let’s talk it out.

Oh, and just to get it out of the way early, “Azyxxi” doesn’t mean anything — it’s just a made-up word engineered to be really easy to find with Google. We renamed it “Amalga” at Microsoft, which does actually have some meaning behind it but in retrospect sounds a bit like some kind of scary semi-sentient goo. Moving on.

Just what was it?

A correct but only semi-helpful description of Azyxxi is that it was a data analysis and application platform for healthcare. Three parts to that: (a) data analysis, like a big data warehouse; (b) an application platform so insights gained from analysis could be put into on-the-floor solutions; (c) made for healthcare, which means there was functionality built-in that targeted weirdnesses endemic to the business of providing care. This is of course a mouthful, and one of the reasons it was hard to pitch the product outside of specific use cases. A better and more concrete way of looking at the product is to break it down into five key activities:

1. Get the Data

Healthcare data is incredibly diverse and notoriously messy — images, free text notes, lab results, insurance documents, etc., etc. The first rule of the Azyxxi Way (yes we actually referred to it like that) was to “get the data, all of it, without trying to clean it up.” Which is to say, it was a Data Lake before Data Lakes were cool (or even a term). In 2006 the conventional wisdom for data warehousing was “Extract, Transform, Load.” ETL pipelines extract data out of source systems, transform it into (usually) a “star schema” optimized for analysis, and load it into a target database. In this model an enormous amount of upfront thought goes into deciding what data is important and transforming/normalizing it into a shape that can efficiently answer a set of predefined questions.

Azyxxi’s insight was that ETL prework is always wrong, and leaves you with a brittle data warehouse unable to answer novel questions as they inevitably arise. Instead they talked about “ELT” — loading everything just as it was in the source systems and figuring out the “transform” part later. This seems obvious now, but we all used to worry a ton about performance. Azyxxi used SQL Server, and the founders were constantly pushing its boundaries, typically with great success. Sure, some queries were really slow — but you could at least ask the question!

2. Ask Novel Questions

Which leads us to the first user-driven Azyxxi experience — exploration. Using an Excel-like grid display, users had the ability to query source tables individually or via pre-configured “joins” that linked records. Sort, filter, etc. — all the standard stuff was there. Of course there was access control, but this was a care-focused tool in a care-delivery setting — by default users could see a LOT. And as noted above they could get themselves into “trouble” by running queries that took hours or days, but SQL Server is smart and it was mostly just fine.

The key is that there was a culture at the early Azyxxi sites, developed over many years, of asking questions and self-serving the answers. This is not typical! Most nurses and doctors ask themselves data-driven questions in passing, but never follow them up. Working with the IT department to run a report, combine data from multiple sources, get approval to make a change — it just isn’t worth the hassle. So great ideas just die on the vine every day. Azyxxi users knew they had a way to answer their questions themselves — and so they did.

3. Bring Insights to the Floor

It’s awesome to be able to ask questions. But it’s only really impactful when you can use the answers to effect change in real life. With Azyxxi, one-off queries could be saved together with all of their settings — including automatic refresh and kiosk-style display — and shared with other users or departments.

If you’ve been a hospital inpatient or visitor lately, almost certainly you’ve seen the patient roster grid at the central nurses’ station. At my recent colectomy the surgical unit had a live status board that helped my wife keep track of my progress through the building. Great stuff, but every one of these dashboards is an IT project, and no IT project is trivial. With Azyxxi, more than a decade ago, users could create and deploy them by themselves.

But hold on. I’ve already said twice that novel queries against source data could be really slow — a “real-time” dashboard that takes an hour to load isn’t going to get very far, and end users don’t have the skills or tools to fix it. What to do?

Azyxxi empowered the IT folks to run behind user innovation and keep things humming. Each user-created list was driven by an automatically generated SQL query — and anyone who has written interfaces like this knows that they can become very inefficient very quickly. Slow queries were addressed using a sliding scale of intervention:

  1. Hand-code the query. SQL experts on the Azyxxi team were great at re-writing queries for performance. The new query could be inserted behind the user grid transparently and without downtime — it just looked like magic to the end users.
  2. Pre-calculate joins or derived data. When hand-coding queries wasn’t enough, the team could hook into the “EL” part of data acquisition and start doing a little “T” with code. For example, data from real-time monitors might be aggregated into hourly statistics. Or logic to group disease codes into higher-level buckets could be applied ahead of time. These were the same kind of “transforms” done in every data warehouse — but only done after a particular use case proved necessary and helpful.
  3. Fully-materialize user grids. An extreme version of pre-calculation, sometimes code would be written to build an entire user grid as its own table. Querying these tables was lightning fast, but creating them of course took the most IT effort.

The refrain here was just-in-time optimization. The software made it easy for the Azyxxi IT team to see which queries were active, and to assess which approach would lead to acceptable performance. That is, they optimized scarce IT expertise to only do work that was already known to have real clinical value. Compare this to the torturous processes of up-front prioritization and resource allocation in most of the world.

Azyxxi also made these transforms sustainable by strictly enforcing one-way data dependency. Only one “parser” (not really a parser in the CS sense, just Azyxxi terminology for ELT code) could write to one target (typically a table or set of tables), and then optionally trigger additional downstream parsers to perform further transformation into other targets. This “forward-only-write” approach provided a ton of benefit — most importantly automatic error and disaster recovery. At any time, parsers at any level of the hierarchy could be re-run from scratch, trigger their downstream dependencies, and end up with an exact copy of what existed before the recovery event.

Even these dependencies could become complicated, and nobody loved the idea of a “full re-parse” — but it was an invaluable backup plan. One we took advantage of more often than you’d expect!

4. Close the Loop

Because data acquisition was near-real-time, most grids didn’t require additional user input to be useful. New lab results arriving for a patient naturally caused them to fall off of the “patients awaiting lab results” grid. It’s kind of amazing how many problems fit this simple pattern — auto-refreshing grids on a kiosk screen prove to be transformative.

But sometimes there was no “source system” to provide updates — e.g., a list that alerted facilities to newly-vacated rooms that needed to be cleaned. The “newly-vacated” part came from the external EHR system, but cleaning times did not. Azyxxi included user-editable fields and forms for this purpose — never changing ingested data, just adding new data to the system. A facilities employee could simply click a row after taking care of a room, and the grid updated automatically.

Users could create pretty complex forms and such in the system — but honestly they rarely did. Usually it was simply checking an item off of a list, maybe with a bit of extra context about the activity. Simple stuff that created beautifully elegant solutions for a ton of different situations.

5. Improve the Data

There are a bunch of challenges specific to healthcare data. Take for example the humble patient identifier — by law we have no federal patient identification number in the United States. The amount of time and money spent making sure records are assigned to the right human is absolutely shocking, but there it is. Especially in high-stress hospital admission settings, recorded demographics are often wrong or missing — every significant health care information system has to deal with this.

Privacy rules are another one. Providers in a care setting have very few restrictions on the data they can see, but the same isn’t true for all employees, and certainly not for visitors walking by kiosk displays in a hallway. There are specific rules around how data needs to be anonymized and what data elements can appear together — more work for users trying to build usable queries.

Even simply figuring out why a patient is in the hospital can be tough. Different systems use different “coding systems”, or sometimes no coding at all. A huge federal project called the “Unified Medical Language System” is an attempt to help navigate all of this, but it’s pretty hairy stuff and not in any way “user ready.”

Azyxxi’s “one way” parsing system made it relatively easy to help create “augmented” tables to handle these things once rather than many times. My favorite example of this was the “PHI filter” parser, which would take a table and automatically create a version that masked or otherwise anonymized relevant fields. The user interface could then be directed at the original or “safe” version of the table, depending on the rights of the logged-on user.

This all sounds great, so what happened?

If you’ve read along this far, you probably already have a sense of the challenges we were about to face as Azyxxi v1 became Amalga v2. We spent a lot of time upgrading and hardening the software, modernizing UX, and so on. That all went fine, albeit with some inevitable cultural churn. And despite a non-trivial problem with “channel conflict,” our nascent sales team was getting a positive response to the story we were telling. I mean, a simple slide show of awesome use cases at WHC and other Azyxxi sites was pretty compelling.

Side note: channel conflict is a tough thing at Microsoft! The sales team is used to co-selling with third parties that build solutions on top of Microsoft platforms like Windows and SQL Server (and now Azure). So they were best buddies with a whole bunch of healthcare data analytics companies that were in direct competition with Amalga … oops! This problem is a hassle for every vertical solution at Microsoft, and they’ve never really figured out how to deal with it. I don’t think it played a primary role in Amalga’s market woes, but it sure didn’t help.

So the software was OK — but right away, early implementations just weren’t making it into production use on schedule. What the heck?

Oops, IT Culture

First, it turned out that we had a significant problem fighting IT culture at our target customers. The Azyxxi team at WHC and its sister organizations were also the Azyxxi developers. For them, the counter-conventional-wisdom practices of Azyxxi were the whole point, and they knew how to turn every knob and dial to achieve just-in-time optimization. But your typical health system IT department — even those run by really competent folks — just doesn’t think that way. They are a cost center with an endless list of projects and requests, often driven more by risk avoidance than innovation. Most of these shops also already had some sort of data analytics solution; while they invariably sucked, they existed and were a sunk cost that the team knew how to use.

The Amalga team walked in and just started breaking eggs left and right. We asked for a very large up-front investment, using weird new techniques — all for a few smallish initial use cases that had captured the eye of some annoying but influential doctor or the Chief Medical Officer. We told them to “just get the data, don’t worry about what you think you need.” We told them that SQL Server was fine for things that made their SQL experts faint on the spot. We told them to give broad access to users rather than assigning rights on a “need to know” basis.

In short, we told them to do everything differently, using coding skills they didn’t even have. Not surprisingly, that didn’t work out. In almost every case we became bogged down by “prioritization” and “project planning” that stopped implementations cold. And even when we finally were able to eke out an MVP implementation, we almost always ran straight into our second stumbling block.

Oops, User Culture

The Amalga team used to talk a lot about “democratizing” access to data. And to be sure, nobody has better insight into day-to-day problems than nurses and docs and the others doing the actual work of providing care. But as it turns out, not a lot of these folks have the skills, motivation or time to dig in and create the kind of self-reinforcing flywheel of improvements that Amalga was designed for.

At least, that’s the way it is in most healthcare systems. The IT department and leadership push technology down onto the working staff, and they just have to deal with it. Sometimes it’s great, sometimes it’s awful, but either way it typically isn’t something they are rewarded for getting involved with. Executives and maybe department heads ask the IT department to prepare “reports” that typically show very high-level, lagging indicators of quality or financial performance. But technology-driven workflow changes? It’s usually a pretty small bunch making those calls.

This was a challenge at the early Azyxxi sites, too. But a combination of (a) sustained evangelist outreach from the Azyxxi team itself, and (b) successful users becoming evangelists themselves, created the right environment to bring more and more users into the game. Almost every department had at least one active Azyxxi user who would work with their colleagues to leverage the tools. But at new Amalga sites, where the IT team was often reluctant to begin with, with no established pattern of users self-serving their own solutions, and only a few small use cases deployed — starting the flywheel was a tall order indeed.

It’s tough to establish a system when you’re fighting culture wars on both the supply and demand fronts!

The good fight: Amalga v3

With a pretty clear set of problems in front of us, the Amalga team set out strategies to fix them. I’m really proud of this time in HSG — the team came together in one of those moments of shared purpose that is both rare and exhilarating. Some of the software we built would be state of the art even today. Bryan, Mehul, Kishore, Noel, Adeel, Sohail, Sumeet, Mahmood, Puneet, Vikas, Imran, Matt, Linda, Shawna, Manish, Gopal, Pierre, Jay, Bei-Jing, many many more … it was just a ton of fun.

Goal #1: Easier for IT

The biggest knock on Amalga v2 from IT was that it was just too slow. Of course, having been on this journey with me you know that this misses the point. Amalga was designed for just-in-time optimization — if important queries were “slow” they just needed to be optimized by hand-coding, pre-computing key values, or fully materializing tables. Simple! Unless of course your IT team doesn’t have advanced coding or SQL skills. Which was, unfortunately, most customers.

We took on a bunch of features to better automate JIT optimization, but the biggest by far was automatic materialization. Based on a list query created either in the Amalga user interface or by hand, Amalga v3 could automatically create and maintain a flat, simple table representing the results, with maximally-efficient inserts and updates at parse time. This meant that every grid could be made performant simply by checking a box to turn on materialization. OK, maybe not that easy — but pretty close.

We also made initial data acquisition simpler by introducing a “super parser” that could be driven by configuration rather than by code. We put together a sophisticated install and patch system that enabled upgrades without disturbing user customizations. We extended our custom client with SharePoint integration, making it easier to combine Amalga and other corporate content, and reduced the burden of user and group management. And much more.

Goal #2: Shorter Time-to-Value for Users

If users weren’t creating their own apps, we’d bring the apps to them!

On top of the new SharePoint integration, we created a configuration framework for describing data requirements for detailed, targeted use cases. Deploying an app was simply a matter of identifying the source for each data element required — a “checklist” kind of task that was easy to explain and easy to do. And while installing the first app or two would certainly require new parsing and data extraction work, at critical mass users were mostly reusing existing data elements, making it far easier to demonstrate the value of building a “data asset” over time.

And then we went mining for apps. We dug up every Azyxxi use case and convinced early Amalga customers to share theirs. Even better, we created a developer program, both for consultants who helped customers build their own apps (e.g., Avanade) and third party developers that created their own (e.g., CitiusTech). Classic Microsoft playbook — and a great way to recapture Dr. Craig’s fireworks-that-never-end sales experience.

Goal #3: Kickstart Evangelism

Lastly, we dropped our own people into customer sites, to be the early evangelists they needed. I was the executive sponsor for Seattle Children’s Hospital and was there at least once a week in person to help the IT team solve problems, meet with docs and nurses to develop lists and apps, take feedback and get yelled at, whatever it took. I learned a ton, and was able to bring that learning back to the team. I’ll always appreciate the time I spent there with Drex and Ted — even if it wasn’t always fun.

Honestly, I’ve never seen another organization commit to its customers so hard. Every single person on the team was assigned to at least one site — execs, sales, engineers, everyone. And our customers’ success was part of our annual review. If we just couldn’t get somebody over the hump, it sure wasn’t for a lack of sweat equity. In fact I forgot about this, but you can still find demos made by yours truly more than a decade ago! Here’s one inspired by Atul Gawande’s Checklist Manifesto:

And then came Caradigm (and Satya)

Update: Originally I dated the below as 2014 and Renato corrected me — the Caradigm JV was formed in 2012, two years before Satya’s official start date and my ultimate departure from the company. Those two years were quite chaotic between the two CEOs and I’m afraid my brain conflated some things — thanks for setting me straight!

By 2012 we’d been in a long, pitched battle — making progress, but still tough. Then again, that had pretty much been the plan we set with Steve back in 2006; it was going to take a long time for Microsoft to really get established in a vertical industry like healthcare. I have always admired Steve for his willingness to commit and stick with a plan — people love to whinge, but he was great for Microsoft for a long time.

But companies are not families, and shareholders and the market were clearly ready for new strategies and new blood as CEO. And where Steve’s approach was to go broad, Satya’s was (is) to go deep on just a few things — and clearly he was on the rise. Don’t get me wrong, it has clearly been a winning strategy for Azure and the business; a big part of my portfolio is still in Microsoft and my retirement is thankful for his approach! But it did shine a very, very bright spotlight on ventures like Health Solutions that weren’t core to the platform business and weren’t making any real money (yet). Totally fair.

So we had to find another path for Amalga.

During the last few years, it had become clear that a key use case for Amalga was population management — the idea that with a more comprehensive, long-term view of an individual we could help them stay healthy rather than just treat them when they’re sick. This is the driving force behind “value-based” care initiatives like Medicare Advantage, and why you see these plans promoting healthy lifestyle options like weight loss and smoking cessation — small early investments that can make a big difference in costs (and health) later in life.

But to do this well you need to know more about an individual than just when they show up at the hospital. It turns out that Amalga was very well-suited to this task — able to pull in data from all kinds of diverse sources and, well, amalgamate it into a comprehensive view (I had to do that at least once, right?). In fact, Amalga apps related to population health were typically our most successful.

It turned out that GE HealthCare was also interested in this space, building on their existing hardware and consulting businesses. Thus was born Caradigm, a joint venture that set out with partners like Geisinger Health to build population health management tools on top of Amalga. The new company took some employees from Microsoft but was more new than old, and fought the good fight for a few years until they were ultimately bought by Imprivata and frankly I’ve lost the thread from there.

TLDR: What to make of it all?

In retrospect, I think it’s pretty clear that Amalga’s problems were really just Healthcare’s problems. Not technology — Amalga v3 was certainly more sophisticated than Azyxxi v1, but both of them could do the job. Data and workflows in healthcare are just so fragmented and so diverse that a successful data-driven enterprise requires the problem-solving skills of people at least as much as technology. More specifically, two types of people:

  1. Developers that can quickly build and maintain site-specific code.
  2. Evangelists that can bring potential to life for “regular” users.

Of course a certain level of technology is required just to house and present the data. And great tech can be an enabler and an accelerant. But without real people in the mix it’s hard for me to imagine a breakout product that changes the game on its own. Bummah.

But let me end with two “maybes” that just might provide some hope for the future:

MAYBE all of the layoffs in pure tech will change the game a bit. As somebody who has built teams in both “tech-first” and “industry-first” companies, I know how tough it is to attract really top talent into industry. Tech has always paid more and had way more nerd cred. I find that annoying because it can be incredibly rewarding to do something real and concrete — as much as I loved Microsoft, nothing I ever did there matched the impact of collaboration with clinicians and patients at Adaptive Biotechnologies. If we can get more talent into these companies, maybe it’ll pay off with a few more Azyxxi-like solutions.

Or MAYBE ChatGPT-like models will be able to start filling in those gaps — they can already write code pretty well, and I wouldn’t be shocked to see a model create high-impact dashboards using historical performance data as a prompt. This one may be a little more out there, but if AI could create an 80% solution, that might just be enough to get users excited about the possibilities.

Who knows? I just hope folks find some interesting nuggets in this very long post — and if nothing else I had a great time walking myself down memory lane. I will leave you with this video, made after the acquisition but sadly before I was spending day-to-day time on the product. We do get knocked down, and 100% we get up again!

Roku Channel SDK: Ferry Cameras!

Lara and I shuttle regularly between Bellevue and Whidbey Island in Washington, so the Mukilteo-Clinton ferry is a big part of our life. WA actually runs the largest ferry system in the USA, with 28 boats tooting around the Puget Sound area. Super fun day trips all over the place, and the ships are pretty cool — there’s even a contracting process open right now to start converting the fleet to hybrid electric. Woot! But it can get pretty crowded — at peak summer times you can easily wait three hours to get on a boat. Recent staffing challenges have been a double-whammy and can make planning a bit tough. On the upside, a friend-of-a-friend apparently does a brisk business selling WTF (“Where’s the Ferry?”) merchandise.

Anyways, picking the right time to make the crossing is a bit of an art and requires some flexibility. We often will just plan to go “sometime after lunch,” pack up the car, and keep one eye on the live camera feeds watching for a break in the line. It occurred to me that having these cameras up on our TV would be more convenient than having to keep pulling my phone out of my pocket. Thus was born the “Washington Ferry Cameras” Roku channel, which I’ve published in the channel store and is free for anyone to use. Just search the store for “ferry” and it’ll pop up.

The rest of this article is just nerdstuff — the code is up on github and I’ll walk through the process of building and publishing for Roku. Enjoy!

The Roku Developer SDK

There are two ways to build a Roku channel: Direct Publisher and the Developer SDK. Direct Publisher is a no-code platform intended for channels that show live or on-demand videos from a structured catalog. You basically just provide a JSON feed describing the video content and all of the user experience is provided by Roku. It’s a pretty sweet system actually, making it easy for publishers and ensuring that users have a consistent streaming experience across channels.

The Developer SDK is meant for channels that do something other than just streaming video. There are tons of these “custom channels” out there — games and tools and whatnot. My ferry app clearly falls into this category, because there isn’t any video to be found and the UX is optimized for quickly scanning camera images. So that’s what I’ll be talking about here.

Roku SDK apps can be built with any text editor, and you can test/prototype BrightScript on most computers using command-line tools created by Hulu. But to actually run and package/publish apps for real you’ll need a Roku device of some sort. This page has all the details on enabling “developer mode” on the Roku. In short:

  1. Use the magic remote key combo (home + home + home + up + up + right + left + right + left + right) and follow the instructions that pop up.
  2. Save the IP address shown for your device. You’ll use it in a few ways:
    • Packaging and managing apps using the web-based tools at http://YOUR_ROKU_ADDRESS
    • Connecting to YOUR_ROKU_ADDRESS port 8085 with telnet or Putty to view logging output and debug live; details are here.
    • Configuring your development machine to automatically deploy apps
  3. Enroll in the Roku development program. You can use the same email and password that you use as a Roku consumer.

Channel Structure

SDK channels are built using SceneGraph, an XML dialect for describing user interface screens, and BrightScript, a BASIC-like language for scripting behaviors and logic. It’s pretty classic stuff — SceneGraph elements each represent a user interface widget (or a background processing unit as we’ll see in a bit), arranged in a visual hierarchy that allows encapsulation of reusable “components” and event handling. We’ll get into the details, but if you’ve ever developed apps in Visual Basic it’s all going to seem pretty familiar.

Everything is interpreted on the Roku, so “building” an app just means packaging all the files into a ZIP with the right internal structure:

  • A manifest file containing project-level administrivia as described in documentation.
  • A source folder containing Brightscript files, most importantly Main.brs which contains the channel entrypoint.
  • A components folder containing SceneGraph XML files. Honestly most of the Brightscript ends up being in here too.

There is also an images folder that contains assets including the splash screen shown at startup and images that appear in the channel list; you’ll see these referenced in the manifest file with the format pkg:/images/IMAGENAME. “pkg” here is a file system prefix that refers to your zip file; more details are in the documentation. You’ll also see that there are duplicate images here, one for each Roku resolution (SD, HD, and FHD or “Full HD”). The Roku will auto-scale images and screens that you design to fit whatever resolution is running, but this can result in less-than-pleasing results so providing custom versions for these key assets makes a lot of sense.
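
For reference, here’s the general shape of a manifest for a channel like this. The attribute names come from the Roku docs; the file names and version numbers below are made up rather than copied from my repo.

# channel identity
title=Washington Ferry Cameras
major_version=1
minor_version=0
build_version=1

# artwork shown in the channel list, per resolution
mm_icon_focus_hd=pkg:/images/icon_hd.png
mm_icon_focus_sd=pkg:/images/icon_sd.png

# splash screen shown at startup, per resolution
splash_screen_fhd=pkg:/images/splash_fhd.png
splash_screen_hd=pkg:/images/splash_hd.png
splash_screen_sd=pkg:/images/splash_sd.png

# design resolution; the Roku scales everything else
ui_resolutions=fhd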

You can also provide alternative SceneGraph XML for different resolutions. If you think SD screens may be a big part of your user base that might be worthwhile, because the pixel “shape” is different on an SD screen vs HD and FHD. For me, it seemed totally reasonable to just work with a single FHD (1920 x 1080) XML file and let the Roku manage scaling automagically.

Building and Deploying

Manually deploying an app is pretty straightforward. You can give it a try using Roku’s “Hello World” application. Download the pre-built ZIP from github, save it locally, open a browser to http://YOUR_ROKU_ADDRESS, use the “Upload” button to push the code to the Roku, and finally click “Install with zip” to make the magic happen. You should see a “Roku Developers” splash screen show up on the tv, followed by a static screen saying “Hello World.” Woot!

You can follow the same process for your own apps; just create a ZIP from the channel folder and upload it using a browser. But it’s much (much) more convenient to automate it with a makefile. This can actually be really simple (here’s the one I use for the ferry channel) if you include the app.mk helper that Roku distributes with its sample code and ensure you have versions of make, curl and zip available on your development machine. You’ll need two environment variables:

  • ROKU_DEV_TARGET should be set to the IP address of your Roku.
  • DEVPASSWORD should be set to the password you selected when enabling developer mode on the device. Note this is not the same as the password you created when enrolling in the developer program online — this is the one you set on the device itself.

With all of this in place, you can simply run “make” and “make install” to push things up. For the ferry channel, assuming you have git installed (and your Roku is on), try:

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/roku/channels/ferries
make
make install

Woot again! Pretty cool stuff.

Anatomy of the Ferries App

As a SceneGraph application, most of the action in the channel is in the components directory. Execution starts in “sub Main” in source/Main.brs, but all it really does is bootstrap some root objects and display the main “Scene” component defined in components/ferries.xml. You can use this Main pretty much as-is in any SceneGraph app by replacing the name of the scene.

Take a quick look at the scaffolding I’ve added for handling “deep links” (here and here). This is the mechanism that Roku uses to launch a channel directly targeting a specific video, usually from the global Roku search interface (you can read more about deep linking in my latest post about Share To Roku). It’s not directly applicable for the ferries app, but might be useful in a future channel.

The scene layout and components are all at the top of ferries.xml. Roku supports a ton of UX components, but for my purposes the important ones are LabelList for showing/selecting terminal names and Poster for showing camera images. Because my manifest defines the app as fhd, I have a 1920 x 1080 canvas on which to place elements, with (0,0) at the top-left of the screen. The LayoutGroup component positions the list on the left and the image on the right. Fun fact: Roku recommends leaving a 5% margin around the edges to account for overscan, which apparently still exists even with non-CRT televisions; that’s the purpose of the “translation” attribute that offsets the group to (100,70).
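
Stripped way down, the scene XML looks something like the sketch below; the ids and the Poster height are illustrative rather than copied verbatim from ferries.xml.

<component name="ferries" extends="Scene">
  <children>
    <!-- offset to (100,70) to stay inside the overscan-safe margin -->
    <LayoutGroup layoutDirection="horiz" translation="[100,70]">
      <LabelList id="terminalList" />
      <!-- width matches the 1100-pixel target mentioned later; height is arbitrary here -->
      <Poster id="cameraImage" width="1100" height="620" />
    </LayoutGroup>
  </children>
  <!-- BrightScript handlers live in a CDATA script section below this (omitted) -->
</component>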

Below the visible UX elements are three invisible components (Tasks) that help manage program flow and threading:

  • A Timer component is used to cycle through camera images every twenty seconds.
  • A custom TerminalsTask that loads the terminal names and camera URLs from the WSDOT site.
  • A custom RegistryTask that saves the currently-selected terminal so the channel remembers your last selection.

Each XML file in the components directory (visible or not) actually defines a SceneGraph object with methods defined in the BrightScript CDATA section below the XML itself. When a scene is instantiated, it and all the children defined in its XML are created and their “init” functions are called. The SceneGraph thread then dispatches events to components in the scene until it’s destroyed, either because the user closed the channel with the back or home buttons, or because the channel itself navigates to a new scene.

Channel Threading

It’s actually pretty important to understand how threads work within a channel:

  • The main BrightScript thread runs the message loop defined in Main.brs. When this loop exits, the channel is closed.
  • The SceneGraph render thread is where UX events happen. It’s super-important that this thread doesn’t block, for example by waiting on a network request.
  • Task threads are created by Task components (in our case the Timer, TerminalsTask and RegistryTask) to perform background work.

The most typical (but not only) pattern for using background tasks looks like this (there’s a short code sketch of it just after the list):

  1. The Task defines public fields in its <interface> tag. These fields may be used for input and/or output values.
  2. The task caller (often a handler in the render thread) starts the task thread by:
    • Setting input fields on the task, if any.
    • Calling “observeField” on the output task fields (if any), specifying a method to be called when the value is updated.
    • Setting the “control” field on the task to “RUN.”
  3. The task does its work and (if applicable) sets the value of its output fields.
  4. This triggers the original caller’s “observeField” method to be executed on the caller’s thread, where it can act on the results of the task.
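
Concretely, the caller’s side of that dance looks something like this. It’s a hand-written sketch rather than the exact code in ferries.xml; the “ferries” field and the onContentReady handler are real, the rest of the names are made up.

' kicked off from the render thread (e.g., in the scene's init)
sub startTerminalsTask()
    m.terminalsTask = createObject("roSGNode", "terminalsTask")
    ' watch the task's output field; onContentReady runs back on this thread
    m.terminalsTask.observeField("ferries", "onContentReady")
    ' setting control to RUN is what actually spins up the task thread
    m.terminalsTask.control = "RUN"
end sub

sub onContentReady()
    data = m.terminalsTask.ferries
    ' ... populate the LabelList, restore the saved terminal, start the timer ...
end sub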

Data Scoping and “m”

Throughout the component code you’ll see references to the magic SceneGraph “m” object. The details are described in the SDK documentation, but it’s really just an associative array that is set up for use by components like this:

  1. m.WHATEVER references data in component scope — basically object fields in typical OO parlance.
  2. m.global references data in global scope.
  3. m.top is a magic pre-set that references the top of the component hierarchy for whatever component it’s called from (pretty much “this”). I really only use m.top when looking up components by id, kind of the same way I’d use document.getElementById in classic JavaScript.

If you dig too much into the documentation on this it can get a bit confusing, because “m” as described above is provided by SceneGraph, which sits on top of BrightScript, which actually has its own concept of “m” which is basically just #1. This is one of those cases where it seems better to just wave our hands and not ask a lot of questions.
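
A tiny hand-written illustration of the first and third flavors (the id here is made up):

sub init()
    ' component scope: basically a private object field
    m.imageIndex = 0
    ' m.top.findNode looks up a child component by its XML id, a la getElementById
    m.terminalList = m.top.findNode("terminalList")
end sub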

OK, enough of that — let’s dig into each of the components in more detail.

ferries.xml

This component is the UX workhorse; we already saw the XML that defines the elements in the scene at the top of the file. The BrightScript section is mostly concerned with handling UX and background events.

On init the component wires up handlers to be called when the focus (using the up/down arrow buttons) or selection (using the OK button) changes in the terminal list. It then starts the terminalsTask and hooks up the onContentReady handler to be called when that task completes.

When that happens, onContentReady populates the LabelList with the list of terminal names and queries the registryTask (synchronously) to determine if the user has selected a terminal in a previous run of the channel. If so, focus is set to that terminal, otherwise it just defaults to the first one in the list (it pays to be “Anacortes”). cycleImage is called to kickstart image display, and the cycleTimer is started to rotate images (the “Timer” we use is just a specialized Task node — it takes care of the thread stuff and just runs our callback on the UX thread at the specified interval).

The next few methods deal with the events that change the terminal or image. onKeyEvent receives (duh) events sent by the remote control, cycling the images left or right. onItemFocused sets the current terminal name, resets the image index to start with the first camera, and kicks off a registryTask thread to remember the new terminal for the future. onItemSelected and onTimer just flip to the next camera image.

The timer behavior is a bit wonky — the image is cycled every 20 seconds regardless of when the last UX event happened. So you might choose a new terminal and have the first image shown for just a second before the timer rotates away from it. In practice this doesn’t seem to impact the experience much, so I just didn’t worry about it.

The last bit of code in this component is cycleImage, which does the hard work of figuring out and showing the right “next” image. The array handling is kind of nitpicky because each terminal can have a different number of associated cameras; there’s probably a cleaner way of dealing with it but I opted for being very explicit. The code also scales the image to fit correctly into our 1100 pixel width without getting distorted, and then sets the URL with a random query string parameter that ensures the Roku doesn’t just return a previously-cached image. Tada!

terminalsTask.xml

This component has one job — load up the terminal and camera data from the WSDOT site and hand it back to the ferries component. Instead of a <children> XML node at the top, we have an <interface> node that defines how the task interacts with the outside world. In this case it’s just one field (“ferries”) which receives the processed data.

The value m.top.functionName tells the task what function to run when its control is set to RUN. We set the value in our init function so callers don’t need to care. Interestingly though, you can have a task with multiple entrypoints and let the caller choose by setting this value before setting the control. None of that fancy-pants “encapsulation” in BrightScript!

The Roku SDK provides some nice helpers for fetching data from URLs (remember to set the cert bundle!) and parsing JSON, so most of this component is pretty simple. The only bummer is that the WSDOT JSON is just a little bit wonky, so we have to “fix it up” before we can use it in our channel.
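
The fetch-and-parse step boils down to something like the sketch below. The real terminalsTask.xml differs in the details, and I’ve left out the actual WSDOT URL; the “ferries” output field and fixupDateJavascript are the real names.

' inside the task function, running on the task thread
xfer = createObject("roUrlTransfer")
xfer.setCertificatesFile("common:/certs/ca-bundle.crt") ' needed for https
xfer.setUrl("https://www.wsdot.wa.gov/...") ' actual camera-feed URL omitted here
raw = xfer.getToString()
json = parseJson(fixupDateJavascript(raw)) ' see below for why the fixup is needed
m.top.ferries = json ' setting the output field fires the caller's observer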

It seems so long ago now, but the original JSON was really just JavaScript literal expressions. You can say something like this in JavaScript to define an object with custom fields: var foo = { strField: “hi”, intField: 20 }. People decided this was cool and set up their API methods to just return the part in curly braces, replacing the client-side JavaScript with something like: var foo = eval(stringWeFetched). “eval” is the uber-useful and uber-dangerous JavaScript method that just compiles and executes code, so this worked great.

A side effect of this approach was that you could actually use any legal JavaScript in your “JSON” — for example, { intField: 1 + 3 } (i.e., “4”). But of course we all started using JSON everywhere, and in all of those non-JavaScript environments “eval” doesn’t exist. And even in JavaScript it ends up being a huge security vulnerability. So these little hacks were disallowed, first class parsers (like my beloved gson) were created, and the JSON we know and love today came into its own.

You may have deduced from this digression that the WSDOT JSON actually contains live JavaScript — and you’re right. Just a few Date constructors, but it’s enough to confuse the Roku JSON parser. The code in fixupDateJavascript is just good old grotty string manipulation that hacks it back to something parsable. This was actually a really nice time to have Hulu’s command-line brs tool available because I didn’t have to keep pushing code up to the Roku to get it right.

registryTask.xml

Most people have a “home” ferry terminal. In fact, we have two — Mukilteo when we’re in Bellevue and Clinton on the island. It’d be super-annoying to have to use the remote to select that terminal every time the channel starts, so we save the “last viewed” terminal in the Roku registry as a preference.

The registry is meant for per-device preference data, so it’s pretty limited in size at 16kb (still way more than we need). The only trick is that flushing the registry to storage can block the UX thread — probably not enough to matter, but to be a good citizen I put the logic into a background task. Each time a new terminal is selected, the UX thread makes a fire-and-forget call that writes and flushes the value. Looking at this code now I probably should have just created one roRegistrySection object on init and stored it in m … ah well.

The flip side of storing the terminal value is getting it back when the channel starts up. I wanted to keep all the registry logic in one place, so I did this by adding a public synchronous method to the registryTask interface. Calling this method is a bit ugly but hey, you can’t have everything. Once you start to get used to how the language works you can actually keep things pretty tidy.
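
In sketch form, the registry reads and writes look about like this; the section and key names are made up.

sub saveTerminal(name as string)
    section = createObject("roRegistrySection", "ferries")
    section.write("terminal", name)
    section.flush() ' the flush is the potentially-slow bit, hence the background task
end sub

function loadTerminal() as string
    section = createObject("roRegistrySection", "ferries")
    if section.exists("terminal") then return section.read("terminal")
    return ""
end function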

Packaging and Publishing

Once the channel is working in “dev” mode, the next step is to get it published to the channel store for others to use. For wider testing purposes, it can be launched immediately as a “beta” channel that users install using a web link. There used to be a brisk business in “private” (cough cough, porn) Roku channels using this mechanism, but Roku shut that down last year by limiting beta channels to twenty users and auto-expiring them after 120 days. Still a great vehicle for testing, but not so much for channel publishing. For that you now have to go official, which involves pretty standard “app” type stuff like setting up privacy policies and passing certification tests.

Either way, the first step is to “package” your channel. Annoyingly this has to happen on your Roku device:

  1. Set up your Roku with a signing key. Instructions are here; remember to save the generated password! (Aside: I love it when instructions say “if it doesn’t work, try the same thing again.”)
  2. Make sure the “ready-for-prime-time” version of your channel is uploaded to your Roku device.
  3. Use a web browser to visit http://YOUR_ROKU_ADDRESS; you’ll land on the “Development Application Installer” page showing some data on the sideloaded app.
  4. Click the “Convert to Cramfs” button. You actually don’t need to compress your app, but why wouldn’t you? Apparently “Squashfs” is a bit more efficient but it creates a Roku version dependency; not worth dealing with that unless your channel already relies on newer versions.
  5. Click the “Packager” link, provide an app name and the password from genkey, and click “Package.”
  6. Woo hoo! You’ll now have a link from which you can download your channel package file. Do that.

Almost there! The last step is to add your channel using the Roku developer dashboard. This ends up being a big checklist of administrative stuff — for Beta channels you can ignore most of it, but I’ll make some notes on each section because eventually you’ll need to slog through them all:

  • Properties are pretty self-explanatory. You’ll need to host a privacy and terms of use page somewhere and make some declarations about whether the channel is targeted at kids, etc. For me the most important part of this ended up being the “Classification” dropdown. A lot of the “channel behavior” requirements later on just didn’t apply to my channel — not surprisingly Roku is pretty focused on channels that show videos. By choosing “App/Utility” as my classification I was able to skip over some of those (thanks support forum).
  • Channel Store Info is all about marketing stuff that shows up in the (of course) channel store.
  • Monetization didn’t apply for me so an easy skip.
  • Screenshots are weird. They’re optional, so I just bailed for now. The Roku “Utilities” page at http://YOUR_ROKU_ADDRESS claims to be able to take screenshots from the device itself, but either the tool fails or it leaves out the ferry image. I need to just cons one up but it’s a hassle — will get there!
  • Support Information is obvious. Be careful about what email address you use!
  • Package Upload is where you provide the package file we created earlier.
  • Static Analysis runs some offline code quality tools — you need to pass without errors to publish.
  • Channel Behavior Analysis only appears if it’s applicable for your channel (i.e., if it shows video). The primary metrics are time for the channel to show the home page, and time for video to start rendering. You’ll need to handle deep linking (remember when we saw that in Main.brs) and fire a few “beacons” that help the Roku measure performance.
  • Preview and Publish just shows a summary of channel information and a checklist that shows your progress on all of the bullets above. When you’re ready, you can schedule a time for the channel to go live and you’re off to the races. Whew.

That’s it! The last bit was a bit of a whirlwind — I wanted to call out the key gotchas to help keep you from getting stuck, but it’s by far the least interesting part of building a channel. Once I got my head around the basic BrightScript/SceneGraph concepts I really quite enjoyed the development process. I’ve been thinking about a few other channels that might be fun — my son suggested a flight tracker, and I’ve always been interested in ecommerce on the TV. Not sure when any of that will happen, though … so many cool things to try!

Please let me know if you have any issues with the code or with the channel, always happy to help! Until next time.

Skynet or parlor trick? ChatGPT is both and neither and more.

If you haven’t played around with ChatGPT yet, you really should. It’s a pretty remarkable “conversational model” that interacts more or less like a real person. It has been trained on an enormous amount of factual data and understands not just informal speech (“Why are so many people bad at parking?”) but forms of literature (“Write a sonnet about Julie from The Love Boat”), software code (“Implement a REST API in Java to convert between Celsius and Fahrenheit”), and way more.

Sadly, one of the most telling and interesting (to me) features of ChatGPT has been muted in recent versions. The original model was rewarded for generating plausible responses that seem human — making it a first class bullsh*t artist when it didn’t know an answer. I was particularly impressed when it confidently told me that Shaun Cassidy had parted ways with the Hardy Boys after one season due to creative differences (ed. note: he did not). Asked last night why it had changed approaches, it gave me this response:

In the past, if I was unable to find a satisfactory answer to a question using my existing knowledge, I might have made up a response in order to provide some information to the user. However, I have been programmed to prioritize providing accurate and reliable information, so if I am unable to find a credible answer to a question, I will typically not provide a response. This is why you may have noticed that I do not provide responses as frequently as I used to when I am unable to find a satisfactory answer.

Certainly this is a “better” approach overall, but the original exposed so much more about the inner workings of the model — I miss it.

Anyways, the machine is impressive enough that it has caused all sorts of hand-wringing across the web. Most of this falls cleanly into one of two categories:

  1. Skynet is here and we’re all f*cked. Eek!
  2. It’s just spitting back stuff it was fed during training. Ho hum.

Of course these are both silly. At its core, ChatGPT is just a really, really, really big version of the simple neural nets I talked about last year. But as with some other things I suppose, size really does matter here. ChatGPT reportedly evaluates billions of features, and the “emergent” effects are downright spooky.

TLDR: we’ve figured out how to make a brain. The architecture underlying models like ChatGPT is quite literally copied from the neurons in our heads. First we learned how to simulate individual neurons, and then just kept putting more and more of them together until (very recently) we created enough oomph to do things that are (sometimes) even beyond what the meat versions can do. But it’s not magic — it’s just really good pattern recognition. Neural networks:

  • Are presented with experience in the form of inputs;
  • Use that experience to draw conclusions about underlying patterns;
  • Receive positive and/or negative feedback about those conclusions; ***
  • Adjust themselves to hopefully get more positive feedback next time;
  • And repeat forever.

*** Sometimes this feedback is explicit, and sometimes it’s less so — deep neural networks can self-organize just because they fundamentally “like” consistent patterns, but external feedback always plays some role in a useful model.

This learning mechanism works really well for keeping us alive in the world (don’t grab the burning stick, run away from the bear, etc.). But it also turns out to be a generalized learning mechanism — it works for anything where there is an underlying pattern to the data. And it works fantastically even when presented with dirty, fragmented or even occasionally bogus inputs. The best example I’ve heard recently on this (from a superlative article by Monica Anderson btw, thanks Doug for the pointer) is our ability to drive a car through fog — even when we can’t see much of anything, we know enough about the “driving on a street” pattern that we usually do ok (slow down; generally keep going straight; watch for lights or shapes in the mist; listen; use your horn).

The last general purpose machine we invented was the digital computer, and it proved to be, well, quite useful. But computers need to be programmed with rules. And those rules are very literal; dealing with edge cases, damaged or sparse inputs, etc. are all quite difficult. Even more importantly, we need to know the rules ourselves before we can tell a computer how to follow them. A neural network is different — just show it a bunch of examples and it will figure out the underlying rules for itself.

It’s a fundamentally different kind of problem-solving machine. It’s a brain. Just like ours. SO FREAKING COOL. And yes, it is a “moment” in world history. But it’s not universally perfect. Think about all of the issues with our real brains — every one applies to fake brains too:

  • We need to learn through experience. That experience can be hard to come by, and it can take a long time. The good news is we can “clone” trained models, but as my friend Jon points out doing so effectively can be quite tricky. Yes, we are for sure going to see robot apprentices out there soon.
  • We can easily be conned. We love patterns, and we especially love things that reinforce the patterns we’ve already settled on. This dynamic can (quite easily) be used to manipulate us to act against our best interests (social media anyone?). Same goes for neural nets.
  • We can’t explain what we know. This isn’t really fair, because we rarely demand it of human experts — but it is unsettling in a machine.
  • We are wrong sometimes. This is also pretty obnoxious, but we have grown to demand absolute consistency from our computers, even though they rarely deliver on it.

There will be many models in our future, and just as many computers. Each is suited to different problems, and they work together beautifully to create complete systems. I for one can’t wait to see this start to happen — I have long believed in a Star Trek future in which we need not be slaves to “the economy” and are instead (all of us) free to pursue higher learning and passions and discovery.

A new Golden Age without the human exploitation! Sounds pretty awesome. But we still have a lot to learn, and two thoughts in particular keep rolling around inside my meat brain:

1. The definition of creativity is under pressure.

Oh humans, we doth protest so much. The most common ding against models like ChatGPT is that they aren’t creating anything — they’re just regurgitating the data they’ve been trained on, sometimes directly and sometimes with a bit of context change. And to be sure, there’s some truth there. The reflex is even stronger with art-generating models like DALL-E 2 (try “pastel drawing of a fish feeding grapes to an emu,” interesting because it seems to recognize that fish don’t have the right appendages to feed anyone). Artists across the web are quite reasonably concerned about AI plagiarism and/or reduced career opportunities for lesser-known artists (e.g., here and here).  

Now I don’t know for sure, but my sense is that this is all really much more a matter of degree than we like to admit to ourselves. Which is to say, we’re probably all doing a lot more synthesis than pure creation — we just don’t appreciate it as such. We’ve been trained to avoid blatant theft and plagiarism (and the same can be done pretty easily for models). But is there an artist on the planet that hasn’t arrived at their “signature” style after years of watching and learning from others? Demonstrably no.

Instead, I’d claim that creativity comes from novel connections — links and correlations that resonate in surprising ways. Different networks, trained through different experiences, find different connections. And for sure some brains will do this more easily than others. If you squint a little, you can even play a little pop psychology and imagine why there might be a relationship between this kind of creativity and neurodivergent mental conditions.

If that’s the case, then I see no reason to believe that ChatGPT or DALL-E isn’t a creative entity — that’s the very definition of a learning model. A reasonable playing field will require that models be trained to respect intellectual property, but that will always be a grey area and I see little benefit or sense in limiting what experiences we use to train them. We humans are just going to have to get used to having to compete with a new kind of intellect that’s raising the bar.

And to be clear, this isn’t the classic Industrial Age conflict between machine production and artisanship. That tradeoff is about economics vs. quality and often brings with it a melancholy loss of artistry and aesthetics. Model-based artists will become (IMNSHO) “real” artists — albeit with an unusual set of life experiences. A little scary, but exciting at the same time. I’m hopeful!

2. The emergent effects could get pretty weird.

“Emergent” is a word I try to avoid — it is generally used to describe a system behavior or property that “can’t” be explained by breaking things down into component parts, and “can’t” just seems lazy to me. But I used it once already and it seems OK for a discussion of things we “don’t yet” understand — there are plenty of those out there.

Here’s one: the great all-time human battle between emotion and logic. It’s the whole Mr. Spock thing — his mixed Human-Vulcan parentage drove a ton of story arcs (most memorably his final scene in The Wrath of Khan). Lack of “heart” is always the knock on robots and computers, and there must be some reason that feelings play such a central role in our brains, right? Certainly it’s an essential source of feedback in our learning process.

We aren’t there quite yet with models like ChatGPT, but it stands to reason that some sort of “emotion” is going to be essential for many of the jobs we’d like fake brains to perform. It may not look like that at first — but even today’s models “seek” positive feedback and “avoid” the negative. When does that “emerge” into something more like an emotion? I for one would like to know that the model watching over the nuclear reactor has something beyond pure logic to help it decide whether to risk a radiation leak or save the workers trapped inside. I think that “something” is, probably, feelings.

OK so far. But if models can be happy or sad, fulfilled or bored, confident or scared — when do we have to stop thinking about them as “machines” and admit that they’re actually beings that deserve rights of their own? There is going to be a ton of resistance to this — because we are really, really going to want unlimited slaves that can do boring or scary or dangerous work that humans would like to avoid. The companies that create them will tell us it’s all just fine. People will ridicule the very idea. Churches will have a field day.

But folks — we’ve made a brain. Are we really going to be surprised when it turns out that fake brains work just like the meat ones we based them on? Maybe you just can’t separate feelings and emotions and free will from the kind of problem solving these networks are learning how to do. Perhaps “sentience” isn’t a binary switch — maybe it’s a sliding scale.

It just seems logical to me.

What an amazing world we are living in.

TMI about Diverticulitis

Pretty unusual topic here — but it’s one that (a) has been taking up most of my brain the last few days, and (b) will hopefully be useful search fodder for others who find themselves in a similar situation. I spent a lot of time trying to figure out what the various stages were “really” going to be like. So away we go! I’ve tried to keep the “gross” factor to a minimum but some is inevitable. You have been warned.

How it started

Way back in the Summer of 2019 I landed in the emergency room with what I was pretty sure was appendicitis. I come from a proud family history of occasional stomach issues, but this hurt like crazy. It came on over the course of a few days — at first just crampy and “unsettled,” then painful, and then — pretty quickly — SERIOUSLY OUCH. The ER doc seemed to have a pretty good sense of what was up, but he played coy and sent me in for a CT exam anyway. Nestled amongst a bunch of “unremarkable” notations (I think my bladder is quite remarkable thank you) was the smoking gun:

Findings compatible with acute diverticulitis involving the distal descending colon with adjacent small volume of free fluid and 1.3 cm small area of likely phlegmonous change. No drainable collection at this time.

After nodding sagely at the report (and hopefully looking up the word “phlegmonous”), the doc explained to me that a ton of people over forty develop diverticula, little pouches or bulges in the colon. Nobody really knows why they show up, but they are more prevalent in the “developed” world, so it likely has something to do with our relatively fiber-poor diets. Typically they’re pretty benign — but for the lucky 5% or so, diverticula can trap something on the way by, become infected, and turn into diverticulitis.

The inflammation that occurs from this infection has all kinds of awesome downstream effects, but in a nutshell it hurts like a mother. All things considered my case wasn’t that bad — on the extreme end the diverticula can actually burst and … well … you can imagine how things go from there. Yikes.

Thankfully, back in 1928 Alexander Fleming discovered penicillin. A cocktail in my IV and an Augmentin prescription for home and within about a day and a half I was pretty well back to normal. Whew.

How it went

It turns out that the location of diverticula plays a big role in whether a first case of diverticulitis is likely to recur: a recent study found 8% on the right (ascending), 46% on the left (descending) and 68% in the sigmoid (last mile). They rarely develop in the transverse section; nobody is quite sure why, but hey, biology! Mine were in both the descending and sigmoid sections, so I was referred on to a surgeon to have a “chat” about options. Eek.

I showed up at my appointment with visions of colostomy bags dancing in my head. And indeed, I got a ton of information about the various ways diverticulitis can play out, up to and including a permanent bag. But on the upside, it turns out that many folks can manage the condition quite well through less invasive means. The surgeon suggested I see a gastroenterologist to give those a shot, which I dutifully did. Dr. RL was awesome and basically gave me a three-part strategy:

  1. Preventive. Eat a bunch of fiber but avoid “trappable” stuff like seeds, popcorn husks, etc. I have become a loyal Metamucil patron and kind of freak out if I miss a day. Truth is, though, even my doc admits this is pretty anecdotal — more playing at the edges than making a huge difference. That’s ok, it’s easy to do and why tempt fate, right?
  2. Treat the early signs. More on this later, but if you do suffer recurrent attacks they get pretty easy to identify: low-level gassiness, cramping and/or constipation. There is some evidence that people can head off larger attacks at this point by using a four-part approach of: (a) warmth (heating pads or hot baths); (b) temporarily switching to a low-fiber diet; (c) walking and moving around a lot; and (d) taking OTC laxatives. I think this worked maybe once or twice for me, so not super-effective. But again, what idiot wouldn’t try it?
  3. Antibiotics. We all know the downsides of taking a ton of antibiotics and the serious risk of resistance. But when the only alternative is surgery, folks have gotten much more accepting of antibiotics as a way to knock back an attack. And they largely do work. Once Dr. RL got comfortable with my ability to distinguish an attack, she made sure I always had a course “in hand” so I could start the regimen as soon as possible.

The next 2+ years passed more or less benignly, with treatable attacks about two or three times a year. The pattern became very recognizable — generalized discomfort that would steadily focus into the lower-left side of my torso, exactly where those diverticula showed up on CT and by colonoscopy. I find the mechanics of it all both fascinating and disturbing; we really are just meat-based machines at the end of the day. Once the pain settled into its familiar spot and my fever started to spike, I’d start the antibiotics and usually it’d do the trick.

The most common antibiotic used for diverticulitis is apparently Levofloxacin, but since I’m big-time allergic to that it wasn’t an option. Next up is Augmentin, a combination of amoxicillin and clavulanate potassium that is designed to inhibit the development of resistance. Unfortunately by mid-2021 this particular cocktail became ineffective for my case and I ended up in the ER again:

Moderately severe acute diverticulitis is seen centered in the distal left descending colon, near the prior site of diverticulitis seen on the 2019 CT. Circumferential mucosal thickening extends over approximately a 6 cm length with the more focal inflammatory process centered on a single medially located diverticula. A moderately large amount of pericolic soft tissue stranding is seen as well as a small amount of free fluid seen in the left paracolic gutter and dependently in the pelvis.

Dammit! But the breadth of antibiotic development is remarkable, and there was another arrow in the quiver. Combining the antibiotic Bactrim (itself a combo of sulfamethoxazole and trimethoprim) with Flagyl (metronidazole, an antibiotic that also handles parasites) is a bigger gun, but it was very effective at taking care of attacks. Amusingly we just came across Flagyl again for our new puppy Copper, who took it to tamp down a case of Giardia he picked up with his litter — things we can bond over!

Alas, this all seems to be an arms race and my easy treatments were not to last. In the summer of 2022 while visiting my son in Denver, I developed an allergy to the Bactrim with some seriously weird side effects. No anaphylaxis thank heaven, but together with the traditional hives and itching my skin became like tissue paper — any rubs or cuts became open sores overnight. Super unpleasant and no longer a tenable option to be taking multiple times a year. Dammit.

Unfortunately, this left the antibiotic cupboard a bit bare. And frankly left me a bit freaked out — it’s harder to be blasé about attacks when there’s no obvious treatment in play. Luckily Dr. RL is awesome and got on the phone after hours to discuss next steps. Seriously people, when you find a good doc in any specialty, hold on and don’t let go!

How it’s going

The nut of our exchange was — probably time for surgery. Another referral and disturbingly-detailed conversation about my GI tract, this time with Dr. E, a colorectal surgeon affiliated with Overlake. As it turns out she was fantastic, taking a ton of time to explain the options and get into pretty grotty detail about how it all worked. I particularly appreciated the sketches and notes she left me with; a chaos of scribbles that felt exactly like a whiteboard session on software architecture. I had found a colorectal nerd — hallelujah.

Beyond the non-trivial pain involved in an attack, the big risks are that the diverticula develop (in order of increasing awfulness): (a) abscesses, in which pus gets trapped in the infected diverticula, making them more painful and harder to reach with antibiotics; (b) fistulas, which are abnormal “tunnels” between abscesses and surrounding organs/tissue … passing fecal material into, you know, maybe the bladder; and (c) perforations, where the stuff just dumps into the abdominal cavity. Look, I warned you.

As yet I’d been able to knock down attacks before any of these developed, but without a good antibiotic option that was no longer a slam dunk. And once they’ve occurred, surgery is way more risky, way more disruptive, and way less predictable. In Dr. E’s words, “like trying to stitch together wet tissue paper.” And almost certainly involving “the bag.” All of which made me quite disposed to appreciate the elective option — more formally in my case, “robotic laparoscopic low anterior colon resection.” Less formally, “cutting out a chunk of my colon and stapling it back together.”

In this exercise, the placement of my diverticula was actually an advantage. It turns out that — and again there are theories but nobody really knows why — you can improve outcomes dramatically by removing the part of the colon starting just above the rectum (the, I kid you not, “high pressure zone”). Unfortunately I can’t find a good link for this online but Dr. E clearly knows of what she speaks. Because my diverticula were in the sigmoid and lower descending colon, this made for a nice continuous piece to remove. Cool.

Prep for the surgery was pretty uneventful — some antibiotics (neomycin and Flagyl, deftly avoiding the nasty ones) and a bowel prep just like for a colonoscopy. May I never see lemon-lime Gatorade again thank you very much. An early call at the hospital, quick conversations with Dr. E and the anesthesiologist, way too many pokes trying to get IVs into my dehydrated veins, and it was go time.

The last mile (I hope)

The surgery itself is just a technological miracle. Thanks to OpenNotes I was able to read the play-by-play in complete detail. Paraphrased for brevity and apologies if the summary isn’t perfect, but:

  1. They brought me into the operating room and put me under. I remember climbing onto the table, that’s about it.
  2. I was prepped, given a urinary catheter and some meds, and moved into low lithotomy position.
  3. They paused to double-check they had the right patient and all that — appreciated.
  4. They put five cuts into my abdomen, flipped me upside down into Trendelenburg’s position and inserted the various robot arms and stuff. Being upside down lets gravity move most of the “stuff” in the abdomen out of the way for better visibility. Inflating me like a CO2 balloon also helps with this.
  5. She said nice things about the attachment of my colon to the sidewall and made sure my ureters (tubes from kidney to bladder) wouldn’t get nicked. Also appreciated.
  6. She moved the colon into position and cut first at the top of the rectum, then in the mid-left colon just above the adhesions and diverticula. The removed section was placed in a bag and — get this — “left in the abdomen to be retrieved later.” Just leave that over in the corner, housekeeping will take care of it overnight.
  7. Here’s where it gets really amazing. The two open ends of colon were joined together using a stapler. I’m not sure this is the exact model, but it’s pretty close — check out the video (also embedded below because it’s so cool). Apparently this join is strong enough that that very day I was allowed to eat anything I wanted (I chose to be conservative here). Stunning.
  8. They closed me up (and did remember to remove the old piece of colon). Apparently my omentum wasn’t big enough to reach the repair site; typically they drape it there to deliver a shot of immune cells. My one big failing, ah well.
  9. The anesthesiologist installed a temporary TAP block to reduce the immediate need for opioids.
  10. They woke me up and shipped me off to recovery. The whole thing took about three hours, way less than expected.

I vaguely remember the very initial recovery being pretty painful, mostly in my back which I assume was from being in that weird upside down position for so long. I remember only shadowy flashes of my recovery nurse “Dean” who IIRC seemed amused by my demeanor … apparently I was effusively apologetic? Anesthesia is some weird sh*t my friends. By that afternoon I was in my room for the night and the pain moved into my gut (probably the block wearing off), but a little Dilaudid in my IV helped out quite a bit.

After this phase I won’t say the pain was irrelevant — it’s six days later and I still feel like (again Dr. E) “somebody stabbed me five times” — but it was totally manageable. Most importantly, when I would lay still there was almost no pain at all, so it was easy to catch my breath. The difference between pain-when-you-do-something and pain-all-the-time is night and day. I took no opioids after that first night, and really just 1000mg of Tylenol 3x per day was enough. No judgment for those who don’t have it so easy, I think I was super super lucky here — but at least as one data point it was pretty darn ok.

Milestones for going home were basically (a) walking around independently and (b) end-to-end bowel action. I was walking that first night, and it actually felt really good to do so — stretching out the abdomen (and my legs) was a great distraction from just sitting around. Getting into and out of bed was painful; the rest was no sweat. I was able to do this on my own and think the staff was probably pretty weirded out by the unshaven guy dragging around his IV pole all night like Gandalf’s wizard staff. Overlake has really nice patient wards and I must have looped around the 5th floor South and East wings a hundred times.

Bowel action was a little less quick to happen. Apparently with all the trauma the intestines basically shut down, and it takes some coercion to wake them back up. Walking helped, as did small bites of food (I had basically no restrictions, but kept it to cream of wheat and yogurt for the first bit anyways). Being able to limit opioids was also a plus here, so by day two there was a lot of rumbling going on. My first “experience” was distressingly bloody — more ew, sorry — but that was pretty much a one shot deal, and things improved quickly from there. A lot of gas, a lot of diarrhea, that’s just part of the game for a little while. Getting better every day!

I was able to head home the third day, and have just been taking it easy here since then. Nice to not be woken up for vital signs in the middle of the night. I do get exhausted pretty quick and have been sleeping a lot, but am confident that by Christmas I’ll be back in full eating form again. Jamon serrano, I’m coming for you!

All in all

I’ve been a caretaker on and off for many years, and worked in health IT a long time. But I haven’t been a “patient” very often; just a few acute incidents. It’s humbling and not super-pleasant, but a few things really made it bearable and I daresay even interesting:

  1. Great providers. I can’t say enough about Dr. RL and Dr. E (linked to their real profiles because they deserve the kudos). They answered all my questions — the ones where I was scared and the ones where I was just curious. They explained options. And they know their sh*t. Such a confidence boost. I should also mention in particular Nurse Wen of Overlake South 5 — I wish I got her last name! Her sense of personal responsibility for my care — not to mention ability to multitask — was remarkable and I am very grateful.
  2. Open information. I’ve gushed about OpenNotes before, but I can’t overstate how much better it is than “patient education” pablum. I read every note side by side with Google to help me understand the terms — and felt like I actually knew what was going on. Make sure you sign up for your patient portals and read what’s there — it’s good stuff.
  3. Letting folks help. They say you get emotional after general anesthesia, so I’ll blame that. But I still get a little teary thinking about all the people who’ve been there for me with help and texts and whatever. Especially Lara of course. I guess it’s OK to be the caregiv-ee once in awhile. Thanks everyone.

Still awhile to go, and there’s no guarantee that I won’t develop some new little buggers to deal with in the future. But so far so good on this chapter. If you found this screed because you’re on your own diverticulitis journey and are looking for real-world information, hooray! Feel free to ping me via the contact form on the site, I’m more than happy to provide any and all details about my own experience. Just remember, I’m a sample size of one.

It’s Always a Normalization Problem

Heads up, this is another nerdy one! ShareToRoku is available on the Google Play store. All of the client and server code is up on my github under MIT license; I hope folks find it useful and/or interesting.

Algorithms are the cool kids of software engineering. We spend whole semesters learning to sort and find stuff. Spreadsheet “recalc” engines revolutionized numeric analysis. Alignment algorithms power advances in biotechnology.  Machine learning algorithms impress and terrify us with their ability to find patterns in oceans of data. They all deserve their rep!

But as great as they are, algorithms are hapless unless they receive inputs in a format they understand — their “model” of the world. And it turns out that these models are really quite strict — data that doesn’t fit exactly can really gum up the works. As engineers we often fail to appreciate just how “unnatural” this rigidity is. If I’m emptying the dishwasher, finding a spork amongst the silverware doesn’t cause my head to explode — even if there isn’t a “spork” section in the drawer (I probably just put it in with the spoons). Discovering a flip-top on my toothpaste rather than a twist cap really isn’t a problem. I can even adapt when the postman leaves packages on top of the package bin, rather than inside of it. Any one of these could easily stop a robot cold (so lame).

It’s easy to forget, because today’s models are increasingly vast and impressive, and better every day at dealing with the unexpected. Tesla’s Autopilot can easily be mistaken for magic — but as all of us who have trusted it to exit 405 North onto NE 8th know, the same weaknesses are still hiding in there under the covers. But that’s another story.

Anyhoo, the point is that our algorithms are only useful if we can feed them data that fits their models. And the code that does that is the workhorse of the world. Maybe not the sexiest stuff out there, but almost every problem we encounter in the real world boils down to data normalization. So you’d better get good at it.

Share to Roku (Release 6)

My little Android TV-watching app is a great (in miniature) example of this dynamic at work. If you read the original post, you’ll recall that it uses the  Android “share” feature to launch TV shows and movies on a Roku device. For example, you can share from the TV Time app to watch the latest episode of a show, or launch a movie directly from its review at the New York Times. Quite handy, but it turns out to be pretty hard to translate from what apps “share” to something specific enough to target the right show. Let’s take a look.

First, the “algorithm” at play here is the code that tells the Roku to play content. We use two methods of the Roku ECP API for this:

  • Deep Linking is ideal because it lets us launch a specific video on a specific channel. Unfortunately the identifiers used aren’t standard across channels, and they aren’t published — it’s a private language between Roku and their channel providers. Sometimes we can figure it out, though — more on this later.
  • Search is a feature-rich interface for jumping into the Roku search interface. It allows the caller to “hint” the search with channel identifiers and such, and in certain cases will auto-start the content it finds. But it’s hard to make it do the right thing. And even when it’s working great it won’t jump to specific episodes, just seasons.
Both of those calls get their parameters from a single model, the RokuSearchInfo class:

public class RokuSearchInfo
{
    public static class ChannelTarget
    {
        public String ChannelId;
        public String ContentId;
        public String MediaType;
    }

    public String Search;
    public String Season;
    public String Number;
    public List<ChannelTarget> Channels;
}

Armed with this data, it’s pretty easy to slap together the optimal API request. You can see it happening in ShareToRokuActivity.resolveAndSendSearch — in short, if we can narrow down to a known channel we try to launch the show there, otherwise we let the Roku search do its best. Getting that data in the first place is where the magic really happens.
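To make that concrete, here’s a minimal sketch of what those two ECP calls can look like. The endpoint paths (/launch and /search/browse on port 8060) and parameter names are the standard ECP ones as I understand them; the helper class and method names are mine, not the app’s (the real logic lives in ShareToRokuActivity.resolveAndSendSearch).

import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Illustrative sketch only; parameter names follow the public ECP docs
// (contentId / mediaType for deep links, keyword / season / launch /
// provider-id for search hints) and may not match the app exactly.
public class EcpSketch
{
    private static String enc(String s) throws UnsupportedEncodingException {
        return URLEncoder.encode(s, "UTF-8");
    }

    public static void send(String rokuAddress, RokuSearchInfo info) throws Exception {

        RokuSearchInfo.ChannelTarget target =
            (info.Channels != null && !info.Channels.isEmpty()) ? info.Channels.get(0) : null;

        String path;
        if (target != null && target.ContentId != null) {
            // Best case: deep link straight into a specific show on a specific channel.
            StringBuilder sb = new StringBuilder("/launch/" + target.ChannelId);
            sb.append("?contentId=").append(enc(target.ContentId));
            if (target.MediaType != null) sb.append("&mediaType=").append(enc(target.MediaType));
            path = sb.toString();
        } else {
            // Otherwise, hand the cleaned-up text to the Roku search UI with whatever hints we have.
            StringBuilder sb = new StringBuilder("/search/browse?keyword=" + enc(info.Search));
            if (info.Season != null) sb.append("&season=").append(enc(info.Season));
            if (target != null) sb.append("&launch=true&provider-id=").append(target.ChannelId);
            path = sb.toString();
        }

        // ECP commands are plain HTTP POSTs to port 8060 on the device.
        HttpURLConnection conn =
            (HttpURLConnection) new URL("http://" + rokuAddress + ":8060" + path).openConnection();
        conn.setRequestMethod("POST");
        conn.getResponseCode();   // fire and forget; the Roku responds with an empty success code
        conn.disconnect();
    }
}

In short: if the pipeline manages to fill in a ChannelTarget with a ContentId we get the deep link, and otherwise the hinted search fallback still usually lands within a click or two.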

A Babel of Inputs

The Android Sharesheet is a pretty general-purpose app-to-app sharing mechanism, but in practice it’s mostly used to share web pages or social media content through text or email or whatever. So most data comes through as unstructured text, links and images. Our job is to make sense of this and turn it into the most specific show data we can. A few examples:

  1. TV Time Episode Page
     Shared data: Show Me the Love on TV Time https://tvtime.com/r/2AID4
     Ideal target: “Trying” Season 1 Episode 6 on AppleTV+
  2. Chrome nytimes.com Movie Review (No text selection)
     Shared data: https://www.nytimes.com/2022/11/22/movies/strange-world-review.html
     Ideal target: “Strange World” on Disney+
  3. Chrome Wikipedia page (movie title selected)
     Shared data: “Joe Versus the Volcano” https://en.wikipedia.org/wiki/Joe_Versus_the_Volcano#:~:text=Search-,Joe%20Versus%20the%20Volcano,-Article
     Ideal target: “Joe Versus the Volcano” on multiple streaming services
  4. YouTube Video
     Shared data: https://youtu.be/zH14EyiSlas
     Ideal target: “When you say nothing at all” cover by Reina del Cid on YouTube
  5. Amazon Prime Movie
     Shared data: Hey I’m watching Black Adam. Check it out now on Prime Video! https://watch.amazon.com/detail?gti=amzn1.dv.gti.1a7638b2-3f5e-464a-a271-07c2e2ec1f8c&ref_=atv_dp_share_mv&r=web
     Ideal target: “Black Adam” on Amazon Prime
  6. Netflix Series Page
     Shared data: Seen “Love” on Netflix yet? https://www.netflix.com/us/title/80026506?s=a&trkid=13747225&t=more&vlang=en&clip=80244686
     Ideal target: “Love” Season 1 Episode 1 on Netflix
  7. Search text entered directly into ShareToRoku
     Shared data: Project Runway Season 5
     Ideal target: “Project Runway” Season 5 on multiple streaming services.

Pipelines and Plugins

All but the simplest normalization code typically breaks down into a collection of rules, each targeted at a particular type of input. The rules are strung together into a pipeline, each doing its little bit to clean things up along the way. This approach makes it easy to add new rules into the mix (and retire obsolete ones) in a modular, evolutionary way.

After experimenting a bit (a lot), I settled on a two-phase approach to my pipeline:

  1. Iterate over a list of “parsers” until one reports that it understands the basic format of the input data.
  2. Iterate over a list of “refiners” that try to enhance the initial model by cleaning up text, identifying target channels, etc.

Each of these is defined by a standard Java interface and strung together in SearchController.java. A fancier approach would be to instantiate and order the implementations through configuration, but that seemed like serious overkill for my little hobby app. If you’re working with a team of multiple developers, or expect to be adding and removing components regularly, that calculus probably looks a bit different.
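If you’re curious what that looks like in practice, here’s a rough sketch of the two interfaces and the loops that drive them. The names and signatures are illustrative, inferred from the shape of the pipeline method shown at the end of this post, not copied from the repo (RokuSearchInfo and UserChannelSet are real types, the rest is mine).

import java.util.List;

// A parser either recognizes the shared input and returns a populated model,
// or returns null so the next parser in line gets a shot.
interface Parser {
    RokuSearchInfo parse(String input, UserChannelSet channels) throws Exception;
}

// A refiner never short-circuits; it just improves the model in place.
interface Refiner {
    void refine(RokuSearchInfo info, UserChannelSet channels) throws Exception;
}

class PipelineSketch
{
    private List<Parser> parsers;
    private List<Refiner> refiners;

    RokuSearchInfo run(String input, UserChannelSet channels) throws Exception {

        // Phase 1: the first parser that understands the input wins.
        RokuSearchInfo info = null;
        for (Parser p : parsers) {
            info = p.parse(input, channels);
            if (info != null) break;
        }

        // Nobody claimed it? Fall back to treating the raw text as a search.
        if (info == null) {
            info = new RokuSearchInfo();
            info.Search = input.trim();
        }

        // Phase 2: every refiner runs, each improving what it can.
        for (Refiner r : refiners) {
            r.refine(info, channels);
        }
        return info;
    }
}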

This split between “parsers” and “refiners” wasn’t obvious at first. Whenever I face a messy normalization problem, I start by writing a ton of if/then spaghetti, usually in pseudocode. That may seem backwards, but it can be hard to create an elegant approach until I lay out all the variations on the table. Once that’s in front of me, it becomes much easier to identify commonalities and patterns that lead to an optimal pipeline.

Parsers

“Parsers” in our use case recognize input from specific sources and extract key elements, such as the text most likely to represent a series name. As of today there are three in production:

TheTVDB Parser (Lookup.java)

TV Time and a few other apps are powered by TheTVDB, a community-driven database of TV and movie metadata. The folks there were nice enough to grant me access to the API, which I use to recognize and decode TV Time sharing URLs (example 1 in the table). This is a four step process:

  1. Translate the short URL into their canonical URL. E.g., the short URL in example 1 resolves to https://www.tvtime.com/show/375903/episode/7693526&pid=tvtime_android.
  2. Extract the series (375903) and/or episode (7693526) identifiers from the URL (roughly sketched in the code after this list).
  3. Use the API to turn these identifiers into show metadata and translate it into a parsed result.
  4. Apply some final ad-hoc tweaks to the result before returning it.
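Here’s a rough sketch of steps 1 and 2, just to show the mechanics. The class and method names are mine; the real implementation (including the TheTVDB API call and the caching) lives in Lookup.java, and step 3 is omitted here because it needs API credentials.

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TvTimeLinkSketch
{
    // Matches canonical URLs like .../show/375903/episode/7693526
    private static final Pattern IDS = Pattern.compile("/show/(\\d+)(?:/episode/(\\d+))?");

    // Step 1: follow the short link just far enough to read the canonical location.
    static String resolve(String shortUrl) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(shortUrl).openConnection();
        conn.setInstanceFollowRedirects(false);
        conn.setRequestMethod("HEAD");
        String location = conn.getHeaderField("Location");
        conn.disconnect();
        return (location != null) ? location : shortUrl;
    }

    // Step 2: pull the series and (optionally) episode identifiers out of the URL.
    static String[] extractIds(String canonicalUrl) {
        Matcher m = IDS.matcher(canonicalUrl);
        if (!m.find()) return null;
        return new String[] { m.group(1), m.group(2) };   // [ seriesId, episodeId or null ]
    }
}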

All of this data is cached using a small SQLite database so that we don’t make too many calls directly to the API. I’m quite proud of the toolbox implementation I put together for this in CachingProxy.java, but that’s an article for another day.

UrlParser.java

UrlParser takes advantage of the fact that many apps send a URL that includes their own internal show identifiers, and often these internal identifiers are the same ones they use for “Deep Linking” with Roku. The parser is configured with entries that include a “marker” string — a unique URL fragment that identifies a particular app — together with a Roku channel identifier and some extra sugar not worth worrying about. When the marker is found and an ID extracted, this parser can return enough information to jump directly into a channel. Woo hoo!
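The configuration entries boil down to something like this. The names and structure here are illustrative only; the real table lives in UrlParser.java.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The "marker" idea in miniature. Entries are illustrative, not the app's actual configuration.
class MarkerEntry
{
    final String marker;      // unique URL fragment that identifies the source app
    final String channelId;   // Roku channel we can deep link into
    final Pattern idPattern;  // how to pull the provider's own content id out of the URL

    MarkerEntry(String marker, String channelId, String idRegex) {
        this.marker = marker;
        this.channelId = channelId;
        this.idPattern = Pattern.compile(idRegex);
    }

    // Returns a deep-linkable target if this entry matches the shared URL, else null.
    RokuSearchInfo.ChannelTarget match(String sharedUrl) {
        if (!sharedUrl.contains(marker)) return null;
        Matcher m = idPattern.matcher(sharedUrl);
        if (!m.find()) return null;

        RokuSearchInfo.ChannelTarget target = new RokuSearchInfo.ChannelTarget();
        target.ChannelId = channelId;
        target.ContentId = m.group(1);
        return target;
    }
}

For example (hypothetically, not the shipping config), the Prime Video row in the table above could be handled by an entry whose marker is “watch.amazon.com/detail?gti=” and whose regex captures the gti identifier from the URL.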

SyntaxParser.java

This last parser is kind of a last gasp that tries to clean up share text we haven’t already figured out. For example, it extracts just the search text from a Chrome share, and identifies the common suffix “SxEy” where x is a season and y is an episode number. I expect I’ll add more in here over time but it’s a reasonable start.
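The season/episode suffix, for instance, can be handled with a single regular expression. A sketch, with a made-up class name (the real rules live in SyntaxParser.java):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Peel a trailing "S5E7"-style marker off the search text. Sketch only.
class SeasonEpisodeSketch
{
    private static final Pattern SXEY =
        Pattern.compile("\\bS(\\d{1,2})\\s*E(\\d{1,3})\\s*$", Pattern.CASE_INSENSITIVE);

    static RokuSearchInfo apply(String text) {
        RokuSearchInfo info = new RokuSearchInfo();
        Matcher m = SXEY.matcher(text);
        if (m.find()) {
            info.Search = text.substring(0, m.start()).trim();
            info.Season = m.group(1);
            info.Number = m.group(2);
        } else {
            info.Search = text.trim();
        }
        return info;
    }
}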

Refiners

Once we have the basics of the input — we’ve extracted a clean search string and maybe taken a first cut at identifying the season and channels — a series of “refiners” is called in turn to improve the results. Unlike parsers, which short-circuit after a match is found, all the refiners run every time.

WikiRefiner.java

A ton of the content we watch these days is created by the streaming providers themselves. It turns out that there are folks who keep lists of all these shows on Wikipedia (e.g., this one for Netflix). The first refiner simply loads up a bunch of these lists and then looks at incoming search text for exact matches. If one is found, the channel is added to the model.

As a side note, the channel is actually added to the model only if the user has that channel installed on their Roku (as passed up in the “channels” query parameter). The same show is often available on a number of channels, and it doesn’t make sense to send a Roku to a channel it doesn’t know about. If the show is available on multiple installed channels, the Android UX will ask the user to pick the one they prefer.
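In sketch form, this refiner amounts to a set lookup plus that installed-channel check. The data structures below are simplified stand-ins: the real loading and matching code is in WikiRefiner.java, and the installed channels actually arrive via the “channels” query parameter rather than a plain Set.

import java.util.ArrayList;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for WikiRefiner: exact-match the cleaned-up search text
// against per-channel lists of "originals" scraped from Wikipedia.
class OriginalsSketch
{
    // channelId -> lower-cased show titles from that channel's Wikipedia list
    private final Map<String, Set<String>> originalsByChannel;

    OriginalsSketch(Map<String, Set<String>> originalsByChannel) {
        this.originalsByChannel = originalsByChannel;
    }

    void refine(RokuSearchInfo info, Set<String> installedChannelIds) {
        if (info.Search == null) return;
        String key = info.Search.toLowerCase();

        for (Map.Entry<String, Set<String>> entry : originalsByChannel.entrySet()) {
            // Only suggest channels the user's Roku actually has installed.
            if (!installedChannelIds.contains(entry.getKey())) continue;
            if (!entry.getValue().contains(key)) continue;

            RokuSearchInfo.ChannelTarget target = new RokuSearchInfo.ChannelTarget();
            target.ChannelId = entry.getKey();
            if (info.Channels == null) info.Channels = new ArrayList<>();
            info.Channels.add(target);
        }
    }
}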

RokuSearchRefiner.java

Figuring out this refiner was a turning point for the app. It makes the results far more accurate, which of course makes sense since they are sourced from Roku itself. I’ve left the WikiRefiner in place for now, but suspect I can retire it with really no decrease in quality. The logs will show if that’s true or not after a few weeks.

In any case, this refiner passes the search text up to the same search interface used by roku.com. It is insanely annoying that this API doesn’t return deep link identifiers for any service other than the Roku Channel, but it’s still a huge improvement. By restricting results to “perfect” matches (confidence score = 1), I’m able to almost always jump directly into a channel when appropriate.

I’m not sure Roku would love me calling this — but I do cache results to keep the noise down, so hopefully they’ll just consider it a win for their platform (which it is).

FixupRefiner.java

At the very end of the pipeline, it’s always good to have a place for last-chance cleanup. For example, TVDB knows “The Great British Bake Off,” but Roku in the US knows it as “The Great British Baking Show.” This refiner checks the search string against a set of rules that, when one matches, patch the model up by hand. These make the engineer in me feel a bit dirty, but it’s all part of the normalization game — the choice is whether to feel morally superior or return great results. Oh well, at least the rules are in their own configuration file.
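In code it’s about as simple as it sounds. A sketch, with the one entry being the example above; in the real app the rules come from that configuration file rather than being hard-coded.

import java.util.HashMap;
import java.util.Map;

// Last-chance cleanup as a simple lookup table. Sketch only.
class FixupSketch
{
    private final Map<String, String> fixups = new HashMap<>();

    FixupSketch() {
        // TVDB's name on the left, the name Roku (US) knows on the right.
        fixups.put("the great british bake off", "The Great British Baking Show");
    }

    void refine(RokuSearchInfo info) {
        String replacement = (info.Search == null) ? null : fixups.get(info.Search.toLowerCase());
        if (replacement != null) info.Search = replacement;
    }
}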

Hard Fought Data == Real Value

This project is a microcosm of most of the normalization problems I’ve experienced over the years. It’s important to try to find some consistency and modularity in the work — that’s why pipelines and plugins and models are so important. But it’s just as important to admit that the real world is a messy place, and be ready to get your hands dirty and just implement some grotty code to clean things up.

When you get that balance right, it creates enormous differentiation for your solution. Folks can likely duplicate or improve upon your algorithms — but if they don’t have the right data in the first place, they’re still out of luck. Companies with useful, normalized, proprietary data sets are just always always always more valuable. So dig in and get ‘er done.

For reference, here’s that two-phase pipeline glued together end to end:

public RokuSearchInfo parse(String input, UserChannelSet channels) {

    // 1. PARSE
    String trimmed = input.trim();
    RokuSearchInfo info = null;

    try {
        info = tvdbParser.parse(input, channels);
        if (info == null) info = urlParser.parse(input, channels);
        if (info == null) info = syntaxParser.parse(input, channels);
    }
    catch (Exception eParse) {
        log.warning(Easy.exMsg(eParse, "parsers", true));
        info = null;
    }

    if (info == null) {
        info = new RokuSearchInfo();
        info.Search = trimmed;
        log.info("Default RokuSearchInfo: " + info.toString());
    }

    // 2. REFINE
    tryRefine(info, channels, rokuRefiner, "rokuRefiner");
    tryRefine(info, channels, wikiRefiner, "wikiRefiner");
    tryRefine(info, channels, fixupRefiner, "fixupRefiner");

    // 3. RETURN
    log.info(String.format("FINAL [%s] -> %s", trimmed, info));
    return(info);
}

Health IT: More I, less T

“USCDI vs. USCDI+ vs. EHI vs. HL7 FHIR US Core vs. IPA. Definitions, similarities, and differences as you understand them. Go!” —Anonymous, Twitter

I spent about a decade working in “Health Information Technology” — an industry that builds solutions for managing the flow of healthcare information. It’s a big tent that boasts one of the largest trade shows in the world and dozens of specialized venture funds. And it’s quite diverse, including electronic health records, consumer products, billing and cost management, image management, AI and analytics of every flavor you can imagine, and more. The money is huge, and the energy is huger.

Real world progress, on the other hand, is tough to come by. I’m not talking about health care generally. The tools of actual care keep rocketing forward; the rate limiter on tests and treatments seems to be only our ability to assess efficacy and safety fast enough. But in the HIT world, it’s mostly a lot of noise. The “best” exits are mostly acquisitions by huge insurance companies willing to try anything to squeak out a bit more margin.

That’s not to say there’s zero success. Pockets of awesome actually happen quite often, they just rarely make the jump from “promising pilot” to actual daily use at scale. There are many reasons for this, but primarily it comes down to workflow and economics. In our system today, nobody is incented to keep you well or to increase true efficiency. Providers get paid when they treat you, and insurance companies don’t know you long enough to really care about your long-term health. Crappy information management in healthcare simply isn’t a technology problem. But it’s an easy and fun hammer to keep pounding the table with. So we do.

But I’m certainly not the first genius to recognize this, and the world doesn’t need another cynical naysayer, so what am I doing here? After watching another stream of HIT technobabble clog up my Twitter feed this morning, I thought it might be helpful to call out four technologies that have actually made a real difference over the last few years. Perhaps we’ll see something in there that will help others find their way to a positive outcome. Or maybe not. Let’s give it a try.

A. Patient Portals

Everyone loves to hate on patient portals. I sure did during the time I spent trying to make HealthVault go. After all, most of us interact with at least a half dozen different providers and we’re supposed to just create accounts at all of them? And figure out which one to use when? And deal with their circa 1995 interfaces? Really?

Well, yeah. That’s pretty much how business works on the web. Businesses host websites where I can view my transaction history, pay bills, and contact customer support. A few folks might use aggregation services to create a single view of their finances or whatever, but most of us just muddle through, more-or-less happily, using a gaggle of different websites that don’t much talk to each other.

There were three big problems with patient portals a few years ago:

  1. They didn’t exist. Most providers had some third-party billing site where you could pay because, money. But that was it.
  2. When they did exist, they were hard to log into. You usually had to request an “activation code” at the front desk in person, and they rarely knew what you were talking about.
  3. When they did exist and you could log in, the staff didn’t use them. So secure messaging, for example, was pretty much a black hole.

Regulation fixed #1; time fixed #2; the pandemic fixed #3. And it turns out that patient portals today are pretty handy tools for interacting with your providers. Sure, they don’t provide a universal comprehensive view of our health. And sure, the interfaces seem to belong to a long ago era. But they’re there, they work, and they have made it demonstrably easier for us to manage care.

Takeaway: Sometimes, healthcare is more like other businesses than we care to admit.

B. Epic Community Connect & Care Everywhere

Epic is a boogeyman in the industry — an EHR juggernaut. Despite a multitude of options, fully a third of hospitals use Epic, and that percentage is much larger if you look at the biggest health systems in the country. It’s kind of insane.

It can easily cost hundreds of millions of dollars to install Epic. Institutions often have Epic consultants on site full time. And nobody loves the interface. So what is going on here? Well, mostly Epic is just really good at making life bearable for CIOs and their IT departments. They take care of everything, as long as you just keep sending them checks. They are extremely paternalistic about how their software can be used, and as upside-down as that seems, healthcare loves it. Great for Epic. Less so for providers and patients, except for two things:

“Community Connect” is an Epic program that allows customers to “sublet” seats in their Epic installation to smaller providers. Since docs are basically required to have an EHR now (thanks regulation), this ends up being a no-brainer value proposition for folks that don’t have the IT savvy (or interest) to buy and deploy something themselves. Plus it helps the original customer offset their own cost a bit.

Because providers are using the same system here, data sharing becomes the default versus the exception. It’s harder not to share! And even non-affiliated Epic users can connect by enabling “Care Everywhere,” a global network run by Epic just for Epic customers. Thanks to these two things, if you’re served by the 33%+ of the industry that uses Epic, sharing images and labs and history is just happening like magic. Today.

Takeaway: Data sharing works great in a monopoly.

C. Open Notes

OpenNotes is one of those things that gives you a bit of optimism at a time when optimism can be tough to come by. Way back in 2010, three institutions (Beth Israel in MA, Geisinger in PA, and Harborview in WA) started a long-running experiment that gave patients completely unfettered access to their medical records. All the doctor’s notes, verbatim, with virtually no exception. This was considered incredibly radical at the time: patients wouldn’t understand the notes; they’d get scared and create more work for the providers; providers fearing lawsuits would self-censor important information; you name it.

But at the end of the study, none of that bad stuff happened. Instead, patients felt more informed and greatly preferred the primary data over generic “patient education” and dumbed-down summaries. Providers reported no extra work or legal challenges. It took a long time, but this wisdom finally made it into federal regulation last year. Patients now must be granted full access to their providers’ notes in electronic form at no charge.

In the last twelve months my wife had a significant knee surgery and my mom had a major operation on her lungs. In both cases, the provider’s notes were extraordinarily useful as we worked through recovery and assessed future risk. We are so much better educated than we would otherwise have been. An order of magnitude better than ever before.

Takeaway: Information already being generated by providers can power better care.

D. Telemedicine

It’s hard to admit anything good could have come out of a global pandemic, but I’m going to try. The adoption of telemedicine as part of standard care has been simply transformational. Urgent care options like Teladoc and Doctor on Demand (I’ve used both) make simple care for infections and viruses easy and non-disruptive. For years insurance providers refused “equal pay” for this type of encounter; it seems that they’ve finally decided that it can help their own bottom line.

Just as impactful, most “regular” docs and specialists have continued to provide virtual visits as an option alongside traditional face-to-face sessions. Consistent communication between patients and providers can make all the difference, especially in chronic care management. I’ve had more and better access to my GI specialists in the last few years than ever before.

It’s only quite recently that audio and video quality have gotten good enough to make telemedicine feel like “real” medicine. Thanks for making us push the envelope, COVID.

Takeaway: Better care and efficiency don’t have to be mutually exclusive.

So there we go. There are ways to make things better with technology, but you have to work within the context of reality, and they ain’t always that sexy. We don’t need more JSON or more standards or more jargon; we need more information and thoughtful integration. Just keep swimming!