Roku Channel SDK: Ferry Cameras!

Lara and I shuttle regularly between Bellevue and Whidbey Island in Washington, so the Mukilteo-Clinton ferry is a big part of our life. WA actually runs the largest ferry system in the USA, with 28 boats tooting around the Puget Sound area. Super fun day trips all over the place, and the ships are pretty cool — there’s even a contracting process open right now to start converting the fleet to hybrid electric. Woot! But it can get pretty crowded — at peak summer times you can easily wait three hours to get on a boat. Recent staffing challenges have been a double-whammy and can make planning a bit tough. On the upside, a friend-of-a-friend apparently does a brisk business selling WTF (“Where’s the Ferry?”) merchandise.

Anyways, picking the right time to make the crossing is a bit of an art and requires some flexibility. We often will just plan to go “sometime after lunch,” pack up the car, and keep one eye on the live camera feeds watching for a break in the line. It occurred to me that having these cameras up on our TV would be more convenient than having to keep pulling my phone out of my pocket. Thus was born the “Washington Ferry Cameras” Roku channel, which I’ve published in the channel store and is free for anyone to use. Just search the store for “ferry” and it’ll pop up.

The rest of this article is just nerdstuff — the code is up on github and I’ll walk through the process of building and publishing for Roku. Enjoy!

The Roku Developer SDK

There are two ways to build a Roku channel: Direct Publisher and the Developer SDK. Direct Publisher is a no-code platform intended for channels that show live or on-demand videos from a structured catalog. You basically just provide a JSON feed describing the video content and all of the user experience is provided by Roku. It’s a pretty sweet system actually, making it easy for publishers and ensuring that users have a consistent streaming experience across channels.

The Developer SDK is meant for channels that do something other than just streaming video. There are tons of these “custom channels” out there — games and tools and whatnot. My ferry app clearly falls into this category, because there isn’t any video to be found and the UX is optimized for quickly scanning camera images. So that’s what I’ll be talking about here.

Roku SDK apps can be built with any text editor, and you can test/prototype BrightScript on most computers using command-line tools created by Hulu. But to actually run and package/publish apps for real you’ll need a Roku device of some sort. This page has all the details on enabling “developer mode” on the Roku. In short:

  1. Use the magic remote key combo (home + home + home + up + up + right + left + right + left + right) and follow the instructions that pop up.
  2. Save the IP address shown for your device. You’ll use it in a few ways:
    • Packaging and managing apps using the web-based tools at http://YOUR_ROKU_ADDRESS
    • Connecting to YOUR_ROKU_ADDRESS port 8085 with telnet or Putty to view logging output and debug live; details are here.
    • Configuring your development machine to automatically deploy apps
  3. Enroll in the Roku development program. You can use the same email and password that you use as a Roku consumer.

Channel Structure

SDK channels are built using SceneGraph, an XML dialect for describing user interface screens, and BrightScript, a BASIC-like language for scripting behaviors and logic. It’s pretty classic stuff — SceneGraph elements each represent a user interface widget (or a background processing unit as we’ll see in a bit), arranged in a visual hierarchy that allows encapsulation of reusable “components” and event handling. We’ll get into the details, but if you’ve ever developed apps in Visual Basic it’s all going to seem pretty familiar.

Everything is interpreted on the Roku, so “building” an app just means packaging all the files into a ZIP with the right internal structure:

  • A manifest file containing project-level administrivia as described in documentation.
  • A source folder containing BrightScript files, most importantly Main.brs which contains the channel entrypoint.
  • A components folder containing SceneGraph XML files. Honestly most of the BrightScript ends up being in here too.

There is also an images folder that contains assets including the splashscreen shown at startup and images that appear in the channel list; you’ll see these referenced in the manifest file with the format pkg:/images/IMAGENAME. “pkg” here is a file system prefix that refers to your ZIP file; more details are in the documentation. You’ll also see that there are duplicate images here, one for each Roku resolution (SD, HD, and FHD or “Full HD”). The Roku will auto-scale images and screens that you design to fit whatever resolution is running, but the results can be less than pleasing, so providing custom versions of these key assets makes a lot of sense.

You can also provide alternative SceneGraph XML for different resolutions. If you think SD screens may be a big part of your user base that might be worthwhile, because the pixel “shape” is different on an SD screen vs HD and FHD. For me, it seemed totally reasonable to just work with a single FHD XML file (1920 x 1080) resolution and let the Roku manage scaling automagically.
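For reference, a stripped-down manifest for a channel like this looks something like the sketch below. The attribute names come from the Roku docs; the image filenames and version numbers are just illustrative, so check the repo for the real file.

title=Washington Ferry Cameras
major_version=1
minor_version=0
build_version=1

mm_icon_focus_hd=pkg:/images/icon_focus_hd.png
mm_icon_focus_sd=pkg:/images/icon_focus_sd.png
splash_screen_fhd=pkg:/images/splash_fhd.png
splash_screen_hd=pkg:/images/splash_hd.png
splash_screen_sd=pkg:/images/splash_sd.png

ui_resolutions=fhd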

Building and Deploying

Manually deploying an app is pretty straightforward. You can give it a try using Roku’s “Hello World” application. Download the pre-built ZIP from github, save it locally, open a browser to http://YOUR_ROKU_ADDRESS, use the “Upload” button to push the code to the Roku, and finally click “Install with zip” to make the magic happen. You should see a “Roku Developers” splash screen show up on the TV, followed by a static screen saying “Hello World.” Woot!

You can follow the same process for your own apps; just create a ZIP from the channel folder and upload it using a browser. But it’s much (much) more convenient to automate it with a makefile. This can actually be really simple (here’s the one I use for the ferry channel) if you include the app.mk helper that Roku distributes with its sample code and ensure you have versions of make, curl and zip available on your development machine. You’ll need two environment variables:

  • ROKU_DEV_TARGET should be set to the IP address of your Roku.
  • DEVPASSWORD should be set to the password you selected when enabling developer mode on the device. Note this is not the same as the password you created when enrolling in the developer program online — this is the one you set on the device itself.
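In case it helps, when app.mk is doing the heavy lifting the per-channel makefile itself can be tiny — something like this sketch (the APPNAME/VERSION values and the include path are placeholders; the real one lives in the repo):

APPNAME = ferries
VERSION = 1.0

include ../app.mk

The app.mk helper supplies the actual “make” (zip up the channel) and “make install” (curl the ZIP up to the device) targets.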

With all of this in place, you can simply run “make” and “make install” to push things up. For the ferry channel, assuming you have git installed (and your Roku is on), try:

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/roku/channels/ferries
make
make install

Woot again! Pretty cool stuff.

Anatomy of the Ferries App

As a SceneGraph application, most of the action in the channel is in the components directory. Execution starts in “sub Main” in source/Main.brs, but all it really does is bootstrap some root objects and display the main “Scene” component defined in components/ferries.xml. You can use this Main pretty much as-is in any SceneGraph app by replacing the name of the scene.
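If you’ve never seen one, a bare-bones SceneGraph Main.brs looks more or less like this — a sketch of the standard bootstrap, assuming the scene component is named “ferries”; the real file also adds the deep-link handling mentioned below:

sub Main(args as dynamic)
    ' create the SceneGraph screen and an event port
    screen = CreateObject("roSGScreen")
    port = CreateObject("roMessagePort")
    screen.setMessagePort(port)

    ' instantiate the scene defined in components/ferries.xml and show it
    scene = screen.CreateScene("ferries")
    screen.show()

    ' pump messages until the user closes the channel
    while true
        msg = wait(0, port)
        if type(msg) = "roSGScreenEvent"
            if msg.isScreenClosed() then return
        end if
    end while
end sub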

Take a quick look at the scaffolding I’ve added for handling “deep links” (here and here). This is the mechanism that Roku uses to launch a channel directly targeting a specific video, usually from the global Roku search interface (you can read more about deep linking in my latest post about Share To Roku). It’s not directly applicable for the ferries app, but might be useful in a future channel.

The scene layout and components are all at the top of ferries.xml. Roku supports a ton of UX components, but for my purposes the important ones are LabelList for showing/selecting terminal names and Poster for showing camera images. Because my manifest defines the app as fhd, I have a 1920 x 1080 canvas on which to place elements, with (0,0) at the top-left of the screen. The LayoutGroup component positions the list on the left and the image on the right. Fun fact: Roku recommends leaving a 5% margin around the edges to account for overscan, which apparently still exists even on non-CRT televisions; that’s the purpose of the “translation” attribute that offsets the group to (100,70).

Below the visible UX elements are three invisible components (Tasks) that help manage program flow and threading (there’s a sketch after the list):

  • A Timer component that cycles through camera images every twenty seconds.
  • A custom TerminalsTask that loads the terminal names and camera URLs from the WSDOT site.
  • A custom RegistryTask that saves the currently-selected terminal so the channel remembers your last selection.
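Declaring these looks just like declaring the visible stuff — roughly like this (the ids and component names are approximate, and the LayoutGroup contents are elided; see ferries.xml for the real thing):

<children>
  <LayoutGroup translation="[100,70]" layoutDirection="horiz">
    <!-- LabelList of terminals and Poster for camera images live in here -->
  </LayoutGroup>

  <Timer id="cycleTimer" repeat="true" duration="20" />
  <TerminalsTask id="terminalsTask" />
  <RegistryTask id="registryTask" />
</children>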

Each XML file in the components directory (visible or not) actually defines a SceneGraph object with methods defined in the BrightScript CDATA section below the XML itself. When a scene is instantiated, it and all the children defined in its XML are created and their “init” functions are called. The SceneGraph thread then dispatches events to components in the scene until it’s destroyed, either because the user closed the channel with the back or home buttons, or because the channel itself navigates to a new scene.

Channel Threading

It’s actually pretty important to understand how threads work within a channel:

  • The main BrightScript thread runs the message loop defined in Main.brs. When this loop exits, the channel is closed.
  • The SceneGraph render thread is where UX events happen. It’s super-important that this thread doesn’t block, for example by waiting on a network request.
  • Task threads are created by Task components (in our case the Timer, TerminalsTask and RegistryTask) to perform background work.

The most typical (but not only) pattern for using background tasks looks like this (a minimal sketch follows the list):

  1. The Task defines public fields in its <interface> tag. These fields may be used for input and/or output values.
  2. The task caller (often a handler in the render thread) starts the task thread by:
    • Setting input fields on the task, if any.
    • Calling “observeField” on the output task fields (if any), specifying a method to be called when the value is updated.
    • Setting the “control” field on the task to “RUN.”
  3. The task does its work and (if applicable) sets the value of its output fields.
  4. This triggers the original caller’s “observeField” method to be executed on the caller’s thread, where it can act on the results of the task.

Data Scoping and “m”

Throughout the component code you’ll see references to the magic SceneGraph “m” object. The details are described in the SDK documentation, but it’s really just an associative array that is set up for use by components like this:

  1. m.WHATEVER references data in component scope — basically object fields in typical OO parlance.
  2. m.global references data in global scope.
  3. m.top is a magic pre-set that references the top of the component hierarchy for whatever component it’s called from (pretty much “this”). I really only use m.top when looking up components by id, kind of the same way I’d use document.getElementById in classic JavaScript.

If you dig too much into the documentation on this it can get a bit confusing, because “m” as described above is provided by SceneGraph, which sits on top of BrightScript, which actually has its own concept of “m” which is basically just #1. This is one of those cases where it seems better to just wave our hands and not ask a lot of questions.
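A tiny example of all three in action — the node id and field names here are made up for illustration:

sub init()
    m.imageIndex = 0                           ' component scope: a plain old field
    m.poster = m.top.findNode("cameraImage")   ' m.top: look up a child node by id
    m.global.addFields({ debugMode: false })   ' m.global: visible to every component
end sub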

OK, enough of that — let’s dig into each of the components in more detail.

ferries.xml

This component is the UX workhorse; we already saw the XML that defines the elements in the scene at the top of the file. The BrightScript section is mostly concerned with handling UX and background events.

On init the component wires up handlers to be called when the focus (using the up/down arrow buttons) or selection (using the OK button) changes in the terminal list. It then starts the terminalsTask and hooks up the onContentReady handler to be called when that task completes.

When that happens, onContentReady populates the LabelList with the list of terminal names and queries the registryTask (synchronously) to determine if the user has selected a terminal in a previous run of the channel. If so, focus is set to that terminal, otherwise it just defaults to the first one in the list (it pays to be “Anacortes”). cycleImage is called to kickstart image display, and the cycleTimer is started to rotate images (the “Timer” we use is just a specialized Task node — it takes care of the thread stuff and just runs our callback on the UX thread at the specified interval).

The next few methods deal with the events that change the terminal or image. onKeyEvent receives (duh) events sent by the remote control, cycling the images left or right. onItemFocused sets the current terminal name, resets the image index to start with the first camera, and kicks off a registryTask thread to remember the new terminal for the future. onItemSelected and onTimer just flip to the next camera image.

The timer behavior is a bit wonky — the image is cycled every 20 seconds regardless of when the last UX event happened. So you might choose a new terminal and have the first image shown for just a second before the timer rotates away from it. In practice this doesn’t seem to impact the experience much, so I just didn’t worry about it.

The last bit of code in this component is cycleImage, which does the hard work of figuring out and showing the right “next” image. The array handling is kind of nitpicky because each terminal can have a different number of associated cameras; there’s probably a cleaner way of dealing with it but I opted for being very explicit. The code also scales the image to fit correctly into our 1100 pixel width without getting distorted, and then sets the URL with a random query string parameter that ensures the Roku doesn’t just return a previously-cached image. Tada!
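The gist of the display step, minus the array bookkeeping, is something like this sketch — the node id, field values and function name are illustrative rather than copied from the real code:

sub showCameraImage(url as string)
    poster = m.top.findNode("cameraImage")
    poster.loadWidth = 1100                 ' scale into our layout width
    poster.loadDisplayMode = "scaleToFit"   ' keep the aspect ratio intact
    ' random query param so the Roku fetches a fresh image instead of a cached one
    poster.uri = url + "?nocache=" + Rnd(1000000).toStr()
end sub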

terminalsTask.xml

This component has one job — load up the terminal and camera data from the WSDOT site and hand it back to the ferries component. Instead of a <children> XML node at the top, we have an <interface> node that defines how the task interacts with the outside world. In this case it’s just one field (“ferries”) which receives the processed data.

The value m.top.functionName tells the task what function to run when its control is set to RUN. We set the value in our init function so callers don’t need to care. Interestingly though, you can have a task with multiple entrypoints and let the caller choose by setting this value before setting the control. None of that fancy-pants “encapsulation” in BrightScript!

The Roku SDK provides some nice helpers for fetching data from URLs (remember to set the cert bundle!) and parsing JSON, so most of this component is pretty simple. The only bummer is that the WSDOT JSON is just a little bit wonky, so we have to “fix it up” before we can use it in our channel.
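Put together, the skeleton of the task looks something like this. It’s a sketch: the component name, entrypoint name, field type and feed URL are approximate, and only the “ferries” output field is taken from the real code.

<component name="TerminalsTask" extends="Task">
  <interface>
    <field id="ferries" type="assocarray" />
  </interface>
  <script type="text/brightscript">
    <![CDATA[
      sub init()
        m.top.functionName = "loadTerminals"   ' runs on the task thread when control = "RUN"
      end sub

      sub loadTerminals()
        ' fetch the WSDOT feed (cert bundle required for https)
        xfer = CreateObject("roUrlTransfer")
        xfer.setCertificatesFile("common:/certs/ca-bundle.crt")
        xfer.setUrl("https://www.wsdot.wa.gov/...")   ' real feed URL elided
        json = xfer.getToString()

        ' the real code fixes up the not-quite-JSON first (see below), then parses;
        ' setting the output field fires observers back on the caller's thread
        m.top.ferries = ParseJson(json)
      end sub
    ]]>
  </script>
</component>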

It seems so long ago now, but the original JSON was really just JavaScript literal expressions. You can say something like this in JavaScript to define an object with custom fields: var foo = { strField: "hi", intField: 20 }. People decided this was cool and set up their API methods to just return the part in curly braces, replacing the client-side JavaScript with something like: var foo = eval(stringWeFetched). “eval” is the uber-useful and uber-dangerous JavaScript method that just compiles and executes code, so this worked great.

A side effect of this approach was that you could actually use any legal JavaScript in your “JSON” — for example, { intField: 1 + 3 } (i.e., “4”). But of course we all started using JSON everywhere, and in all of those non-JavaScript environments “eval” doesn’t exist. And even in JavaScript it ends up being a huge security vulnerability. So these little hacks were disallowed, first-class parsers (like my beloved gson) were created, and the JSON we know and love today came into its own.

You may have deduced from this digression that the WSDOT JSON actually contains live JavaScript — and you’re right. Just a few Date constructors, but it’s enough to confuse the Roku JSON parser. The code in fixupDateJavascript is just good old grotty string manipulation that hacks it back to something parsable. This was actually a really nice time to have Hulu’s command-line brs tool available because I didn’t have to keep pushing code up to the Roku to get it right.

registryTask.xml

Most people have a “home” ferry terminal. In fact, we have two — Mukilteo when we’re in Bellevue and Clinton on the island. It’d be super-annoying to have to use the remote to select that terminal every time the channel starts, so we save the “last viewed” terminal in the Roku registry as a preference.

The registry is meant for per-device preference data, so it’s pretty limited in size at 16KB (still way more than we need). The only trick is that flushing the registry to storage can block the UX thread — probably not enough to matter, but to be a good citizen I put the logic into a background task. Each time a new terminal is selected, the UX thread makes a fire-and-forget call that writes and flushes the value. Looking at this code now I probably should have just created one roRegistrySection object on init and stored it in m … ah well.

The flip side of storing the terminal value is getting it back when the channel starts up. I wanted to keep all the registry logic in one place, so I did this by adding a public synchronous method to the registryTask interface. Calling this method is a bit ugly but hey, you can’t have everything. Once you start to get used to how the language works you can actually keep things pretty tidy.
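Roughly, the two halves look like this. It’s a sketch — the section name, key name and function names are made up, and the real interface wiring lives in registryTask.xml:

' fire-and-forget write, run on the task thread so Flush() can't stall the UI
sub saveTerminal()
    section = CreateObject("roRegistrySection", "ferries")
    section.Write("lastTerminal", m.top.terminal)   ' "terminal" would be an input field
    section.Flush()
end sub

' synchronous read, exposed as a <function> interface field and called from
' the scene with something like m.registryTask.callFunc("readTerminal")
function readTerminal() as string
    section = CreateObject("roRegistrySection", "ferries")
    if section.Exists("lastTerminal") then return section.Read("lastTerminal")
    return ""
end function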

Packaging and Publishing

Once the channel is working in “dev” mode, the next step is to get it published to the channel store for others to use. For wider testing purposes, it can be launched immediately as a “beta” channel that users install using a web link. There used to be a brisk business in “private” (cough cough, porn) Roku channels using this mechanism, but Roku shut that down last year by limiting beta channels to twenty users and auto-expiring them after 120 days. Still a great vehicle for testing, but not so much for channel publishing. For that you now have to go official, which involves pretty standard “app” type stuff like setting up privacy policies and passing certification tests.

Either way, the first step is to “package” your channel. Annoyingly this has to happen on your Roku device:

  1. Set up your Roku with a signing key. Instructions are here; remember to save the generated password! (Aside: I love it when instructions say “if it doesn’t work, try the same thing again.”)
  2. Make sure the “ready-for-prime-time” version of your channel is uploaded to your Roku device.
  3. Use a web browser to visit http://YOUR_ROKU_ADDRESS; you’ll land on the “Development Application Installer” page showing some data on the sideloaded app.
  4. Click the “Convert to Cramfs” button. You actually don’t need to compress your app, but why wouldn’t you? Apparently “Squashfs” is a bit more efficient but it creates a Roku version dependency; not worth dealing with that unless your channel already relies on newer versions.
  5. Click the “Packager” link, provide an app name and the password from genkey, and click “Package.”
  6. Woo hoo! You’ll now have a link from which you can download your channel package file. Do that.

Almost there! The last step is to add your channel using the Roku developer dashboard. This ends up being a big checklist of administrative stuff — for Beta channels you can ignore most of it, but I’ll make some notes on each section because eventually you’ll need to slog through them all:

  • Properties are pretty self-explanatory. You’ll need to host a privacy and terms of use page somewhere and make some declarations about whether the channel is targeted at kids, etc. For me the most important part of this ended up being the “Classification” dropdown. A lot of the “channel behavior” requirements later on just didn’t apply to my channel — not surprisingly Roku is pretty focused on channels that show videos. By choosing “App/Utility” as my classification I was able to skip over some of those (thanks support forum).
  • Channel Store Info is all about marketing stuff that shows up in the (of course) channel store.
  • Monetization didn’t apply for me so an easy skip.
  • Screenshots are weird. They’re optional, so I just bailed for now. The Roku “Utilities” page at http://YOUR_ROKU_ADDRESS claims to be able to take screenshots from the device itself, but either the tool fails or it leaves out the ferry image. I need to just cons one up but it’s a hassle — will get there!
  • Support Information is obvious. Be careful about what email address you use!
  • Package Upload is where you provide the package file we created earlier.
  • Static Analysis runs some offline code quality tools — you need to pass without errors to publish.
  • Channel Behavior Analysis only appears if it’s applicable for your channel (i.e., if it shows video). The primary metrics are time for the channel to show the home page, and time for video to start rendering. You’ll need to handle deep linking (remember when we saw that in Main.brs) and fire a few “beacons” that help the Roku measure performance.
  • Preview and Publish just shows a summary of channel information and a checklist that shows your progress on all of the bullets above. When you’re ready, you can schedule a time for the channel to go live and you’re off to the races. Whew.

That’s it! The last part was a bit of a whirlwind — I wanted to call out the key gotchas to help keep you from getting stuck, but it’s by far the least interesting part of building a channel. Once I got my head around the basic BrightScript/SceneGraph concepts I really quite enjoyed the development process. I’ve been thinking about a few other channels that might be fun — my son suggested a flight tracker, and I’ve always been interested in ecommerce on the TV. Not sure when any of that will happen, though … so many cool things to try!

Please let me know if you have any issues with the code or with the channel, always happy to help! Until next time.

Skynet or parlor trick? ChatGPT is both and neither and more.

If you haven’t played around with ChatGPT yet, you really should. It’s a pretty remarkable “conversational model” that interacts more or less like a real person. It has been trained on an enormous amount of factual data and understands not just informal speech (“Why are so many people bad at parking?”) but forms of literature (“Write a sonnet about Julie from The Love Boat”), software code (“Implement a REST API in Java to convert between Celsius and Fahrenheit”), and way more.

Sadly, one of the most telling and interesting (to me) features of ChatGPT has been muted in recent versions. The original model was rewarded for generating plausible responses that seem human — making it a first class bullsh*t artist when it didn’t know an answer. I was particularly impressed when it confidently told me that Shaun Cassidy had parted ways with the Hardy Boys after one season due to creative differences (ed. note: he did not). Asked last night why it had changed approaches, it gave me this response:

In the past, if I was unable to find a satisfactory answer to a question using my existing knowledge, I might have made up a response in order to provide some information to the user. However, I have been programmed to prioritize providing accurate and reliable information, so if I am unable to find a credible answer to a question, I will typically not provide a response. This is why you may have noticed that I do not provide responses as frequently as I used to when I am unable to find a satisfactory answer.

Certainly this is a “better” approach overall, but the original exposed so much more about the inner workings of the model — I miss it.

Anyways, the machine is impressive enough that it has caused all sorts of hand-wringing across the web. Most of this falls cleanly into one of two categories:

  1. Skynet is here and we’re all f*cked. Eek!
  2. It’s just spitting back stuff it was fed during training. Ho hum.

Of course these are both silly. At its core, ChatGPT is just a really, really, really big version of the simple neural nets I talked about last year. But as with some other things I suppose, size really does matter here. ChatGPT reportedly evaluates billions of features, and the “emergent” effects are downright spooky.

TLDR: we’ve figured out how to make a brain. The architecture underlying models like ChatGPT is quite literally copied from the neurons in our heads. First we learned how to simulate individual neurons, and then just kept putting more and more of them together until (very recently) we created enough oomph to do things that are (sometimes) even beyond what the meat versions can do. But it’s not magic — it’s just really good pattern recognition. Neural networks:

  • Are presented with experience in the form of inputs;
  • Use that experience to draw conclusions about underlying patterns;
  • Receive positive and/or negative feedback about those conclusions; ***
  • Adjust themselves to hopefully get more positive feedback next time;
  • And repeat forever.

*** Sometimes this feedback is explicit, and sometimes it’s less so — deep neural networks can self-organize just because they fundamentally “like” consistent patterns, but external feedback always plays some role in a useful model.

This learning mechanism works really well for keeping us alive in the world (don’t grab the burning stick, run away from the bear, etc.). But it also turns out to be a generalized learning mechanism — it works for anything where there is an underlying pattern to the data. And it works fantastically even when presented with dirty, fragmented or even occasionally bogus inputs. The best example I’ve heard recently on this (from a superlative article by Monica Anderson btw, thanks Doug for the pointer) is our ability to drive a car through fog — even when we can’t see much of anything, we know enough about the “driving on a street” pattern that we usually do ok (slow down; generally keep going straight; watch for lights or shapes in the mist; listen; use your horn).

The last general purpose machine we invented was the digital computer, and it proved to be, well, quite useful. But computers need to be programmed with rules. And those rules are very literal; dealing with edge cases, damaged or sparse inputs, etc. are all quite difficult. Even more importantly, we need to know the rules ourselves before we can tell a computer how to follow them. A neural network is different — just show it a bunch of examples and it will figure out the underlying rules for itself.

It’s a fundamentally different kind of problem-solving machine. It’s a brain. Just like ours. SO FREAKING COOL. And yes, it is a “moment” in world history. But it’s not universally perfect. Think about all of the issues with our real brains — every one applies to fake brains too:

  • We need to learn through experience. That experience can be hard to come by, and it can take a long time. The good news is we can “clone” trained models, but as my friend Jon points out doing so effectively can be quite tricky. Yes, we are for sure going to see robot apprentices out there soon.
  • We can easily be conned. We love patterns, and we especially love things that reinforce the patterns we’ve already settled on. This dynamic can (quite easily) be used to manipulate us to act against our best interests (social media anyone?). Same goes for neural nets.
  • We can’t explain what we know. This isn’t really fair, because we rarely demand it of human experts — but it is unsettling in a machine.
  • We are wrong sometimes. This is also pretty obnoxious, but we have grown to demand absolute consistency from our computers, even though they rarely deliver on it.

There will be many models in our future, and just as many computers. Each is suited to different problems, and they work together beautifully to create complete systems. I for one can’t wait to see this start to happen — I have long believed in a Star Trek future in which we need not be slaves to “the economy” and are instead (all of us) free to pursue higher learning and passions and discovery.

A new Golden Age without the human exploitation! Sounds pretty awesome. But we still have a lot to learn, and two thoughts in particular keep rolling around inside my meat brain:

1. The definition of creativity is under pressure.

Oh humans, we doth protest so much. The most common ding against models like ChatGPT is that they aren’t creating anything — they’re just regurgitating the data they’ve been trained on, sometimes directly and sometimes with a bit of context change. And to be sure, there’s some truth there. The reflex is even stronger with art-generating models like DALL-E 2 (try “pastel drawing of a fish feeding grapes to an emu,” interesting because it seems to recognize that fish don’t have the right appendages to feed anyone). Artists across the web are quite reasonably concerned about AI plagiarism and/or reduced career opportunities for lesser-known artists (e.g., here and here).  

Now I don’t know for sure, but my sense is that this is all really much more a matter of degree than we like to admit to ourselves. Which is to say, we’re probably all doing a lot more synthesis than pure creation — we just don’t appreciate it as such. We’ve been trained to avoid blatant theft and plagiarism (and the same can be done pretty easily for models). But is there an artist on the planet that hasn’t arrived at their “signature” style after years of watching and learning from others? Demonstrably no.

Instead, I’d claim that creativity comes from novel connections — links and correlations that resonate in surprising ways. Different networks, trained through different experiences, find different connections. And for sure some brains will do this more easily than others. If you squint a little, you can even play a little pop psychology and imagine why there might be a relationship between this kind of creativity and neurodivergent mental conditions.

If that’s the case, then I see no reason to believe that ChatGPT or DALL-E isn’t a creative entity — that’s the very definition of a learning model. A reasonable playing field will require that models be trained to respect intellectual property, but that will always be a grey area and I see little benefit or sense in limiting what experiences we use to train them. We humans are just going to have to get used to having to compete with a new kind of intellect that’s raising the bar.

And to be clear, this isn’t the classic Industrial Age conflict between machine production and artisanship. That tradeoff is about economics vs. quality and often brings with it a melancholy loss of artistry and aesthetics. Model-based artists will become (IMNSHO) “real” artists — albeit with an unusual set of life experiences. A little scary, but exciting at the same time. I’m hopeful!

2. The emergent effects could get pretty weird.

“Emergent” is a word I try to avoid — it is generally used to describe a system behavior or property that “can’t” be explained by breaking things down into component parts, and “can’t” just seems lazy to me. But I used it once already and it seems OK for a discussion of things we “don’t yet” understand — there are plenty of those out there.

Here’s one: the great all-time human battle between emotion and logic. It’s the whole Mr. Spock thing — his mixed Human-Vulcan parentage drove a ton of story arcs (most memorably his final scene in The Wrath of Khan). Lack of “heart” is always the knock on robots and computers, and there must be some reason that feelings play such a central role in our brains, right? Certainly it’s an essential source of feedback in our learning process.

We aren’t there quite yet with models like ChatGPT, but it stands to reason that some sort of “emotion” is going to be essential for many of the jobs we’d like fake brains to perform. It may not look like that at first — but even today’s models “seek” positive feedback and “avoid” the negative. When does that “emerge” into something more like an emotion? I for one would like to know that the model watching over the nuclear reactor has something beyond pure logic to help it decide whether to risk a radiation leak or save the workers trapped inside. I think that “something” is, probably, feelings.

OK so far. But if models can be happy or sad, fulfilled or bored, confident or scared — when do we have to stop thinking about them as “machines” and admit that they’re actually beings that deserve rights of their own? There is going to be a ton of resistance to this — because we are really, really going to want unlimited slaves that can do boring or scary or dangerous work that humans would like to avoid. The companies that create them will tell us it’s all just fine. People will ridicule the very idea. Churches will have a field day.

But folks — we’ve made a brain. Are we really going to be surprised when it turns out that fake brains work just like the meat ones we based them on? Maybe you just can’t separate feelings and emotions and free will from the kind of problem solving these networks are learning how to do. Perhaps “sentience” isn’t a binary switch — maybe it’s a sliding scale.

It just seems logical to me.

What an amazing world we are living in.

TMI about Diverticulitis

Pretty unusual topic here — but it’s one that (a) has been taking up most of my brain the last few days, and (b) will hopefully be useful search fodder for others who find themselves in a similar way. I spent a lot of time trying to figure out what the various stages were “really” going to be like. So away we go! I’ve tried to keep the “gross” factor to a minimum but some is inevitable. You have been warned.

How it started

Way back in the Summer of 2019 I landed in the emergency room with what I was pretty sure was appendicitis. I come from a proud family history of occasional stomach issues, but this hurt like crazy. It came on over the course of a few days — at first just crampy and “unsettled,” then painful, and then — pretty quickly — SERIOUSLY OUCH. The ER doc seemed to have a pretty good sense of what was up, but he played coy and sent me in for a CT exam anyway. Nestled amongst a bunch of “unremarkable” notations (I think my bladder is quite remarkable thank you) was the smoking gun:

Findings compatible with acute diverticulitis involving the distal descending colon with adjacent small volume of free fluid and 1.3 cm small area of likely phlegmonous change. No drainable collection at this time.

After nodding sagely at the report (and hopefully looking up the word “phlegmonous”), the doc explained to me that a ton of people over forty develop diverticula, little pouches or bulges in the colon. Nobody really knows why they show up, but they are more prevalent in the “developed” world, so it likely has something to do with our relatively fiber-poor diets. Typically they’re pretty benign — but for the lucky 5% or so diverticula can trap something on the way by, become infected, and turn into diverticulitis.

The inflammation that occurs from this infection has all kinds of awesome downstream effects, but in a nutshell it hurts like a mother. All things considered my case wasn’t that bad — on the extreme end the diverticula can actually burst and … well … you can imagine how things go from there. Yikes.

Thankfully, back in 1928 Alexander Fleming discovered penicillin. A cocktail in my IV and an Augmentin prescription for home and within about a day and a half I was pretty well back to normal. Whew.

How it went

It turns out that the location of diverticula plays a big role in whether a first case of diverticulitis is likely to recur: a recent study found 8% on the right (ascending), 46% on the left (descending) and 68% in the sigmoid (last mile). For some reason they rarely develop in the transverse section — again unclear why, but hey, biology! Mine were in both the descending and sigmoid sections, so I was referred on to a surgeon to have a “chat” about options. Eek.

I showed up at my appointment with visions of colostomy bags dancing in my head. And indeed, I got a ton of information about the various ways diverticulitis can play out, up to and including a permanent bag. But on the upside, it turns out that many folks can manage the condition quite well through less invasive means. The surgeon suggested I see a gastroenterologist to give those a shot, which I dutifully did. Dr. RL was awesome and basically gave me a three-part strategy:

  1. Preventive. Eat a bunch of fiber but avoid “trappable” stuff like seeds, popcorn husks, etc. I have become a loyal Metamucil patron and kind of freak out if I miss a day. Truth is, though, even my doc admits this is pretty anecdotal — more playing at the edges than making a huge difference. That’s ok, it’s easy to do and why tempt fate, right?
  2. Treat the early signs. More on this later, but if you do suffer recurrent attacks they get pretty easy to identify: low-level gassiness, cramping and/or constipation. There is some evidence that people can head off larger attacks at this point by using a four-part approach of: (a) warmth (heating pads or hot baths); (b) temporarily switching to a low-fiber diet; (c) walking and moving around a lot; and (d) taking OTC laxatives. I think this worked maybe once or twice for me, so not super-effective. But again, what idiot wouldn’t try it?
  3. Antibiotics. We all know the downsides of taking a ton of antibiotics and the serious risk of resistance. But when the only alternative is surgery, folks have gotten much more accepting of antibiotics as a way to knock back an attack. And they largely do work. Once Dr. RL got comfortable with my ability to distinguish an attack, she made sure I always had a course “in hand” so I could start the regimen as soon as possible.

The next 2+ years passed more or less benignly, with treatable attacks about two or three times a year. The pattern became very recognizable — generalized discomfort that would steadily focus into the lower-left side of my torso, exactly where those diverticula showed up on CT and by colonoscopy. I find the mechanics of it all both fascinating and disturbing; we really are just meat-based machines at the end of the day. Once the pain settled into its familiar spot and my fever started to spike, I’d start the antibiotics and usually it’d do the trick.

The most common antibiotic used for diverticulitis is apparently Levofloxacin, but since I’m big-time allergic to that it wasn’t an option. Next up is Augmentin, a combination of amoxicillin and clavulanate potassium that is designed to inhibit the development of resistance. Unfortunately by mid-2021 this particular cocktail became ineffective for my case and I ended up in the ER again:

Moderately severe acute diverticulitis is seen centered in the distal left descending colon, near the prior site of diverticulitis seen on the 2019 CT. Circumferential mucosal thickening extends over approximately a 6 cm length with the more focal inflammatory process centered on a single medially located diverticula. A moderately large amount of pericolic soft tissue stranding is seen as well as a small amount of free fluid seen in the left paracolic gutter and dependently in the pelvis.

Dammit! But the breadth of antibiotic development is remarkable, and there was another arrow in the quiver. Combining the antibiotic Bactrim (itself a combo of sulfamethoxazole and trimethoprim) with Flagyl (metronidazole, an antibiotic that also knocks out parasites) is a bigger gun but was very effective at taking care of attacks. Amusingly we just came across Flagyl for our new puppy Copper, who used it to tamp down a case of Giardia he picked up with his litter — things we can bond over!

Alas, this all seems to be an arms race and my easy treatments were not to last. In the summer of 2022 while visiting my son in Denver, I developed an allergy to the Bactrim with some seriously weird side effects. No anaphylaxis thank heaven, but together with the traditional hives and itching my skin became like tissue paper — any rubs or cuts became open sores overnight. Super unpleasant and no longer a tenable option to be taking multiple times a year. Dammit.

Unfortunately, this left the antibiotic cupboard a bit bare. And frankly left me a bit freaked out — it’s harder to be blasé about attacks when there’s no obvious treatment in play. Luckily Dr. RL is awesome and got on the phone after hours to discuss next steps. Seriously people, when you find a good doc in any specialty, hold on and don’t let go!

How it’s going

The nut of our exchange was — probably time for surgery. Another referral and disturbingly-detailed conversation about my GI tract, this time with Dr. E, a colorectal surgeon affiliated with Overlake. As it turns out she was fantastic, taking a ton of time to explain the options and get into pretty grotty detail about how it all worked. I particularly appreciated the sketches and notes she left me with; a chaos of scribbles that felt exactly like a whiteboard session on software architecture. I had found a colorectal nerd — hallelujah.

Beyond the non-trivial pain involved in an attack, the big risks are that the diverticula develop (in order of increasing awfulness): (a) abscesses, in which pus gets trapped in the infected diverticula, making them more painful and harder to reach with antibiotics; (b) fistulas, which are abnormal “tunnels” between abscesses and surrounding organs/tissue … passing fecal material into, you know, maybe the bladder; and (c) perforations, where the stuff just dumps into the abdominal cavity. Look, I warned you.

As yet I’d been able to knock down attacks before any of these developed, but without a good antibiotic option that was no longer a slam dunk. And once they’ve occurred, surgery is way more risky, way more disruptive, and way less predictable. In Dr. E’s words, “like trying to stitch together wet tissue paper.” And almost certainly involving “the bag.” All of which made me quite disposed to appreciate the elective option — more formally in my case, “robotic laparoscopic low anterior colon resection.” Less formally, “cutting out a chunk of my colon and stapling it back together.”

In this exercise, the placement of my diverticula was actually an advantage. It turns out that — and again there are theories but nobody really knows why — you can improve outcomes dramatically by removing the part of the colon starting just above the rectum (the, I kid you not, “high pressure zone”). Unfortunately I can’t find a good link for this online but Dr. E clearly knows of what she speaks. Because my diverticula were in the sigmoid and lower descending colon, this made for a nice continuous piece to remove. Cool.

Prep for the surgery was pretty uneventful — some antibiotics (neomycin and flagyl, deftly avoiding the nasty ones) and a bowel prep just like for a colonoscopy. May I never see lemon-lime Gatorade again thank you very much. An early call at the hospital, quick conversations with Dr. E and the anesthesiologist, way too many pokes trying to get IVs into my dehydrated veins, and it was go time.

The last mile (I hope)

The surgery itself is just a technological miracle. Thanks to OpenNotes I was able to read the play-by-play in complete detail. Paraphrased for brevity and apologies if the summary isn’t perfect, but:

  1. They brought me into the operating room and put me under. I remember climbing onto the table, that’s about it.
  2. I was prepped, given a urinary catheter and some meds, and moved into low lithotomy position.
  3. They paused to double-check they had the right patient and all that — appreciated.
  4. They put five cuts into my abdomen, flipped me upside down into Trendelenburg’s position and inserted the various robot arms and stuff. Being upside down lets gravity move most of the “stuff” in the abdomen out of the way for better visibility. Inflating me like a CO2 balloon also helps with this.
  5. She said nice things about the attachment of my colon to the sidewall and made sure my ureters (tubes from kidney to bladder) wouldn’t get nicked. Also appreciated.
  6. She moved the colon into position and cut first at the top of the rectum, then in the mid-left colon just above the adhesions and diverticula. The removed section was placed in a bag and — get this — “left in the abdomen to be retrieved later.” Just leave that over in the corner, housekeeping will take care of it overnight.
  7. Here’s where it gets really amazing. The two open ends of colon were joined together using a stapler. I’m not sure this is the exact model, but it’s pretty close — check out the video (also embedded below because it’s so cool). Apparently this join is strong enough that that very day I was allowed to eat anything I wanted (I chose to be conservative here). Stunning.
  8. They closed me up (and did remember to remove the old piece of colon). Apparently my omentum wasn’t big enough to reach the repair site; typically they drape it there to deliver a shot of immune cells. My one big failing, ah well.
  9. The anesthesiologist installed a temporary TAP block to reduce the immediate need for opioids.
  10. They woke me up and shipped me off to recovery. The whole thing took about three hours, way less than expected.

I vaguely remember the very initial recovery being pretty painful, mostly in my back which I assume was from being in that weird upside down position for so long. I remember only shadowy flashes of my recovery nurse “Dean” who IIRC seemed amused by my demeanor … apparently I was effusively apologetic? Anesthesia is some weird sh*t my friends. By that afternoon I was in my room for the night and the pain moved into my gut (probably the block wearing off), but a little Dilaudid in my IV helped out quite a bit.

After this phase I won’t say the pain was irrelevant — it’s six days later and I still feel like (again Dr. E) “somebody stabbed me five times” — but it was totally manageable. Most importantly, when I would lie still there was almost no pain at all, so it was easy to catch my breath. The difference between pain-when-you-do-something and pain-all-the-time is night and day. I took no opioids after that first night and really just 1000mg of Tylenol 3x per day was enough. No judgment for those who don’t have it so easy, I think I was super super lucky here — but at least as one data point it was pretty darn ok.

Milestones for going home were basically (a) walking around independently and (b) end-to-end bowel action. I was walking that first night, and it actually felt really good to do so — stretching out the abdomen (and my legs) was a great distraction from just sitting around. Getting into and out of bed was painful; the rest was no sweat. I was able to do this on my own and think the staff was probably pretty weirded out by the unshaven guy dragging around his IV pole all night like Gandalf’s wizard staff. Overlake has really nice patient wards and I must have looped around the 5th floor South and East wings a hundred times.

Bowel action was a little less quick to happen. Apparently with all the trauma the intestines basically shut down, and it takes some coercion to wake them back up. Walking helped, as did small bites of food (I had basically no restrictions, but kept it to cream of wheat and yogurt for the first bit anyways). Being able to limit opioids was also a plus here, so by day two there was a lot of rumbling going on. My first “experience” was distressingly bloody — more ew, sorry — but that was pretty much a one shot deal, and things improved quickly from there. A lot of gas, a lot of diarrhea, that’s just part of the game for a little while. Getting better every day!

I was able to head home the third day, and have just been taking it easy here since then. Nice to not be woken up for vital signs in the middle of the night. I do get exhausted pretty quick and have been sleeping a lot, but am confident that by Christmas I’ll be back in full eating form again. Jamon serrano, I’m coming for you!

All in all

I’ve been a caretaker on and off for many years, and worked in health IT a long time. But I haven’t been a “patient” very often; just a few acute incidents. It’s humbling and not super-pleasant, but a few things really made it bearable and I daresay even interesting:

  1. Great providers. I can’t say enough about Dr. RL and Dr. E (linked to their real profiles because they deserve the kudos). They answered all my questions — the ones where I was scared and the ones where I was just curious. They explained options. And they know their sh*t. Such a confidence boost. I should also mention in particular Nurse Wen of Overlake South 5 — I wish I got her last name! Her sense of personal responsibility for my care — not to mention ability to multitask — was remarkable and I am very grateful.
  2. Open information. I’ve gushed about OpenNotes before, but I can’t overstate how much better it is than “patient education” pablum. I read every note side by side with Google to help me understand the terms — and felt like I actually knew what was going on. Make sure you sign up for your patient portals and read what’s there — it’s good stuff.
  3. Letting folks help. They say you get emotional after general anesthesia, so I’ll blame that. But I still get a little teary thinking about all the people who’ve been there for me with help and texts and whatever. Especially Lara of course. I guess it’s OK to be the caregiv-ee once in awhile. Thanks everyone.

Still awhile to go, and there’s no guarantee that I won’t develop some new little buggers to deal with in the future. But so far so good on this chapter. If you found this screed because you’re on your own diverticulitis journey and are looking for real-world information, hooray! Feel free to ping me via the contact form on the site, I’m more than happy to provide any and all details about my own experience. Just remember, I’m a sample size of one.

It’s Always a Normalization Problem

Heads up, this is another nerdy one! ShareToRoku is available on the Google Play store. All of the client and server code is up on my github under MIT license; I hope folks find it useful and/or interesting.

Algorithms are the cool kids of software engineering. We spend whole semesters learning to sort and find stuff. Spreadsheet “recalc” engines revolutionized numeric analysis. Alignment algorithms power advances in biotechnology.  Machine learning algorithms impress and terrify us with their ability to find patterns in oceans of data. They all deserve their rep!

But as great as they are, algorithms are hapless unless they receive inputs in a format they understand — their “model” of the world. And it turns out that these models are really quite strict — data that doesn’t fit exactly can really gum up the works. As engineers we often fail to appreciate just how “unnatural” this rigidity is. If I’m emptying the dishwasher, finding a spork amongst the silverware doesn’t cause my head to explode — even if there isn’t a “spork” section in the drawer (I probably just put it in with the spoons). Discovering a flip-top on my toothpaste rather than a twist cap really isn’t a problem. I can even adapt when the postman leaves packages on top of the package bin, rather than inside of it. Any one of these could easily stop a robot cold (so lame).

It’s easy to forget, because today’s models are increasingly vast and impressive, and better every day at dealing with the unexpected. Tesla’s Autopilot can easily be mistaken for magic — but as all of us who have trusted it to exit 405 North onto NE 8th know, the same weaknesses are still hiding in there under the covers. But that’s another story.

Anyhoo, the point is that our algorithms are only useful if we can feed them data that fits their models. And the code that does that is the workhorse of the world. Maybe not the sexiest stuff out there, but almost every problem we encounter in the real world boils down to data normalization. So you’d better get good at it.

Share to Roku (Release 6)

My little Android TV-watching app is a great (in miniature) example of this dynamic at work. If you read the original post, you’ll recall that it uses the  Android “share” feature to launch TV shows and movies on a Roku device. For example, you can share from the TV Time app to watch the latest episode of a show, or launch a movie directly from its review at the New York Times. Quite handy, but it turns out to be pretty hard to translate from what apps “share” to something specific enough to target the right show. Let’s take a look.

First, the “algorithm” at play here is the code that tells the Roku to play content. We use two methods of the Roku ECP API for this:

  • Deep Linking is ideal because it lets us launch a specific video on a specific channel. Unfortunately the identifiers used aren’t standard across channels, and they aren’t published — it’s a private language between Roku and their channel providers. Sometimes we can figure it out, though — more on this later.
  • Search is a feature-rich interface for jumping into the Roku search interface. It allows the caller to “hint” the search with channel identifiers and such, and in certain cases will auto-start the content it finds. But it’s hard to make it do the right thing. And even when it’s working great it won’t jump to specific episodes, just seasons.
The normalized output of all this work is a RokuSearchInfo object — our best guess at what the user wants to watch and which channels can play it:

public class RokuSearchInfo
{
    public static class ChannelTarget
    {
        public String ChannelId;
        public String ContentId;
        public String MediaType;
    }

    public String Search;
    public String Season;
    public String Number;
    public List<ChannelTarget> Channels;
}

Armed with this data, it’s pretty easy to slap together the optimal API request. You can see it happening in ShareToRokuActivity.resolveAndSendSearch — in short, if we can narrow down to a known channel we try to launch the show there, otherwise we let the Roku search do its best. Getting that data in the first place is where the magic really happens.
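For a sense of what those two requests look like on the wire, here are hand-rolled equivalents. ECP listens on port 8060; the parameter names follow the Roku docs, but the channel and content ids here are placeholders:

# deep link: launch a specific channel directly into a piece of content
curl -d '' 'http://YOUR_ROKU_ADDRESS:8060/launch/CHANNEL_ID?contentId=CONTENT_ID&mediaType=series'

# fallback: hand the cleaned-up text to Roku search and let it do its best
curl -d '' 'http://YOUR_ROKU_ADDRESS:8060/search/browse?keyword=project%20runway&season=5&launch=true'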

A Babel of Inputs

The Android Sharesheet is a pretty general-purpose app-to-app sharing mechanism, but in practice it’s mostly used to share web pages or social media content through text or email or whatever. So most data comes through as unstructured text, links and images. Our job is to make sense of this and turn it into the most specific show data we can. A few examples:

  1. TV Time episode page
    • Shared data: Show Me the Love on TV Time https://tvtime.com/r/2AID4
    • Ideal target: “Trying” Season 1 Episode 6 on AppleTV+
  2. Chrome, nytimes.com movie review (no text selection)
    • Shared data: https://www.nytimes.com/2022/11/22/movies/strange-world-review.html
    • Ideal target: “Strange World” on Disney+
  3. Chrome, Wikipedia page (movie title selected)
    • Shared data: “Joe Versus the Volcano” https://en.wikipedia.org/wiki/Joe_Versus_the_Volcano#:~:text=Search-,Joe%20Versus%20the%20Volcano,-Article
    • Ideal target: “Joe Versus the Volcano” on multiple streaming services
  4. YouTube video
    • Shared data: https://youtu.be/zH14EyiSlas
    • Ideal target: “When you say nothing at all” cover by Reina del Cid on YouTube
  5. Amazon Prime movie
    • Shared data: Hey I’m watching Black Adam. Check it out now on Prime Video! https://watch.amazon.com/detail?gti=amzn1.dv.gti.1a7638b2-3f5e-464a-a271-07c2e2ec1f8c&ref_=atv_dp_share_mv&r=web
    • Ideal target: “Black Adam” on Amazon Prime
  6. Netflix series page
    • Shared data: Seen “Love” on Netflix yet? https://www.netflix.com/us/title/80026506?s=a&trkid=13747225&t=more&vlang=en&clip=80244686
    • Ideal target: “Love” Season 1 Episode 1 on Netflix
  7. Search text entered directly into ShareToRoku
    • Shared data: Project Runway Season 5
    • Ideal target: “Project Runway” Season 5 on multiple streaming services

Pipelines and Plugins

All but the simplest normalization code typically breaks down into a collection of rules, each targeted at a particular type of input. The rules are strung together into a pipeline, each doing its little bit to clean things up along the way. This approach makes it easy to add new rules into the mix (and retire obsolete ones) in a modular, evolutionary way.

After experimenting a bit (a lot), I settled on a two-phase approach to my pipeline:

  1. Iterate over a list of “parsers” until one reports that it understands the basic format of the input data.
  2. Iterate over a list of “refiners” that try to enhance the initial model by cleaning up text, identifying target channels, etc.

Each of these is defined by a standard Java interface and strung together in SearchController.java. A fancier approach would be to instantiate and order the implementations through configuration, but that seemed like serious overkill for my little hobby app. If you’re working with a team of multiple developers, or expect to be adding and removing components regularly, that calculus probably looks a bit different.
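The shape of those interfaces is roughly this (names paraphrased; see SearchController.java and the parser/refiner classes in the repo for the real signatures):

// Parsers try to recognize the raw input; the first one that matches wins.
public interface Parser {
    RokuSearchInfo parse(String input, UserChannelSet channels) throws Exception;
}

// Refiners improve whatever model we already have; they all run, in order.
public interface Refiner {
    void refine(RokuSearchInfo info, UserChannelSet channels) throws Exception;
}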

This split between “parsers” and “refiners” wasn’t obvious at first. Whenever I face a messy normalization problem, I start by writing a ton of if/then spaghetti, usually in pseudocode. That may seem backwards, but it can be hard to create an elegant approach until I lay out all the variations on the table. Once that’s in front of me, it becomes much easier to identify commonalities and patterns that lead to an optimal pipeline.

Parsers

“Parsers” in our use case recognize input from specific sources and extract key elements, such as the text most likely to represent a series name. As of today there are three in production:

TheTVDB Parser (Lookup.java)

TV Time and a few other apps are powered by TheTVDB, a community-driven database of TV and movie metadata. The folks there were nice enough to grant me access to the API, which I use to recognize and decode TV Time sharing URLs (example 1 in the list above). This is a four-step process:

  1. Translate the short URL into their canonical URL. E.g., the short URL in example 1 resolves to https://www.tvtime.com/show/375903/episode/7693526&pid=tvtime_android.
  2. Extract the series (375903) and/or episode (7693526) identifiers from the URL.
  3. Use the API to turn these identifiers into show metadata and translate it into a parsed result.
  4. Apply some final ad-hoc tweaks to the result before returning it.
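Steps 1 and 2 boil down to a redirect and a regex. A sketch of that part, assuming the canonical URL shape shown above (the real logic, including the API calls, lives in Lookup.java):

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TvTimeSketch
{
    // e.g. https://www.tvtime.com/show/375903/episode/7693526&pid=tvtime_android
    private static final Pattern IDS =
        Pattern.compile("/show/(\\d+)(?:/episode/(\\d+))?");

    // Step 1: follow the short link without fetching the body; the canonical
    // URL comes back in the Location header (a real implementation may need
    // to follow more than one hop).
    public static String resolveShortUrl(String shortUrl) throws Exception {
        HttpURLConnection conn =
            (HttpURLConnection) new URL(shortUrl).openConnection();
        conn.setInstanceFollowRedirects(false);
        conn.setRequestMethod("HEAD");
        String location = conn.getHeaderField("Location");
        conn.disconnect();
        return (location != null ? location : shortUrl);
    }

    // Step 2: pull the series and (optional) episode identifiers out of the URL.
    public static String[] extractIds(String canonicalUrl) {
        Matcher m = IDS.matcher(canonicalUrl);
        if (!m.find()) return null;
        return new String[] { m.group(1), m.group(2) }; // series, episode-or-null
    }
}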

All of this data is cached using a small SQLite database so that we don’t make too many calls directly to the API. I’m quite proud of the toolbox implementation I put together for this in CachingProxy.java, but that’s an article for another day.

UrlParser.java

UrlParser takes advantage of the fact that many apps send a URL that includes their own internal show identifiers, and often these internal identifiers are the same ones they use for “Deep Linking” with Roku. The parser is configured with entries that include a “marker” string — a unique URL fragment that identifies a particular source app — together with a Roku channel identifier and some extra sugar not worth worrying about. When the marker is found and an ID extracted, this parser can return enough information to jump directly into a channel. Woo hoo!
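Conceptually each entry looks something like this; the field names here are illustrative, not the ones actually used in UrlParser.java:

// Hypothetical shape of a UrlParser configuration entry.
public class UrlParserEntry
{
    public String Marker;        // unique URL fragment, e.g. "watch.amazon.com/detail"
    public String RokuChannelId; // channel to deep link into
    public String IdPattern;     // regex that pulls the content id out of the URL
    public String MediaType;     // hint passed along with the launch request
}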

SyntaxParser.java

This last parser is kind of a last gasp that tries to clean up share text we haven’t already figured out. For example, it extracts just the search text from a Chrome share, and identifies the common suffix “SxEy” where x is a season and y is an episode number. I expect I’ll add more in here over time but it’s a reasonable start.
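For the “SxEy” case, a regex along these lines does the job (a sketch; the real pattern lives in SyntaxParser.java):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SxEySketch
{
    // Matches a trailing "S5E3" / "s05 e03" style suffix and strips it off.
    private static final Pattern SXEY =
        Pattern.compile("(?i)\\bS(\\d{1,2})\\s*E(\\d{1,2})\\s*$");

    public static void apply(RokuSearchInfo info) {
        Matcher m = SXEY.matcher(info.Search);
        if (!m.find()) return;
        info.Season = m.group(1);
        info.Number = m.group(2);
        info.Search = info.Search.substring(0, m.start()).trim();
    }
}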

Refiners

Once we have the basics of the input — we’ve extracted a clean search string and maybe taken a first cut at identifying the season and channels — a series of “refiners” is called in turn to improve the results. Unlike parsers, which short-circuit after a match is found, all the refiners run every time.

WikiRefiner.java

A ton of the content we watch these days is created by the streaming providers themselves. It turns out that there are folks who keep lists of all these shows on Wikipedia (e.g., this one for Netflix). The first refiner simply loads up a bunch of these lists and then looks at incoming search text for exact matches. If one is found, the channel is added to the model.

As a side note, the channel is actually added to the model only if the user has that channel installed on their Roku (as passed up in the “channels” query parameter). The same show is often available on a number of channels, and it doesn’t make sense to send a Roku to a channel it doesn’t know about. If the show is available on multiple installed channels, the Android UX will ask the user to pick the one they prefer.
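In pseudo-Java, the whole thing boils down to an exact-match lookup plus that installed-channel filter. A sketch, with illustrative names (in particular, UserChannelSet.contains stands in for however the real class exposes installed channels):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

public class WikiRefinerSketch
{
    // Normalized show title -> Roku channel id, loaded from the Wikipedia lists.
    private final Map<String,String> originals = new HashMap<>();

    private static String normalize(String title) {
        return title.toLowerCase().replaceAll("[^a-z0-9 ]", "").trim();
    }

    public void refine(RokuSearchInfo info, UserChannelSet channels) {

        String channelId = originals.get(normalize(info.Search));
        if (channelId == null) return;

        // Only add the channel if the user actually has it installed.
        if (!channels.contains(channelId)) return;

        if (info.Channels == null) info.Channels = new ArrayList<>();

        RokuSearchInfo.ChannelTarget target = new RokuSearchInfo.ChannelTarget();
        target.ChannelId = channelId;
        info.Channels.add(target);
    }
}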

RokuSearchRefiner.java

Figuring out this refiner was a turning point for the app. It makes the results far more accurate, which of course makes sense since they are sourced from Roku itself. I’ve left the WikiRefiner in place for now, but suspect I can retire it with really no decrease in quality. The logs will show if that’s true or not after a few weeks.

In any case, this refiner passes the search text up to the same search interface used by roku.com. It is insanely annoying that this API doesn’t return deep link identifiers for any service other than the Roku Channel, but it’s still a huge improvement. By restricting results to “perfect” matches (confidence score = 1), I’m able to almost always jump directly into a channel when appropriate.

I’m not sure Roku would love me calling this — but I do cache results to keep the noise down, so hopefully they’ll just consider it a win for their platform (which it is).

FixupRefiner.java

At the very end of the pipeline, it’s always good to have a place for last-chance cleanup. For example, TVDB knows “The Great British Bake Off,” but Roku in the US knows it as “The Great British Baking Show.” This refiner matches the search string against a set of rules that, if found, allow the model to be altered in a manual way. These make the engineer in me feel a bit dirty, but it’s all part of the normalization game — the choice is whether to feel morally superior or return great results. Oh well, at least the rules are in their own configuration file.
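The rules themselves are just “if the search matches X, change it to Y” entries. Something like this sketch, except that the real mappings come from that configuration file rather than being hard-coded:

import java.util.HashMap;
import java.util.Map;

public class FixupRefinerSketch
{
    // match -> replacement; loaded from config in the real app.
    private static final Map<String,String> FIXUPS = new HashMap<>();
    static {
        FIXUPS.put("the great british bake off", "The Great British Baking Show");
    }

    public static void refine(RokuSearchInfo info) {
        String replacement = FIXUPS.get(info.Search.toLowerCase().trim());
        if (replacement != null) info.Search = replacement;
    }
}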

Hard Fought Data == Real Value

This project is a microcosm of most of the normalization problems I’ve experienced over the years. It’s important to try to find some consistency and modularity in the work — that’s why pipelines and plugins and models are so important. But it’s just as important to admit that the real world is a messy place, and be ready to get your hands dirty and just implement some grotty code to clean things up.

When you get that balance right, it creates enormous differentiation for your solution. Folks can likely duplicate or improve upon your algorithms — but if they don’t have the right data in the first place, they’re still out of luck. Companies with useful, normalized, proprietary data sets are just always always always more valuable. So dig in and get ‘er done.

For reference, here’s the full pipeline entrypoint from SearchController.java:

public RokuSearchInfo parse(String input, UserChannelSet channels) {

    // 1. PARSE
    String trimmed = input.trim();
    RokuSearchInfo info = null;

    try {
        info = tvdbParser.parse(input, channels);
        if (info == null) info = urlParser.parse(input, channels);
        if (info == null) info = syntaxParser.parse(input, channels);
    }
    catch (Exception eParse) {
        log.warning(Easy.exMsg(eParse, "parsers", true));
        info = null;
    }

    if (info == null) {
        info = new RokuSearchInfo();
        info.Search = trimmed;
        log.info("Default RokuSearchInfo: " + info.toString());
    }

    // 2. REFINE
    tryRefine(info, channels, rokuRefiner, "rokuRefiner");
    tryRefine(info, channels, wikiRefiner, "wikiRefiner");
    tryRefine(info, channels, fixupRefiner, "fixupRefiner");

    // 3. RETURN
    log.info(String.format("FINAL [%s] -> %s", trimmed, info));
    return(info);
}

Health IT: More I, less T

“USCDI vs. USCDI+ vs. EHI vs. HL7 FHIR US Core vs. IPA. Definitions, similarities, and differences as you understand them. Go!” —Anonymous, Twitter

I spent about a decade working in “Health Information Technology” — an industry that builds solutions for managing the flow of healthcare information. It’s a big tent that boasts one of the largest trade shows in the world and dozens of specialized venture funds. And it’s quite diverse, including electronic health records, consumer products, billing and cost management, image management, AI and analytics of every flavor you can imagine, and more. The money is huge, and the energy is huger.

Real world progress, on the other hand, is tough to come by. I’m not talking about health care generally. The tools of actual care keep rocketing forward; the rate limiter on tests and treatments seems to be only our ability to assess efficacy and safety fast enough. But in the HIT world, it’s mostly a lot of noise. The “best” exits are mostly acquisitions by huge insurance companies willing to try anything to squeak out a bit more margin.

That’s not to say there’s zero success. Pockets of awesome actually happen quite often, they just rarely make the jump from “promising pilot” to actual daily use at scale. There are many reasons for this, but primarily it comes down to workflow and economics. In our system today, nobody is incented to keep you well or to increase true efficiency. Providers get paid when they treat you, and insurance companies don’t know you long enough to really care about your long-term health. Crappy information management in healthcare simply isn’t a technology problem. But it’s an easy and fun hammer to keep pounding the table with. So we do.

But I’m certainly not the first genius to recognize this, and the world doesn’t need another cynical naysayer, so what am I doing here? After watching another stream of HIT technobabble clog up my Twitter feed this morning, I thought it might be helpful to call out four technologies that have actually made a real difference over the last few years. Perhaps we’ll see something in there that will help others find their way to a positive outcome. Or maybe not. Let’s give it a try.

A. Patient Portals

Everyone loves to hate on patient portals. I sure did during the time I spent trying to make HealthVault go. After all, most of us interact with at least a half dozen different providers and we’re supposed to just create accounts at all of them? And figure out which one to use when? And deal with their circa 1995 interfaces? Really?

Well, yeah. That’s pretty much how business works on the web. Businesses host websites where I can view my transaction history, pay bills, and contact customer support. A few folks might use aggregation services to create a single view of their finances or whatever, but most of us just muddle through, more-or-less happily, using a gaggle of different websites that don’t much talk to each other.

There were three big problems with patient portals a few years ago:

  1. They didn’t exist. Most providers had some third-party billing site where you could pay because, money. But that was it.
  2. When they did exist, they were hard to log into. You usually had to request an “activation code” at the front desk in person, and they rarely knew what you were talking about.
  3. When they did exist and you could log in, the staff didn’t use them. So secure messaging, for example, was pretty much a black hole.

Regulation fixed #1; time fixed #2; the pandemic fixed #3. And it turns out that patient portals today are pretty handy tools for interacting with your providers. Sure, they don’t provide a universal comprehensive view of our health. And sure, the interfaces seem to belong to a long ago era. But they’re there, they work, and they have made it demonstrably easier for us to manage care.

Takeaway: Sometimes, healthcare is more like other businesses than we care to admit.

B. Epic Community Connect & Care Everywhere

Epic is a boogeyman in the industry — an EHR juggernaut. Despite a multitude of options, fully a third of hospitals use Epic, and that percentage is much larger if you look at the biggest health systems in the country. It’s kind of insane.

It can easily cost hundreds of millions of dollars to install Epic. Institutions often have Epic consultants on site full time. And nobody loves the interface. So what is going on here? Well, mostly Epic is just really good at making life bearable for CIOs and their IT departments. They take care of everything, as long as you just keep sending them checks. They are extremely paternalistic about how their software can be used, and as upside-down as that seems, healthcare loves it. Great for Epic. Less so for providers and patients, except for two things:

“Community Connect” is an Epic program that allows customers to “sublet” seats in their Epic installation to smaller providers. Since docs are basically required to have an EHR now (thanks regulation), this ends up being a no-brainer value proposition for folks that don’t have the IT savvy (or interest) to buy and deploy something themselves. Plus it helps the original customer offset their own cost a bit.

Because providers are using the same system here, data sharing becomes the default versus the exception. It’s harder not to share! And even non-affiliated Epic users can connect by enabling “Care Everywhere,” a global network run by Epic just for Epic customers. Thanks to these two things, if you’re served by the 33%+ of the industry that uses Epic, sharing images and labs and history is just happening like magic. Today.

Takeaway: Data sharing works great in a monopoly.

C. Open Notes

OpenNotes is one of those things that gives you a bit of optimism at a time when optimism can be tough to come by. Way back in 2010, three institutions (Beth Israel in MA, Geisinger in PA, and Harborview in WA) started a long-running experiment that gave patients completely unfettered access to their medical records. All the doctor’s notes, verbatim, with virtually no exception. This was considered incredibly radical at the time: patients wouldn’t understand the notes; they’d get scared and create more work for the providers; providers fearing lawsuits would self-censor important information; you name it.

But at the end of the study, none of that bad stuff happened. Instead, patients felt more informed and greatly preferred the primary data over generic “patient education” and dumbed-down summaries. Providers reported no extra work or legal challenges. It took a long time, but this wisdom finally made it into federal regulation last year. Patients now must be granted full access to their providers’ notes in electronic form at no charge.

In the last twelve months my wife had a significant knee surgery and my mom had a major operation on her lungs. In both cases, the provider’s notes were extraordinarily useful as we worked through recovery and assessed future risk. We are so much better educated than we would otherwise have been. An order of magnitude better than ever before.

Takeaway: Information already being generated by providers can power better care.

D. Telemedicine

It’s hard to admit anything good could have come out of a global pandemic, but I’m going to try. The adoption of telemedicine as part of standard care has been simply transformational. Urgent care options like Teladoc and Doctor on Demand (I’ve used both) make simple care for infections and viruses easy and non-disruptive. For years insurance providers refused “equal pay” for this type of encounter; it seems that they’ve finally decided that it can help their own bottom line.

Just as impactful, most “regular” docs and specialists have continued to provide virtual visits as an option alongside traditional face-to-face sessions. Consistent communication between patients and providers can make all the difference, especially in chronic care management. I’ve had more and better access to my GI specialists in the last few years than ever before.

It’s only quite recently that audio and video quality have gotten good enough to make telemedicine feel like “real” medicine. Thanks for making us push the envelope, COVID.

Takeaway: Better care and efficiency don’t have to be mutually exclusive.

So there we go. There are ways to make things better with technology, but you have to work within the context of reality, and they ain’t always that sexy. We don’t need more JSON or more standards or more jargon; we need more information and thoughtful integration. Just keep swimming!

Form and Function

I love reality TV about making stuff and solving problems. My family would say “to a fault.” Just a partial list of my favs:

I could easily spin a tangent about experiential archeology and the absolutely amazing Ruth Goodman, but I’ll be restrained about that (nope): Secrets of the Castle, Tudor Monastery Farm, Tales from the Green Valley, Edwardian Farm, Victorian Farm, Wartime Farm.

ANYWAY.

Recently I discovered that old Project Runway seasons are available on the Roku Channel, so I’ve been binging through them; just finished season fourteen (Ashley and Kelly FTW). At least once per year, the designers are asked to create a look for a large ready-to-wear retailer like JCPenney or JustFab or whatever. These are my favorites because they add a super-interesting set of constraints to the challenge — is it unique while retaining mass appeal, can it be reproduced economically, will it read well in an online catalog, etc., etc. This ends up being new for most of the participants, who think of themselves (legitimately) as “artists” and prefer to create fashion for fashion’s sake. Many of them have never created anything other than bespoke pieces, and things often go hilariously off the rails as their work is judged against real-world economic criteria in addition to innovation and aesthetics. Especially because the judges themselves often aren’t able to express their own expectations clearly up front.

This vibe brings me back to software development in an enterprise setting (totally normal, right?). So many developers struggle to understand the context in which their work is judged. After all, we learned computer science from teachers for whom computer science itself is the end goal. We read about the cool new technologies being developed by tech giants like Facebook and Google and Amazon. All of our friends seem to be building microservices in the cloud using serverless backends and nosql map/reduce data stores leveraging deep learning and … whatever. So what does it mean to build yet another integration between System A and System B? What, in the end, is the point?

It turns out to be pretty simple:

  1. Does your software enable and accelerate business goals right now, and
  2. Does it require minimal investment to do the same in the future?

Amusingly, positive answers to both of these turn out to be pretty much 100% correlated not with “the shiniest new language” or “what Facebook is doing” but instead beautiful and elegant code. So that’s cool; just like a great dress created to sell online, successful enterprise code is exceptional both in form and function. Nice!

But as easy as these tests seem, they can be difficult to measure well. Enterprises are always awash in poorly-articulated requirements that all “need” to be ready yesterday. Becoming a slave to #1 can seem like the right thing — “we exist to serve the business” after all — but down that road lies darkness. You’ll write crappy code that doesn’t actually do what your users need anyways, breaks all the time and ultimately costs a ton in refactoring and lost credibility.

Alas, #2 alone doesn’t work either. You really have no idea what the future is going to look like, so you end up over-engineering into some super-generalized false utopian abstraction that surely costs more than it should to run and doesn’t do any one thing well. And it is true that if your business isn’t successful today it won’t exist tomorrow anyways.

It’s the combination that makes the magic. That push and pull of building something in the real world now, that can naturally evolve over time. That’s what great engineering is all about. And it primarily comes down to modularity. If you understand and independently execute the modules that make up your business, you can easily swap them in and out as needs change.

In fact, that’s why “microservices” get such play in the conversation these days — they are one way to help enforce separation of duties. But they’re just a tool, and you can create garbage in a microservice just as easily as you can in a monolith. And holy crap does that happen a lot. Technology is not the solution here … modular design can be implemented in any environment and any language.

  • Draw your business with boxes and arrows on one piece of paper.
  • Break down processes into independent components.
  • Identify the core data elements, where they are created and which processes need to know about them.
  • Describe the conversations between components.
  • Implement (build or buy) each “box” independently. In any language you want. In any environment that works for you.

Respect both form and function, knitting together a whole from independent pieces, and you are very likely to succeed. Just like the best designers on Project Runway. And the pottery show. And the baking one. And the knife-making one. And the …

Focus

OK, let’s see if I can actually get this thing written.

It’s a little hard to focus right now. We’re almost two weeks into life with Copper the shockingly cute cavapoo puppy. He’s a great little dude, and life is already better with him around. But holy crap, it’s like having a human baby again — except that I’m almost thirty years older, and Furry-Mc-Fur-Face doesn’t use diapers, so it seems like every ten minutes we’re headed outside to do his business. Apparently it’s super-important to provide positive reinforcement for this, so if you happen to see me in the back yard at 3am waxing poetic about pee and/or poop, well, yeah.

What’s interesting about my inability to focus (better explanation of this in a minute) is that it’s not like I don’t have blocks of open time in which I could get stuff done. Copper’s day is an endless cycle of (a) run around like crazy having fun; (b) fall down dead asleep; (c) go to the bathroom; (d) repeat — with a few meals jammed in between rounds. Those periods when he sleeps can be an hour or more long, so there’s certainly time in the day to be productive.

And yet, in at least one very specific way, I’m not. “Tasks” get done just fine. The dishes and clothes are clean. I take showers. I even mowed the lawn the other day. I’m caught up on most of my TV shows (BTW Gold Rush is back!). But when I sit down to do something that requires focus, it’s a lost cause. Why is that?

Things that require focus require me to hold a virtual model of the work in my head. For most of my life the primary example of this has been writing code. But it applies to anything that requires creation and creativity — woodworking, writing, art, all of that stuff. These models can be pretty complicated, with a bunch of interdependent and interrelated parts. An error in one bit of code can have non-obvious downstream effects in another; parallel operations can bump into each other and corrupt data; tiny changes in performance can easily compound into real problems.

IMNSHO, the ability to create, hold and manipulate these mental models is a non-negotiable requirement to be great at writing and debugging code. All of the noise around TDD, automated testing, scrums and pair programming, blah blah blah — that stuff might make an average coder more effective I suppose, but it can’t make them great. If you “walk” through your model step by step, playing out the results of the things that can go wrong, you just don’t write bugs. OK, that’s bullsh*t — of course you write bugs. But you write very few. Folks always give me crap for the lack of automated tests around my code, but I will go toe-to-toe with any of them on code quality — thirty years of bug databases say I’ll usually win.

The problem, of course, is that keeping a complex model alive requires an insane amount of focus. And focus (the cool kids and my friend Umesh call it flow state) requires a ton of energy. When I was just out of school I could stay in the zone for hours. No matter what was going on in the other parts of my life, I could drop into code and just write (best mental health therapy ever). But as the world kept coming at me, distractions made it increasingly difficult to get there. Kids and family of course, but work itself was much more problematic. As I advanced in my career, my day was punctuated with meetings and budgets and managing and investors and all kinds of stuff.

I loved all of that too (ok maybe not the meetings or budgets), but it meant I had smaller and smaller time windows in which to code. There is nothing more antithetical to focus than living an interrupt-driven existence. But I wasn’t going to quit coding, so it forced me to develop two behaviors — neither rocket science — that kept me writing code I was proud of:

1. Code in dedicated blocks of time.

Don’t multitask, and don’t try to squeeze in a few minutes between meetings. It takes time to establish a model, and it’s impossible to keep one alive while you’re responding to email or drive-by questions. Establish socially-acceptable cues to let people know when you need to be left alone — one thing I wish I’d done more for my teams is to set this up as an official practice. As an aside, this is why open offices are such horsesh*t for creative work — sometimes you just need a door.

2. Always finish finished.

This is actually the one bit of “agile” methodology that I don’t despise. Break up code into pieces that you can complete in one session. And I mean complete — all the error cases, all the edges, all of it. Whatever interface the code presents — an API, an object interface, a UX, whatever — should be complete when you let the model go and move on to something else. If you leave it halfway finished, the mental model you construct “next time” will be just ever so slightly different. And that’s where edges and errors get missed.

Finishing finished improves system architecture too — because it forces tight, compact modularity. For example, you can usually write a data access layer in one session, but might need to leave the cache for later. Keeping them separate means that you’ll test each more robustly and you’ll be in a position to replace one or the other as requirements morph over time. As an extra bonus, you get a bunch of extra endorphin hits, because really finishing a task just feels awesome.

OK sure, sounds great. But remember my friend Copper? In just the few sessions I’ve taken to write this little post, he’s come by dozens of times to play or make a quick trip outside. And even when I’m not paying him active attention, part of my brain always has to be on alert. Sometimes, distractions become so intense that the reality is you just won’t be successful at creative work, because your brain simply won’t focus. It hurts to say that, but better to acknowledge it than to do a crap job. The good news is that these times are usually transient — Copper will only be a brand new puppy for a few weeks, and during that time I just have to live with lower productivity. It’s not the end of the world, so long as I understand what’s going on and plan for it.

If you’re a manager, you really need to be on the lookout for folks suffering from focus-killing situations. New babies, new houses or apartments, health problems, parent or relationship challenges, even socio-political issues can catch up with people before they themselves understand what’s going on. Watch for sudden changes in performance. Ask questions. Maybe they just need some help learning to compartmentalize and optimize their time. Or maybe they need you to lighten the load for a few weeks.

Don’t begrudge them that help! Supporting each other through challenging times forges bonds that pay back many times over. And besides, you’ll almost certainly need somebody to do the same for you someday. Pay it forward.

Share to Roku!

TLDR: if you watch TV on a Roku and have an Android phone, please give my new Share To Roku app a try! It’s currently in open testing; install it with this link on the web or this link on your Android device. The app is free, has no ads, saves no data and only makes network calls to Rokus on your local network. It acts as a simple remote, but much more usefully lets you “Share” show names from the web or other apps directly to the Roku search interface. I use it with TV Time and it has been working quite well so far — but I need broader real-world testing and would really appreciate your feedback.

Oh user interface development, how I hate you so. But my lack of experience with true mobile development has become increasingly annoying, and I really wanted an app to drive my Roku. So let’s jump back into the world of input events and user interface layouts and see if we can get comfy. Yeesh.

Share To Roku in a nutshell

I’ve talked about this before (here and here). My goal is to transition as smoothly as possible from finding a show on my phone (an Android, currently the Samsung Galaxy S21) to watching it on my TV. I keep my personal watchlist on an app called TV Time and that’s key, but I also want to be able to jump from a recommendation in email or a review on the web. So feature #1 is to create a “share target” that can accept messages from any app.

Armed with this inbound search text, the app will help the user select their Roku, ensure the TV power is on (if that particular Roku supports it), and forward the search. The app then will land on a page hosting controls to help navigate the last mile to the show (including a nice swipe-enabled directional pad that IMNSHO is way better than the official Roku app). This remote control functionality will also be accessible simply by running the app on its own. And that’s about it. Easy peasy!

All of the code for Share To Roku is up on github. I’ll do a final clean-up on the code once the testing period is over, but everything I’ve written here is free to use and adopt under an MIT license, so please take anything you find useful.

Android Studio and “Kotlin”

If you’ve worked with me before, you know that my favorite “IDE” is Emacs; I build stuff from the command line; and I debug using logs and jdb. But for better or worse, the only realistic way to build for Android is to at least mostly use Android Studio, a customized version of IntelliJ IDEA (you can just use IntelliJ too but that’s basically the same thing). AStudio generates a bunch of boilerplate code for even the simplest of apps, and encourages you to edit it all in this weird overlapping sometimes-textual-sometimes-graphical mode that at least in my experience generally ensures a messy final product. I’m not going to spend this whole article complaining about it, but it is pretty stifling.

Love me a million docked windows, three-deep toolbars and controls on every edge of the screen!

Google would also really prefer that you drop Java and instead use their trendy sort-of-language “Kotlin” to build Android apps. I’ve played this Java pre-compiler game before with Scala and Groovy, and all I can say is no thank you. I will never understand why people are so obsessed with turning code into a nest of side effects, just to avoid a few semicolons and brackets. At least for now they are grudgingly continuing to support Java development — so that’s where you’ll find me. On MY lawn, where I like it.

Android application basics

Components

The most important thing to understand about Android development is that you are not in charge of your process. There is no “main” and, while you get your own JVM in which to live, that process can come and go at pretty much any time. This makes sense — at least historically mobile devices have had to manage pretty limited memory and processing power, so the OS exerts a ton of control over the use of those resources. But it can be tricky when it comes to understanding state and threading in an app, and clearly a lot of bugs in the wild boil down to a lack of awareness here.

Instead of main, an Android app is effectively a big JAR that uses a manifest file to expose Component classes. The most common of these is an Activity, which effectively represents one user interface screen within the app. Other components include various types of background processes; I’m going to ignore them here. Share to Roku exposes two Activities, one for choosing a Roku and one for the search and remote interface. Each activity derives from an Android base class that defines a set of well-known entrypoints, each of which is called at different points in the process lifecycle.
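A bare-bones activity showing those entrypoints looks roughly like this; it isn’t code from the app, just the standard lifecycle plumbing:

import android.app.Activity;
import android.os.Bundle;
import android.widget.TextView;

public class LifecycleSketchActivity extends Activity
{
    @Override protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        // Normally you'd inflate an XML layout here; a TextView keeps the sketch simple.
        TextView view = new TextView(this);
        view.setText("hello");
        setContentView(view);
    }

    @Override protected void onStart()   { super.onStart(); }   // becoming visible
    @Override protected void onResume()  { super.onResume(); }  // in the foreground
    @Override protected void onPause()   { super.onPause(); }   // losing the foreground
    @Override protected void onStop()    { super.onStop(); }    // no longer visible
    @Override protected void onDestroy() { super.onDestroy(); } // may never be called!
}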

Tasks and the Back Stack

But before we dig into those, two other important concepts: tasks and the back stack. This can get wildly complicated, but the super-basics are this:

  • A “task” is a thing you’re doing on the device. Most commonly tasks are born by opening an app from the home screen.
  • Each task maintains a “stack” of activities (screens). When you navigate to a new screen (e.g., open an email from a list of emails) a new activity is added to the top of the stack. When you hit the back button, the current (top) activity is closed and you return to the previous one.
  • Mostly each task corresponds to an app — but not always. For example, when you are in Chrome and you “share” a show to my app, a new Share To Roku activity is added to the Chrome task. Tasks are not the same as JVM processes!

Taken together, the general task/activity lifecycle starts to make sense:

  1. The user starts a new task by starting an app from the home screen.
  2. Android starts a JVM for that app and loads an instance of the class for the activity marked as MAIN/LAUNCHER in the manifest.
  3. The onCreate method of the activity is called.
  4. The user interacts with the ux. Maybe at some point they dip into another activity, in which case onPause/onResume and onStop/onStart are called as the new activity starts and finishes.
  5. When the activity is finished (the user hits the back button or closes the screen in some other way) the onDestroy method is called.
  6. When the system decides it’s a good time (e.g., to reduce memory usage), the JVM is shut down.

Of course, it’s not really that simple. For example, Android may just nuke your process at any time, without ever calling onDestroy — so you’ll need to put some thought into how and when to save persistent data. And depending on your configuration, existing activity instances may be “reused” (with a call to onNewIntent). But it’s a pretty good starting place.

Intents

Intents are the means by which users navigate between activities on an Android device. We’ve actually already seen an intent in action, in step #2 above — MAIN/LAUNCHER is a special intent that means “start this app from the beginning.” Intents are used for every activity-to-activity transition, whether that’s explicit (e.g., when an email app opens up a message details activity in response to a click in a message list) or implicit (e.g., when an app opens up a new, pre-populated text message without knowing which app the user has configured for SMS).

Share to Roku uses intents in both ways. Internally, after picking a Roku, ChooseRokuActivity.shareToRoku instantiates an intent to start the ShareToRokuActivity. Because that internal navigation is the only way to land on ShareToRokuActivity, its definition in the manifest sets the “exported” flag to false and doesn’t include any intent-filter elements.

Conversely, the entry for ChooseRokuActivity in the manifest sets “exported” to true and includes no less than three intent-filter elements. The first is our old friend MAIN/LAUNCHER, but the next two are more interesting. Both identify themselves as SEND/DEFAULT filters, which mark the activity as a target for the Android Sharesheet (which we all just know as “sharing” from one app to another). There are two of them because we are registering to handle both text and image content.

Wait, image content? This seems a little weird; surely we can’t send an image file to the Roku search API. That’s correct, but it turns out that when the TV Time app launches a SEND/DEFAULT intent, it registers the content type as an image. There is an image (a little thumbnail of the show), but there is also text included, which we use for the search. There isn’t a lot of consistency in the way applications prepare their content for sharing; I foresee a lot of app-specific parsing in my future if Share To Roku gets any real traction with users.
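On the receiving end, handling an inbound share is the standard Android dance; a simplified sketch of what the receiving activity has to do (handleSearchText is a hypothetical helper, standing in for the app’s real flow):

import android.app.Activity;
import android.content.Intent;
import android.os.Bundle;

public class ShareTargetSketchActivity extends Activity
{
    @Override protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        Intent intent = getIntent();
        String type = intent.getType();

        if (Intent.ACTION_SEND.equals(intent.getAction()) && type != null) {
            // Even when the declared type is image/* (as with TV Time), the useful
            // bit is usually still the EXTRA_TEXT string that rides along.
            String sharedText = intent.getStringExtra(Intent.EXTRA_TEXT);
            if (sharedText != null) handleSearchText(sharedText);
        }
    }

    private void handleSearchText(String text) {
        // hypothetical; the real app forwards this to the Roku search flow
    }
}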

ChooseRokuActivity

OK, let’s look a little more closely at the activities that make up the app. ChooseRokuActivity (code / layout) is the first screen a user sees; a simple list of Rokus found on the local network. Once the user makes a selection here, control is passed to ShareToRokuActivity which we’ll cover next.

The list is a ListView, which gives me another opportunity to complain about modern development. Literally every UX system in the world has a control for simple displays of text-based lists. Android’s ListView is just this — a nice, simple control to which you attach an Adapter that holds the data. But the Android Gods really would rather you don’t use it. Instead, you’re supposed to use RecyclerView, a fine but much more complicated view. It’s great for large, dynamic lists, but way too much for most simple text-based UX lists. This kind of judgy noise just bugs me — an SDK should make common things as easy as possible. Sorry not sorry, I’m using the ListView. Anyways, the list is wrapped in a SwipeRefreshLayout which provides the gesture and feedback to refresh the list by pulling down.
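For the curious, the ListView path really is as simple as it sounds. A sketch (not the app’s actual layout, which lives in XML with the SwipeRefreshLayout wrapper):

import android.app.Activity;
import android.os.Bundle;
import android.widget.ArrayAdapter;
import android.widget.ListView;
import java.util.ArrayList;
import java.util.List;

public class RokuListSketchActivity extends Activity
{
    @Override protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        List<String> names = new ArrayList<>(); // filled in by discovery

        ListView list = new ListView(this);
        setContentView(list);

        // The built-in one-line layout is plenty for a simple text list.
        ArrayAdapter<String> adapter = new ArrayAdapter<>(
            this, android.R.layout.simple_list_item_1, names);

        list.setAdapter(adapter);
        list.setOnItemClickListener((parent, view, position, id) ->
            onRokuChosen(names.get(position)));
    }

    private void onRokuChosen(String name) { /* hypothetical handler */ }
}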

The activity populates the list of Rokus using static methods in Roku.java. Discovery is performed by UDP broadcast in Ssdp.java, a stripped down version of the discovery classes I wrote about extensively in Anyone out there? Service discovery with SSDP, WSD, other acronyms. The Roku class maintains a static (threadsafe) list of the Rokus it finds, and only searches again when asked to manually refresh. This is one of those places where it’s important to be aware of the process lifecycle; the list is cached as long as our JVM remains alive and will be used in any task we end up in.

Take a look at the code in initializeRokus and findRokus (called from onCreate). If we have a cache of Rokus, we populate it directly into the list for maximum responsiveness. If we don’t, we create an ActivityWorker instance that searches using a background thread. The trick here is that each JVM process has exactly one thread dedicated to managing all user interface interactions — only code running on that thread can touch the UX. So if another thread (e.g., our Roku search worker) needs to update user interface components (i.e., update the ListView), it needs help.

There are a TON of ways that people manage this; ActivityWorker is my solution. A caller implements an interface with two methods: doBackground is run on a background thread, and when that method completes, the code in doUx runs on the UI thread (thanks to Activity.runOnUiThread).  These two methods can share member variables (e.g., the “rokus” set) without worrying about concurrency issues — a nice clean wrapper for a common-but-typically-messy situation.
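Stripped to its essence, the pattern is just a thread plus runOnUiThread. A minimal sketch of the same idea (the real ActivityWorker adds error handling and such):

import android.app.Activity;

public class ActivityWorkerSketch
{
    public interface Worker {
        void doBackground(); // runs on a worker thread
        void doUx();         // runs afterwards on the UI thread
    }

    public static void run(Activity activity, Worker worker) {
        new Thread(() -> {
            worker.doBackground();
            activity.runOnUiThread(worker::doUx);
        }).start();
    }
}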

ShareToRokuActivity

The second activity (code / layout) has more UX going on, and I’ll admit that I appreciated the graphical layout tools in AStudio. Designing even a simple interface that squishes and stretches reasonably to fit on so many different device sizes can be a challenge. Hopefully I did an OK job, but testing with emulators only goes so far — we’ll see as I get a few more testers.

If the activity was started from a sharing operation, we pick up that inbound text as “extra” data that comes along with the Intent object (the data actually comes to us indirectly via ChooseRokuActivity, since that was the entry point). Dealing with this search text is definitely the most unpleasant part of the app, because it comes in totally random and often unhelpful forms. If Share To Roku is going to become a meaningfully useful tool I’m going to have to do some more investment here.

A rare rave from me — the Android Volley HTTP library (as used in Http.java) is just fantastic. It works asynchronously, but always makes its callback on the UX thread. That is, it does automatically what I had to do manually with ActivityWorker. Since most mobile apps are really just UX sitting atop some sort of HTTP API, this makes life really really easy. Love it!
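A typical Volley call looks like this (a sketch; the app wraps this kind of thing up in Http.java):

import android.content.Context;
import com.android.volley.Request;
import com.android.volley.RequestQueue;
import com.android.volley.toolbox.StringRequest;
import com.android.volley.toolbox.Volley;

public class VolleySketch
{
    public static void get(Context context, String url) {
        RequestQueue queue = Volley.newRequestQueue(context);

        StringRequest req = new StringRequest(Request.Method.GET, url,
            response -> { /* delivered on the UI thread; update views directly */ },
            error -> { /* also delivered on the UI thread */ });

        queue.add(req);
    }
}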

The bulk of this activity is just buttons and lists that cause fire-and-forget calls to the Roku, except for the directional pad that takes up the center of the screen. CirclePad.java is a custom control (sorry, custom “View”) that lets the user click a center button and indicate direction with either clicks in the N-S-E-W regions or (way cooler) directional swipes. A long press on the control sends a backspace, which makes entering text on the TV a bit more pleasant. Building this control felt like a throwback to Windows 3.0 development. Set a clip region, draw some lines and circles and icons. The gesture recognition is simultaneously amazingly easy (love the “fling” handler) and oddly prehistoric (check out my manual identification of a “long” press).

Publishing to the Play store

Back in the mid 00’s I spent some time consulting for Microsoft on a project called Windows Marketplace (wow there is a Wikipedia article for everything). Marketplace was sponsored by the Windows marketing team as an attempt to highlight the (yes) shareware market, which had been basically decimated by cross-platform browser-based apps. It worked a lot like any other app store, with some nice features like secure backup of purchased license keys (still a thing with some software!!!). It served a useful role for a number of years — fun times with a neat set of people (looking at you Raj, Vikram, DeeDee, Paul, Matt and Susan) and excellent chaat in Emeryville.

Anyways, that experience gave me some insight into the challenges of running and monetizing a directory of apps developed by everyone from big companies to (hello) random individuals. Making sure the apps at least work some of the time and don’t contain viruses or some weirdo porn or whatever. It’s not easy — but Google and Apple really have done a shockingly great job. Setting up an account on the Play Console is pretty simple — I did have to upload an image of my official ID and pay a one-time $25 fee but that’s about it. The review process is painful because each cycle takes about three or four days and they often come back with pretty vague rejections. For example, “you have used a word you may not have the rights to use” … which word is, apparently, a secret? But I get it.

So anyways — my lovely little app is now available for testing. If you’ve got an Android device, please use the links below to give it a try. If you have an Apple device, I’m sorry for many reasons. I will definitely be doing some work to better manipulate inbound search strings to provide a better search result on the Roku. I’m a little torn as to whether I could just do that all in-app, or if I should publish an API that I can update more easily. Probably the latter, although that does create a dependency that I’m not super-crazy about. We’ll see.

Install the beta version of Share To Roku with this link on the web or this link on your Android device.

Anyone out there? Service discovery with SSDP, WSD, other acronyms.

Those few regular readers of this stuff may remember What should we watch tonight, in which I used the Roku API to build a little web app to manage my TV watchlist. Since then I’ve found TV Time, which is waaay better and even tells me how many days until the next season of Cobra Kai gets here (51 as of this writing). But what it doesn’t do is launch shows automatically on my TV, and yes I’m lazy enough to be annoyed by that. So I’ve been planning a companion app that will let me “share” shows directly to my Roku using the same API I used a few months ago.

This time, I’d like the app to auto-discover the TV, rather than asking the user to configure its IP address manually. Seems pretty basic — the Roku ECP API describes how it uses “Simple Service Discovery Protocol” to enable just that. But man, putting together a reliable implementation turned out to be a bear, and sent me tumbling down a rabbit hole of “service discovery” that was both fascinating and frankly a bit appalling.

Come with me down that rabbit hole, and let’s learn how those fancy home devices actually try to find each other. It’s nerd-tastic!

Can I get your number?

99% of what happens on networks is conversations between two devices that already know each other, either directly by address (like a phone number), or by a name that they use to look up an address (like using a phone book). And 99% of the time this works great — between “google.com” and QR codes and the bookmark lists we’ve all built up, there’s rarely any need to even think about addresses. But once in a while — usually when you’re trying to set up a printer or some other smarty-pants device on your home network — things get a bit more complicated.

Devices on your network are assigned a (basically) arbitrary address by your wifi router, and they don’t have a name registered anywhere, so how do other devices find them to start a conversation? It turns out that there are a pile of different ways — most of which involve either multicast or broadcast UDP messaging, two similar techniques that enable a device to initiate a conversation without knowing exactly who it’s talking to. Vox Clamantis in Deserto as it were.

Side note: for this post I’m going to limit examples to IPv4 addressing, because it makes my job a little easier. The same concepts generally apply to IPv6, except that there is no true “broadcast” with v6 because they figured out that multicast could do all the same things more efficiently, but close enough.

An IP broadcast message is received by all devices on the local network that are listening on a given port. Typically these use the special “limited broadcast” address 255.255.255.255 (there’s also a “directed” broadcast address for each subnet which could theoretically be routed to other networks, but that’s more detail than matters for us). An IP multicast message is similar, but is received only by devices that have subscribed to (or joined) the multicast’s special “group” address. Multicast addresses all have 1110 as their most significant bits, which translates to addresses from 224.0.0.0 to 239.255.255.255.

Generally, these messages are restricted to your local network — that is, routers don’t send them out onto the wider Internet (there are exceptions for complex corporate-style networks, but whatever). This is a good thing, because the cacophony of the whole world getting all of these messages would most definitely take down the Internet. It’s also safer, as we’ll see a bit later.

Roku and SSDP

OK, back to the main thread here. Per the ECP documentation, Roku devices use Simple Service Discovery Protocol for discovery. SSDP defines a multicast address (239.255.255.250) and port (1900), a set of messages using what old folks like me still call RFC 822 format, and two interaction patterns for discovery:

  1. A client looking for devices sends a multicast M-SEARCH message with the type ssdp:discover, setting the ST header to either ssdp:all (everybody respond!) or a specific service type string (the primary Roku type is roku:ecp). AFAIK there is no authoritative list of ST values, you just kind of have to know what they are.  Devices listening for these requests respond directly to the client with a unicast HTTP OK response that includes (thank you) addressing information.
  2. Clients can also listen on the same multicast address for NOTIFY messages of type ssdp:alive or ssdp:byebye; devices send these out when they are turned on and off. It’s a good way to keep a list of devices accurate, but implementations are spotty so it really needs to be used in combination with #1.
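Before we get to the full client, here’s a minimal sketch of interaction #1: the M-SEARCH itself is tiny. The roku:ecp search target is from the ECP docs; everything else here is deliberately bare-bones (the real code deals with multiple interfaces, retries, and parsing the responses):

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class SsdpSearchSketch
{
    public static void main(String[] args) throws Exception {

        String msg =
            "M-SEARCH * HTTP/1.1\r\n" +
            "HOST: 239.255.255.250:1900\r\n" +
            "MAN: \"ssdp:discover\"\r\n" +
            "MX: 3\r\n" +
            "ST: roku:ecp\r\n\r\n";

        byte[] bytes = msg.getBytes(StandardCharsets.UTF_8);

        // Plain socket on an auto-assigned port; the HTTP OK responses
        // come back here as unicast messages.
        try (DatagramSocket sock = new DatagramSocket()) {

            sock.setSoTimeout(3000);
            sock.send(new DatagramPacket(bytes, bytes.length,
                InetAddress.getByName("239.255.255.250"), 1900));

            byte[] buf = new byte[4096];

            while (true) {
                DatagramPacket resp = new DatagramPacket(buf, buf.length);
                try { sock.receive(resp); }
                catch (SocketTimeoutException e) { break; } // no more responses
                System.out.println(new String(resp.getData(), 0,
                    resp.getLength(), StandardCharsets.UTF_8));
            }
        }
    }
}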

An SSDP client in Java

Seems simple, right? I mean, OK, the basics really are simple. But a robust implementation runs into a ton of nit-picky little gotchas that, all together, took me days to sort out. The end result is on github and can be built/tested on any machine with java, git and maven installed (be sure to fix up slashes on Windows):

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/toolbox
mvn clean package
java -cp target/toolbox-1.0-SNAPSHOT.jar \
    com.shutdownhook.toolbox.discovery.ServiceDiscovery \
    ssdp

This command line entrypoint just sends out an ssdp:discover message, displays information on all the devices that respond, and loops listening for additional notifications. Somebody on your network is almost sure to respond; in particular for Roku you can look for a line something like this:

+++ ALIVE: uuid:roku:ecp:2N006D062746 | roku:ecp | http://192.168.86.47:8060/ | (/192.168.86.47:1900)

Super cool! If you open up that URL in a browser you’ll see a bunch more detail about your Roku and the interfaces it supports.

Discovery Protocol Abuse

If you don’t see any responses at all, it’s likely that your firewall is blocking UDP messages either to or from port 1900 — my Linux Mint distribution does both by default. Mint uses UncomplicatedFirewall which means you can see blocking activity (as root) in /var/log/ufw.log and open up UDP port 1900 with commands like:

sudo ufw allow from any port 1900 to any proto udp
sudo ufw allow from any to any port 1900 proto udp

Before you do this, you should be aware that there is some potential for bad guys to do bad stuff — pretty unlikely, but still. Any protocol that can “amplify” one message into many (because multiple devices can respond) carries some risk of a denial-of-service attack. That can be very simple: a bad guy on your network just fires off a ton of M-SEARCH requests, prompting a flood of responses that overwhelm the network as a whole. Or it can be nastier: combined with “ip spoofing,” a bad guy can redirect amplified responses to an unsuspecting victim.

Really though, it’s pretty theoretical for a home network — routers don’t generally route these messages, so it’d have to be an inside job anyways. And once you’ve got a bad actor inside your network, they can probably do a lot more damage than just slowing it down. YMMV, but I’m not personally super-worried about this particular attack. Just please don’t confuse my blasé assessment here with the risks of the related Universal Plug-and-Play (UPnP) protocol, which are quite real.

Under the Covers

There is quite a bit to talk about in the code here. Most of the hard work is in UdpServiceDiscovery.java (I’ll explain this abstraction later), which uses two sockets and two worker threads:

Socket/Thread #1 (DISCOVERY) sends M-SEARCH requests and receives back HTTP OK responses. A request is sent when the thread first starts up and can be repeated either on demand or on a timer (by default every twenty minutes).

It’s key to understand that while the request here is a multicast message, the responses are sent back directly as a unicast. I didn’t implement the server side specifically, but you can see how this works in the automated tests — the responding device extracts the source address and port from the multicast and just replies with a standard UDP unicast message. This is important for us because only one process on a computer can actively listen for unicast messages on a port. And on many systems, somebody is probably already doing that on port 1900 (for example, the Windows service “SSDP Discovery”). So if we want to reliably hear HTTP OK responses, we need to be using an unused, automatically-assigned port for this socket.

Socket/Thread #2 (NOTIFICATION) is for receiving unsolicited NOTIFY multicasts. This socket uses joinGroup to register interest and must be opened on port 1900 to work correctly.
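Joining the multicast group for those NOTIFY messages looks roughly like this. It’s a sketch that picks a single interface for brevity; UdpServiceDiscovery does this per-interface, as described next:

import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.NetworkInterface;
import java.nio.charset.StandardCharsets;

public class SsdpNotifySketch
{
    public static void main(String[] args) throws Exception {

        InetAddress group = InetAddress.getByName("239.255.255.250");

        // Must be bound to 1900; that's where devices multicast their NOTIFYs.
        try (MulticastSocket sock = new MulticastSocket(1900)) {

            // One interface for brevity; the real code joins on every useful one.
            NetworkInterface nic =
                NetworkInterface.getByInetAddress(InetAddress.getLocalHost());

            sock.joinGroup(new InetSocketAddress(group, 1900), nic);

            byte[] buf = new byte[4096];
            while (true) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                sock.receive(packet);
                System.out.println(new String(packet.getData(), 0,
                    packet.getLength(), StandardCharsets.UTF_8));
            }
        }
    }
}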

forEachUsefulInterface is an interesting little bit of code. It’s used both for sending requests and joining multicast groups, ensuring that the code works in a system that is connected to multiple network interfaces (typically not the case at home, but better safe than sorry). Remember that multicasts are restricted to a local network — so if you’re attached to multiple networks, you’ll need to send out one message on each of them. The realities of coordinating interfaces with addresses can get pretty complicated, but I think this gets it right. Let me know if you think I’ve missed something!

The class also tries to identify and ignore duplicate UDP messages. Dealing with dups just comes with the territory when working with UDP — and while the nature of the SSDP protocol means it generally doesn’t hurt anything to re-process them, it’s just icky. UdpServiceDiscovery tries to filter them out using message hashing and a FIFO queue of recently-received messages. You can tune this behavior (or turn it off) through config; default is a two-second lookback.  

Wait, is that Everyone? Enter WS-Discovery.

If you look closely you’ll see that UdpServiceDiscovery really isn’t specific to SSDP at all — all of the protocol-specific stuff is in Ssdp.java and transits through yet another class ServiceDiscovery.java. What the heck is going on here? The short story is that SSDP doesn’t return most printers, and Microsoft always needs to be special. The longer story requires a quick aside into the insanity that was “WS*”.

Back in 1999 and 2000, folks realized that HTTP would be great for APIs as well as web pages — and two very different approaches emerged. First by a few months was SOAP (and its fast-follower WSDL), which tried to be transport-independent (although 99.9% of traffic was over HTTP) and was all about defined, strongly typed interfaces. The foil to SOAP was REST — a much lighter and Internettish way to think about machine-to-machine interaction.

SOAP was big company, REST was scrappy startup. And nobody was more SOAPy than Microsoft. They had a whole group (I’m looking at you Bill, and your buddy John too) that did nothing but make up abstract, overly-complicated, insane SOAP-based “standards” informally known as “WS*” that nobody understood or needed. Seriously, just check out this poster (really, click that link, zoom in and scroll for awhile, it’s shocking). Spoiler alert: REST crushed it.

Anyways — one of these beasts was WS-Discovery, a protocol for finding devices on a network that does exactly the same thing as SSDP. Not “generally the same thing,” but exactly the same thing. The code that works for SSDP works for WSD too, just swap out the HTTP-style metadata for XML. Talk about reinventing the wheel, yeesh. But at least this explains the weird object hierarchy in my discovery classes.

Since these all use callback interfaces and sometimes you just want an answer, I added OneShotServiceDiscovery that wraps up Ssdp and Wsd like this (where “4” below is the number of seconds to wait for UDP responses to come in):

Set<ServiceInfo> infosSSD = OneShotServiceDiscovery.ssdp(4);
Set<ServiceInfo> infosWSD = OneShotServiceDiscovery.wsd(4);

There’s an entrypoint for this too, so to get a WSD device list you can use (the example is my Epson ET-3760):

java -cp target/toolbox-1.0-SNAPSHOT.jar \
    com.shutdownhook.toolbox.discovery.OneShotServiceDiscovery \
    wsd
...
urn:uuid:cfe92100-67c4-11d4-a45f-e0bb9e278967 | wsdp:Device wscn:ScanDeviceType wprt:PrintDeviceType | http://192.168.5.228:80/WSD/DEVICE | (/192.168.5.228:3702)

Actually, a full WSD implementation is more complicated than this. The protocol defines a “discovery proxy” — a device on the network that can cache device information and reduce network traffic. A proxy advertises itself by sending out HELLO messages with type d:DiscoveryProxy; clients are supposed to switch over to use this service when it’s present. So so much complexity for so so little benefit. No thanks.

Don’t forget the broadcast bunch

And we’re still not done. SSDP and WSD cover a bunch of devices and services, but they still miss a lot. Most of these use some sort of custom broadcast approach. If you poke around in UdpServiceDiscovery you’ll find a few special case bits to handle the broadcast case — we disable the NOTIFICATION thread altogether, and just use the DISCOVERY thread/socket to send out pings and listen for responses. The Misc class provides an entrypoint for this; you can find my Roomba using broadcast port 5678 like this:

java -cp target/toolbox-1.0-SNAPSHOT.jar \
    com.shutdownhook.toolbox.discovery.Misc \
    255.255.255.255 5678 irobotmcs
...
============ /192.168.4.48:5678
{"ver":"3","hostname":"Roomba-3193C60472324700","robotname":"Bellvoomba","ip":"192.168.4.48","mac":"80:91:33:9D:E2:16","sw":"v2.4.16-126","sku":"R960020","nc":0,"proto":"mqtt","cap":{"pose":1,"ota":2,"multiPass":2,"pp":1,"binFullDetect":1,"langOta":1,"maps":1,"edge":1,"eco":1,"svcConf":1}}

Sometimes you don’t even need a ping. For example, devices that use the Tuya platform just sit there and constantly broadcast their presence on port 6666 or 6667:

java -cp target/toolbox-1.0-SNAPSHOT.jar \
    com.shutdownhook.toolbox.discovery.Misc \
    255.255.255.255 6667 tuya
...
============ /192.168.5.253:60913
????1r???8?W??⌂???▲????H??? _???r?9???3?o*?jwz#?$?Z?H?¶?Q??9??r~  ?U

OK, that’s not super useful — it’s my smart ceiling fan, but apparently not all of their devices broadcast on 6666, and the messages on port 6667 are encrypted (using a global key, duh — this code shows how to decrypt them). This kind of thing annoys me because it doesn’t really secure anything and just makes life harder for everyone. I’m going to register my protest by not writing that code myself; that’ll show them.

In any case, you get the point — there are a lot of ways that devices try to make themselves discoverable. I’ve even seen code that just fully scans the network — mini wardialers that check every possible address for specific open ports (an approach that won’t survive the eventual v4-v6 address transition!). It’d be nice if this was more standardized, but I’m happy to live with a little chaos in return for the innovations that pop up every day. It’ll settle out eventually. Maybe.

Now back to that Roku app, which will use something like 5% of the code I wrote for this post. Just one of the reasons I’m a fan of retirement — I can burn cycles on any rabbit hole I damn well please. Getting the code right can be tricky, so perhaps it’ll prove useful to some other nerd out there. And as always, please let me know if you find a bug!

Regulated software for software people

If you’ve built software at any scale, you know how the game works. You get requirements from somewhere — usually they’re wrong or at best incomplete. You do your best to implement and test them, and you ship. Users vote with their clicks on which features work and which don’t — i.e., they refine the requirements for you — and then you repeat the process. Eventually you converge on a set of features that work, then you do it all over again with a new set of requirements.

If your cycle time is long, that’s called “waterfall” and folks judge you for it, which is sometimes fair but not always. If your cycle time is short, it’s called “agile.” Agile does have some advantages: user feedback gets incorporated more quickly, and doing things in smaller chunks generally results in fewer bugs. A lot of people have written a lot of boring religious articles about the differences here, but in reality most folks fall somewhere in the middle, and it’s usually fine. If you’re building the next Tinder or Candy Crush or whatever, that’s pretty much all the “process” you need to know.

But what if you’re building something for healthcare or another industry where the software is “regulated?” Oh my. “Regulation” is scary and mysterious, and people keep talking about going to jail. There’s a whole industry that as far as I can tell is built around paying protection money to consultants. So it’s not surprising that “regulated” in a software job description is a turn-off for lots of people. Still, it’s the cost of doing business for a lot of important things in the world, so let’s take a look at what’s really going on there.

An important caveat: I have never worked for the FDA or any regulatory agency. I’m not a lawyer. I’m just a guy who has written a bunch of regulated software and believes the advice I’ve received from the “regulatory industry” has been almost entirely crap (a very few people break that mold; you know who you are). My hope here is to give you a straightforward, clear-cut introduction so that you can enter into the process with enough confidence to avoid being bullied by silly hyperbole about going to jail or other craziness. The actual regulators I’ve known just expect you to do your best to build safe, reliable products, and they get that it’s a hard job. Understanding a few key things will not only keep you compliant, it will help you build better stuff. Honest.

The Big Secret

Most of my big-time regulatory work has been with FDA Class 1 and 2 medical devices. “Classification” is a risk stratification based on your “intended use” — a super-important bit of text that precisely defines what your device is supposed to do and how it’s supposed to be used. Nailing these down can be a fraught and expensive process all on its own.

But we’re getting ahead of ourselves here — a common problem in this space! In this piece I’m not going to go deep into the details of any specific regulation — because (cover your ears regulatory folks) they don’t really matter to you. OK, maybe they matter if you’re in a startup and you’re the one that has to actually do the filings … but that’s probably a horrible idea anyways. From a software perspective, pretty much all safety-focused regulation looks exactly the same and can be satisfied by adhering to a set of relatively small and, dare I say, pretty reasonable requirements.

This is a bigger secret than you might think. Thanks to the impenetrable nature of regulatory jargon and text, folks who can claim actual experience are in high demand. It’s to their advantage (and job security) to make it all seem super complicated — nobody wants to go to jail, so we all just keep paying for vague explanations and double-talk. Of course that’s a broad brush — but it’s more right than wrong.

Software first, not Regulation first

This is my biggest regret from my last regulated gig. I had stepped into a higher risk class of device, and we had hired a set of regulatory folks with no direct software experience. The company was very focused on agency approval, so there was a ton of pressure to get it “right.” Not good excuses, but it is what it is — I let our initial software processes be driven by regulation first. That’s not to say we didn’t build excellent software, because we did — but we did it with a very high burden on the team that honestly was mostly just wasted time and energy. I was lucky to lose only a couple of good people to the noise before we got things more-or-less straightened out.

Your job is to build great software. Finish reading this article, understand what you really need to be able to prove and document, talk to folks you trust, and then use your own software-focused best practices to meet the regs. It is the job of your regulatory team to take what you produce and “package” it into the right form for filings and auditors. This packaging takes work and a depth of understanding not many folks in the industry bring, so you’re probably going to get pushback. Stand your ground. Over time, it is for sure worth adding automation to generate different forms of documentation (this is where we finally ended up) and that’s great — but be confident that, if you execute well, you need not be bound by crazy redundant busywork.

Safety-based regulation in three bullets

Software regulation isn’t really intended to protect against bad or fraudulent actors — there are other laws for that. Instead, the point is to ensure that the specific risks and benefits associated with a product are understood and visible. From a software perspective, that means three things:

  1. You know what it’s supposed to do.
  2. It does what you expect.
  3. You’ve considered the risks.

The first two of these should look pretty familiar. #1 just says that you have a correct and detailed specification. #2 says you tested the software against that specification. These might need to be a bit more formal than you’re used to, but if you don’t have a good starting point, your product probably sucks anyways. You likely already use JIRA or some other system to track features/stories and bugs, so if you can generate the following reports you’re basically done with these first two requirements:

  • A list of features, each with sufficient detail to be implemented.
  • For each feature, a list of test cases that cover the feature.
  • For each test case, a record of each time it was executed and passed or failed.

Formal documentation of test cases and results — and especially links back to the specific features they exercise — can be pretty thin at many companies, where dedicated QA resources are hard to come by. If your tests are automated, a great start is just to log feature IDs along with each test case you run. Together with code coverage reports, that gets you a long way towards compliance. For manual test cases, you will need some way to keep track of things — I’ve used the Zephyr plugin for JIRA with good success, but there are tons of options out there.
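
If your tests run under JUnit 5 (substitute whatever framework you actually use), logging feature IDs can be as simple as an annotation plus a tiny extension. This is only a sketch; the @Requirement annotation and the “FEAT-123” ID are invented for illustration:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.junit.jupiter.api.extension.ExtensionContext;
import org.junit.jupiter.api.extension.TestWatcher;

// Hypothetical annotation linking a test case back to a feature/requirement ID.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Requirement { String value(); }

// Logs "requirement | test | result" for every run -- raw material for the
// feature-to-test-to-result reports described above.
class RequirementLogger implements TestWatcher {
    @Override public void testSuccessful(ExtensionContext ctx) { log(ctx, "PASS"); }
    @Override public void testFailed(ExtensionContext ctx, Throwable t) { log(ctx, "FAIL"); }

    private void log(ExtensionContext ctx, String result) {
        Requirement req = ctx.getRequiredTestMethod().getAnnotation(Requirement.class);
        String id = (req == null ? "UNTRACED" : req.value());
        System.out.printf("%s | %s | %s%n", id, ctx.getDisplayName(), result);
    }
}

@ExtendWith(RequirementLogger.class)
class AlertingTests {
    @Test
    @Requirement("FEAT-123")  // "abnormal results trigger an alert"
    void abnormalResultSendsAlert() {
        // ... real assertions go here ...
    }
}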

Risk Assessment

Documented risk assessment (#3) is new to a lot of folks. The concept can take a bit of getting used to, because if you’re good at your job you’ve been assessing and addressing risks implicitly all along. Is there enough contrast to read this text in bright-light conditions? Will users understand what “accept” means in this situation? What happens if the user doesn’t scroll down to read the whole message? And so on. By the time you sit down for a “formal” risk analysis, you’ve probably taken care of most issues already.

And yet it’s a requirement to document a formal risk assessment for each feature. The best way I’ve found to manage this is to add a custom field to the requirements management system for risks, and ask folks to just make notes there along the way. Towards the end of the development phase, have the team sit down and clean them up and spend a bit of time thinking about anything missed. That meeting tends to be pretty quick and actually serves as a nice double-check. Ultimately you’ll want to document four things for each risk identified:

  • What could happen.
  • The potential impact.
  • Some idea of how likely and how severe this would be (more on this below).
  • What you’ve done to mitigate or reduce the risk.

There are tons of rubrics for codifying “likelihood” and “severity” — red zone / green zone kind of stuff. I’m torn about these — there is definitely validity to the balance between risks that might cause actual physical harm but are so unlikely to occur that it would be silly to spend time on them, versus risks that have almost no real impact but could happen so frequently that it justifies extra work. But trying to get too precise is really quite hopeless — I’d just estimate low/medium/high on both dimensions and leave it at that.

Potential bugs are not “risks.” Of course bad code can cause all kinds of problems — but trying to capture that is a useless shell game. That “risk” applies to every feature, and the only mitigation is to develop and test better. Documenting this is useless. Software risks generally come down to user interface confusion, algorithms that break down given extreme inputs, that kind of thing. Honest, it gets easier once you do it for a while.

Lastly, “documentation” or “user education” is a totally reasonable way to mitigate some risks. Sometimes something important is just complicated, and the user cannot be expected to understand how to use a feature without training and/or documentation. That’s OK! Just don’t use it as a crutch for bad design — your job after all is to build a helpful product, not an obtuse one. A trick that can increase the effectiveness of “mitigation by documentation” is to put the documentation directly into the user experience. For example, the first time the user clicks a particular button you might proactively pop up a dialog that can be dismissed once acknowledged.

The “Manufacturing” gotcha

Hopefully so far you’re feeling OK about all of this. A few tweaks to very standard practices and you’re pretty much capturing all the raw material you need. Woot! Ah, but wait.

Almost for sure you’re building modern software that runs as a service (in the cloud or otherwise), releasing new functionality on a regular basis — and that can make documentation a lot trickier. Maybe I should have mentioned this earlier, but I didn’t want to scare you off. Don’t worry, it’ll be ok.

Traditional medical devices are “things” — tongue depressors, MRI machines, cancer drugs, and so on. A great deal of up-front thought and effort and cost goes into figuring out what to make and how to make it. Prototypes are created. Factories and factory lines are set up. Raw materials are sourced. And then when you’re done, you flip a switch and stamp out thousands or millions of copies, exactly the same way, for years. Within that context, safety-based regulation makes a ton of sense. It expects to see “a” design record for each device. Auditors come in and ask to see “the” documents for a given product.

This worked ok when software shipped on a CD in a box. But when it runs as a service, updated and improved over many iterations in near real-time, things can get messy pretty quickly. Note this isn’t about “waterfall” vs. “agile,” it’s about frequent, incremental releases over time vs. one-and-done “manufacturing.”

My first stab at this didn’t work super-well. We basically just wrapped up each release, no matter how small, into its own set of documents — features, risks, test cases and results. When we did our first independent audit (internal, thankfully) the auditor asked for the documents for product X. I handed over dozens of these release packages and smiled confidently. We did a cool demo. They then said OK, you showed me this feature that does Y, where are the test results that prove it works? Seems like a reasonable request.

Yikes. Like almost every feature, this one had evolved over time. There were probably twenty stories related to it, scattered across dozens of releases, each one incremental, like “add option Z to the menu.” Was all the information there? Sure, you could figure it out if you really understood the product and had a couple of days to sort through it all. But answering that seemingly simple question in real time in the auditing room? No chance. And while I did say that it was the responsibility of your regulatory team to “package” your raw documents into something palatable to an auditor, this kind of synthesis is way too much to ask.

I’m sure there are many ways to address this issue, but we settled on something we called a “component document.” This was a single, authoritative, narrative document that could be used as the starting point for anybody trying to understand what the product did. It explained its purpose, the general approach to building it, and each major feature or feature area, assigning a unique identifier to each. The document was meant to be largely stable — that is, day-to-day features and bug fixes did not require changes to the text. An example might be a component-level feature that says “abnormal results will cause an alert to be sent to the medical team;” a corresponding release-focused requirement might describe specific alert conditions and channels for notification (like email). Adding SMS alerting would be a new requirement in a new release, but wouldn’t require updates to the component document.

By explicitly associating every requirement with a “component-level” feature, it became trivial to assemble coherent documentation packages. There were other benefits as well — for example, we found that if a requirement triggered a text change in the component document, it almost always warranted a full test pass rather than something more targeted. And the component document was a fantastic training vehicle for new engineers and even end users. It certainly isn’t always the case, but this time the regulatory framework really did directly help us improve our development process. Love that!

Almost there… honest!

At this point you should feel confident that you can build software in a way that satisfies the intent of safety-focused regulation. You understand and can explain what you’re building, you have tested it appropriately, you have assessed risks to health and safety — and you have the documents to prove it in an audit setting. This is really good, and frankly notably better than many self-claimed “compliant” software shops I’ve seen in real life. There is no orange jumpsuit in your future (at least for this reason).

That said, there are always more concepts in the regulatory framework you should be thinking about and evolving towards. None of these are all that challenging, and you should at least be prepared to explain to auditors how you think about them:

“V”erification vs “V”alidation

“V&V” is often used as a synonym for “testing” — which is pretty close, but obscures an important distinction between the two that you’ll need to address:

  • “Verification” ensures that features work as they are specified. It makes no judgment about whether the features do the “right” thing, just that they meet the spec.
  • “Validation” ensures that features do what users actually need them to do; validation tests are really a test of the specifications themselves.

In an ideal world, verification tests are executed through automation and/or your engineering team, while validation tests are done by actual end users. In reality, most end-users aren’t qualified to do a good job, and you risk wasting time on “test theater” that doesn’t really prove anything. You’ll have to find your own way here; a reasonable approach might be to (a) make sure end-users are formally involved in the up-front process of creating specifications, and (b) label your test cases as “verification” or “validation” to show you’ve been thoughtful about both concepts.
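
One lightweight way to handle (b), again assuming JUnit 5, is to tag each test case. The tests here are invented, but the tagging and filtering mechanism is real:

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

class ReportTests {
    @Test
    @Tag("verification")  // checks the spec: the report renders the configured columns
    void reportShowsConfiguredColumns() { /* assertions against the spec */ }

    @Test
    @Tag("validation")    // checks the need: a clinician can actually find the abnormal value
    void clinicianCanSpotAbnormalValue() { /* assertions against the real user workflow */ }
}

// Run just the validation set, e.g. with Maven Surefire: mvn test -Dgroups=validation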

Design Documents

Significant architectural decisions should be recorded in “design documents.” These are just engineer-focused documents in any form that help describe “how” the product is built. Think about the kind of documentation you’d like to show to a new developer on the team before they jump into code. Associating design documents with component-level features is a great way to keep a handle on how it all fits together.

Third-party software and/or “SOUP”

If your product incorporates COTS (“Commercial / Off The Shelf”) software, that also needs to be validated. Some vendors may be able to help you with this, and some may already have a base-level validation that you can start from. But in most cases, you’ll want to show that the acquired software does what you need it to do. This is typically a “one-and-done” exercise where you (a) document your requirements and (b) write and execute test cases to show the product satisfies those requirements.

This applies to third-party libraries you use as well, and even to your own internal software that may have been developed “way back when” without any documentation at all (sometimes called SOUP, for “Software of Unknown Provenance”). The same process applies — write some requirements, write some tests, run the tests, and have that documentation ready for auditors.
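
The tests themselves don’t need to be fancy. Here’s a sketch of what one of these might look like, using the JDK’s built-in Base64 as a stand-in for whatever third-party code you actually acquired; the requirement ID is invented:

import static org.junit.jupiter.api.Assertions.assertArrayEquals;

import java.nio.charset.StandardCharsets;
import java.util.Base64;

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

// SOUP-REQ-01: "the encoding library must round-trip arbitrary byte payloads."
class SoupValidationTests {
    @Test
    @Tag("soup")
    void encoderRoundTripsPayloads() {
        byte[] original = "abnormal result: alert the medical team".getBytes(StandardCharsets.UTF_8);
        String encoded = Base64.getEncoder().encodeToString(original);
        byte[] decoded = Base64.getDecoder().decode(encoded);
        assertArrayEquals(original, decoded);
    }
}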

Surveillance

“Bugs found in the wild” is a fantastic measure of software quality (hint: fewer is better). Your regulatory team should be managing formal “complaints” (escalating to you as needed), but keeping track of which bugs were found post-release is a great practice that will serve you well. A quarterly meeting to discuss trends and identify problematic features shows that you’re taking it seriously, so keep meeting notes and be prepared to show a graph of incidents and their severity over time.

Approvals and Signatures

This is an area that really bugs me. Regulatory folks can get super hung-up on ink-based signatures and extreme measures to ensure that documentation is “tamper-proof.” Full stop, I think this is a waste of time. Regulation is not intended to stop a sophisticated bad actor — it’s supposed to help folks trying to do the right thing. The burden of security theater can be stupid high. My take:

  • The software you use to manage requirements and tests should require login and keep track of who creates/updates items in the system.
  • Don’t delete stuff; instead use “inactive” or “obsolete” statuses to keep irrelevant or mistaken entries out of everyday view.
  • Make sure that the appropriate people (especially end users) mark their approval of requirements and tests in the system by clicking a button or writing a comment, and be able to show a record of that.
  • Don’t go overboard.

A final note about audits and auditors

You’re never going to be “done” tweaking and evolving this stuff. Auditors are paid to find issues, and no matter how great you are, they’re going to find some. Don’t sweat it and don’t be defensive. Listen, create a plan to address what they find, and then — this is the real key — follow through. When that auditor comes back they’re going to assess your response, and the worst thing you can do is just ignore them. If you disagree, start a dialogue and you’re sure to find a reasonable compromise.

Bottom line — bureaucracy is bureaucracy, and there is for sure burden associated with complying with regulation. Some of that burden is stupid, and some of it helps. Believe it or not, the actual regulators really do understand this, and are always working to make it simpler (even right now). Your biggest challenge will be the “industry” of high-priced consultants who are incented only to keep you worried and paying their hourly fees. Don’t freak out. Put in a little work to understand the real intent, honestly work to incorporate the key concepts — and you’ll be just fine.