“Code Monkey think maybe manager want to write god damned login page himself.” — Jonathan Coulton
Truer words may never have been spoken. Set up a database, build a signup page, enforce some stupid password rules, salt and hash it, now create a login page, oops when does login expire, oops need a forgot password page. Or, integrate with an “identity provider” that requires you to use some crazy third party library with who-knows-what dependencies awash with unhelpful and vaguely-defined concepts like “claims” and “entities” and “level of assurance.” Dude, life is too short for this crap — I just want to know who’s logged into my app. And maybe (maaaaaaybe) sort them into a few buckets like “admins” or “read-only.”
Side note: You know who had this figured out? Windows NT and IIS in 1998. Check one box and your site is secured with the same credentials you use for everything else on the domain. Bliss.
Happily, excepting a few projects in Azure meant for my family, since retiring from corporate life I haven’t had much cause to worry about this stuff. But it came up this week during a conversation about “under-the-desk” servers. Every single company I’ve ever known has a few of these … servers that run little slapped-together homegrown enterprise tools that somehow become mission-critical. These days they aren’t really under somebody’s desk, they’re up in a virtual machine somewhere, but no difference. Vortex, Pokey, Nitro and Flora are just a few under-the-desk machines I’ve supported over the years.
It occurred to me that there was one of those tools that I’ve written again and again, and might be fun to put together as a small open source project. But before I tell you about that, I’m going to have to up my Java WebServer game a bit with … yes, wait for it … login. Sigh.
Social Login to the Rescue
I’m not doing it from scratch though, that’s for damn sure. It turns out that despite its jargon-heavy history, OAuth2 has become a pretty ubiquitous standard for integrating with third-party services. It’s actually the same technology we explored in Part 3 of my SMART on FHIR tutorial. And while typically the standard is used to support data integration (e.g., by brokering access to personal health data in the FHIR case), it can serve as a pretty nifty basis for “social login” as well. This is what’s going on behind all of the “Login with Facebook/Google/etc.” buttons you see around the web. Seems like it just might do the trick for us.
Which service(s) you rely on depends on your use case, but an OAuth2-based implementation gives you a ton of options: Google, Facebook, Github, Microsoft / Office365, Twitter, Apple, we could go on for a long time here. There are minor differences in implementation, but not enough to cause much trouble. We’ll take a look at a few to tease out that diversity.
The bulk of the code here is just over 300 lines in OAuth2Login.java. Another 60 or so lines integrates it into WebServer.java, and that’s the lot. Not that much at the end of the day, but the jargon-to-code ratio is annoyingly high. I’ve tried to simplify things by limiting the exploration to a web-based flow without some more esoteric features; once the basics make sense, adding these back in is pretty easy.
To try the code yourself, first you’ll need to register a provider application. Let’s use GitHub, because they make it relatively simple:
Log into your GitHub account.
Under your profile icon, choose Settings / Developer Settings / OAuth Apps.
Register a new OAuth App, using your server’s address for the authorization callback URL (https://localhost:3000 if you’re running the example locally).
Copy your Client ID and Secret and save them for later.
To run the code, you’ll need access to a machine with a recent JDK, git and maven:
git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/toolbox && mvn clean package install
cd ../sdweb && mvn clean package
vi config-ex.json # edit this file to include your GitHub Client ID and Secret
java -cp target/sdweb-1.0-SNAPSHOT-jar-with-dependencies.jar com.shutdownhook.sdweb.App config-ex.json
Finally, point your browser to https://localhost:3000/echo?msg=w00t. You’ll need to acknowledge the scary self-signed certificate, and then you should be redirected to GitHub where you’ll need to log in and approve access. When you’re redirected back from GitHub, you should see your GitHub login and email address. Success!
OK, but what actually happened?
Let’s see what’s going on under the covers here. We’ve got a web site, and want to delegate login duties to GitHub. We’ll need to understand (and implement) three things:
1. App registration & scopes
GitHub needs to know about our web site ahead of time. Each service does this differently, but for all of them we provide a “redirect URL” to our site, and receive an app (client) identifier and secret in return. After our user successfully logs in, their browser lands back on our configured “redirect” URL. This is an important piece of the security provided by OAuth2 — GitHub will only send credentials back to that HTTPS URL on our preconfigured domain. Our site certificate helps ensure that the URL is actually “us.”
2. The authorization redirect
To kick things off, we send the user’s browser to GitHub’s authorization endpoint, along with our client ID, the scopes we’re requesting, our redirect URL and a random “state” value that we generate and remember. When the browser hits this page, GitHub presents the user with their login page (if needed), then asks them for permission to share the requested data (scopes) with your application. If all goes well, they redirect the browser back to your site via the redirect URL, with a few query parameters attached:
error: if something bad happened (like the user declined to authorize your app), this parameter (and possibly error_description and error_uri) will give you a sense of what happened.
state: your state parameter, returned to you verbatim. If this doesn’t match what you sent to the authorization endpoint, abort!
code: a single-use code that’s just one step away from a complete login.
3. Swapping the code for a token
Your last job is to swap that code for an access token (and other things). Do this by sending it in a POST to GitHub’s token URL, along with some proof that you are really you (your client_secret) and other noise. The response will include an access_token and you, my friend, are authenticated and authorized. Store this away in your own authorization cookie or other session state and feel free to go about your business, done and dusted.
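The whole dance really just amounts to building two requests. Here’s a minimal sketch in Java — the GitHub endpoint URLs are real, but the class and helper names are mine for illustration, not the actual OAuth2Login.java:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

public class OAuth2Urls {

    // The URL we redirect the user's browser to; GitHub's real
    // authorization endpoint, with illustrative parameter values.
    public static String authorizeUrl(String clientId, String redirectUrl,
                                      String scope, String state) {
        return "https://github.com/login/oauth/authorize?" + formEncode(Map.of(
            "client_id", clientId,
            "redirect_uri", redirectUrl,
            "scope", scope,
            "state", state,
            "response_type", "code"));
    }

    // The form body we POST to GitHub's token endpoint
    // (https://github.com/login/oauth/access_token) to swap the
    // single-use code for an access_token.
    public static String tokenRequestBody(String clientId, String clientSecret,
                                          String redirectUrl, String code) {
        return formEncode(Map.of(
            "client_id", clientId,
            "client_secret", clientSecret,
            "redirect_uri", redirectUrl,
            "code", code,
            "grant_type", "authorization_code"));
    }

    private static String formEncode(Map<String, String> params) {
        return params.entrySet().stream()
            .map(e -> e.getKey() + "=" +
                 URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }
}
```

Remember to stash the state value somewhere (a cookie works) so you can compare it when the redirect comes back.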
Oh wait there’s more
For many OAuth2 use cases, you don’t need anything beyond the access_token. It allows you to call APIs on the user’s behalf, so fetching health conditions from FHIR or saving documents in Google Drive is good to go. But in our social login case, we almost certainly need some kind of unique login ID or email address, so that we can recognize the user across visits. We also need this if we’re going to differentiate between types of access — e.g., maybe firstname.lastname@example.org is granted admin rights on your site.
Depending on which provider you’re using, things here can start to get a little muddy. But most of the time we can use an add-on standard called OpenID Connect. First, add “openid email” to your scope parameter. This tells the provider that you’re doing an OpenID Connect login and would like to receive identifying information about the user, including their email. There are more options available, but that’s good enough for us.
Now when you call the provider’s token endpoint, in addition to the access_token you’ll receive an id_token that has the goods. The id_token is in JWT format — three Base64Url-encoded strings separated by dots. Since we’ve received our id_token from a trusted source, all we need to do is decode the middle section, which is its own JSON object that amongst other things includes:
A provider-unique token in the field sub, and
Their email in the field email.
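Extracting those claims is a one-liner with java.util.Base64. This sketch (my own names, not the project’s) skips signature verification, which is only OK because we received the token directly from the provider over HTTPS rather than from the browser:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class IdTokens {
    // Split the JWT on dots and Base64Url-decode the middle (claims)
    // section, returning its JSON as a string. Real code would hand
    // this to a JSON parser and pull out "sub" and "email".
    public static String decodeClaims(String idToken) {
        String[] parts = idToken.split("\\.");
        byte[] json = Base64.getUrlDecoder().decode(parts[1]);
        return new String(json, StandardCharsets.UTF_8);
    }
}
```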
Woohoo, we’re done! Except wait, not all OAuth2 providers support OpenID Connect. Like GitHub. Argh.
This actually is a good example of the edge cases developers are always dealing with in the real world. Even when you do all of this correctly, a GitHub user can block their email address from public visibility. In that case it’s not included in the /user endpoint — but you can still find it using /user/emails (after all, they said you could have it when they agreed to your scope!). Always something.
Who can keep track of all these URLs?
There are tons of OAuth2 providers out there; I’ve collected deets on a few to get you started. If you use my code, all you’ll need is the Client ID and Secret; but the links are useful in case you want to roll your own. Good luck!
This stuff can change quickly, so please let me know if you find a mistake or would like me to add a new provider!
We ignored a lot!
Our implementation here is honestly pretty simple. It gets the job done — and that’s awesome — but there’s a ton more hiding underneath if you want to go there. Just for example:
We’ve implemented the “confidential client” version of OAuth2. This version relies on the ability to keep the client secret safe, which as a server-side app we’re able to do. But OAuth2 can also be used by “public clients” that can’t keep secrets safe (e.g., because they run entirely in a browser). These apps used to use something called the implicit flow, which relies solely on your HTTPS certificate and predefined redirect URLs for security. That ground is a little shaky though, so current implementations have added a feature called “Proof Key for Code Exchange” (PKCE). This serves as kind of the inverse of the state parameter — proving to the server that the client asking to swap a code is legit.
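The PKCE mechanics are simple: the client invents a random “verifier,” sends its SHA-256 hash (the “challenge”) with the authorization request, and later proves ownership by presenting the original verifier at the token endpoint. A sketch of the S256 method from RFC 7636 (helper names are mine):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

public class Pkce {
    // A random code_verifier; base64url of 32 random bytes yields a
    // 43-character string, satisfying RFC 7636's 43-128 char rule.
    public static String newVerifier() {
        byte[] bytes = new byte[32];
        new SecureRandom().nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // The S256 code_challenge: base64url(sha256(verifier)). Sent with
    // the authorization request; the server stores it and later checks
    // that whoever swaps the code can produce the matching verifier.
    public static String challenge(String verifier) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                .digest(verifier.getBytes(StandardCharsets.US_ASCII));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // SHA-256 is always available
        }
    }
}
```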
Our authorization URLs include a parameter access_type with the value online. This means that we’re just going to use our access_token for the life of the current session (or as long as the server allows us to, whichever is shorter). For Social Login this is more than enough, because we don’t really need the access_token for API access at all, except maybe in the GitHub or Amazon case where we use it once to fetch profile information.
If instead we passed offline for this parameter (and assuming the provider allows it), we would also receive a refresh_token in exchange for our code. We can use this to “refresh” our access_token whenever it expires — enabling scenarios that require long-lived API access without the security risk of forever-tokens. Maybe for something like sending an alert whenever a Google Drive file is updated. Nice!
Device Authorization Flow
There’s even an OAuth2 flow for devices that don’t have a browser or keyboard at all! The device displays a code, and then just polls waiting for an access_token to be available while the user enters the code on their phone or laptop or other device altogether. While they don’t tend to use OAuth2, you’ve almost certainly used a very similar experience authorizing apps like Netflix on a Smart TV.
OK, so now we can log in. Next!
Thanks to my snazzy new OAuth2 implementation, I can easily secure my web apps without ever writing “create table users……” again. And that is, most certainly, a worthwhile outcome. Now I can get back to that under-the-desk-app, which I’ll look forward to sharing in the coming weeks. Woot!
I want to be clear up front that I’m not a “methodology” guy. Whatever the hype, software methodology is inevitably either (a) full employment for consultants; (b) an ego trip for somebody who did something good one time under specific circumstances and loves to brag about it; or (c) both. I’ve built software for decades and the basics haven’t changed, not even once.
Break big things down into little things.
Write everything down.
Use a bug database and source control.
With that said, the rest of this might sound a little bit like software methodology. You have been warned!
Crossing the Ocean
I spend a bit of time these days mentoring folks — usually new startup CTOs that are figuring out how to go from nothing to a working v1 product. “Zero to Launch” is a unique, intense time in the life of a company, and getting through it requires unique (sometimes intense) behaviors. In all cases the task at hand is fundamentally underspecified — you’re committing to build something without actually knowing what it is. In bounded time. With limited resources. Who even does that? Startup CTOs, baby.
Like an ocean crossing, getting from zero to launch is a long journey that requires confidence, faith and discipline. There are few natural landmarks along the way, but there are patterns — the journey breaks down into three surprisingly clear and consistent phases:
You likely have two constituencies telling you what your software needs to do: (1) non-technical co-founders that see a market opportunity; and (2) users or potential users that want you to help them accomplish something. Each of these perspectives is essential, and you’d probably fail without them. But don’t be fooled — they are not going to give you clear requirements. They just aren’t. They think they are, but they’re wrong.
The first mistake you can make here is getting into a chicken-and-egg battle. Your partners ask for a schedule, you say you can’t do that without full requirements, they say they already did that, you point out the gaps, they glaze over, repeat, run out of money, everyone goes home. Don’t do that.
Instead, just understand and accept that it is up to you to decide what the product does. And further, that you’ll be wrong and folks will (often gleefully) point that out, and you’re just going to have to suck it up. This is why you hire program managers, because synthesizing a ton of vague input into clarity is their core competency — but it’s still on you to break ties and make judgment calls with incomplete information.
And I’m not just talking about invisible, technical decisions. I’m talking about stuff like (all real examples):
Does this feature need “undo” capability? If so how deep?
Do we need to build UX for this or can we just have the users upload a spreadsheet?
Can we cut support for Internet Explorer? (ok finally everyone agrees on that one)
What data needs to be present before a job can be submitted?
Does this list require paging? Filtering? Searching?
You get the idea. This can be hard even for the most egocentric among us, because really, what do we know about [insert product category here]? Even in a domain we know well, it’s a little bold. But there are two realities that, in almost every case, make it the best strategy:
Nobody knows these answers! I mean sure, do all the research you can, listen, and don’t be stupid. But at the end of the day, until your product is live in the wild, many of these are going to be guesses. Asking your users or CEO to make the guess is just an indirection that wastes time. Take whatever input you can, make a call, consider how you’ll recover if (when) it turns out you were wrong, and move on.
Normal people just aren’t wired to think about error or edge cases. For better or worse, it’s on you and your team to figure out what can go wrong and how to react. This is usually an issue of data and workflow — how can you repair something that has become corrupt? “Normal” people deal with these problems with ad-hoc manual intervention, which is a recipe for software disaster.
For this to work, you need to be obsessively transparent about what you’re building. Write down everything, and make sure all of your stakeholders have access to the documents. Build wireframes and clickthrough demos. Integrate early and often, and make sure everybody knows where the latest build is running and how they can try it. This isn’t a CYA move; that’s a losing game anyways. It’s about trying to make things real and concrete as early as possible, because people are really good at complaining about things they can actually see, touch and use. You’re going to get a ton of feedback once the product is live — anything you can pull forward before launch is gold. Do this even when it seems embarrassingly early. Seriously.
Transparency also gives people confidence that you’re making progress. As they say, code talks — a live, running, integrated test site is what it is. No magic, no hand-waving. It either works or it doesn’t; it has this feature or it doesn’t; it meets the need or it doesn’t. Seeing the product grow more complete day by day is incredibly motivating. Your job is to will it into existence. This is a key but often unstated startup CTO skill — you need to believe, and help others believe, during this phase.
Holy crap this is way bigger than we thought!
Once you’ve gotten over the first hump and folks have something to look at, things really start to heat up. Missing features become obvious. “Simple” tasks start to look a lot less simple. It can get overwhelming pretty quickly. And that’s just the beginning. Over on the business side of things, your colleagues are talking to potential customers and trying to close sales. Suddenly they desperately need new bells and whistles (sometimes even whole products) that were never on the table before. Everything needs to be customizable and you need to integrate with every other technology in the market. Sales people never say “no” and they carry a big stick: “Customers will never buy if they don’t get [insert one-off feature here].”
Herein we discover another problem with normal people: they have a really hard time distilling N similar instances (i.e., potential customers) into a single coherent set of features. And frankly, they don’t really have much incentive to care. But it’s your job to build one product that works for many customers, not the other way around.
During this phase, your team is going to get really stressed out, as every solved problem just seems to add three new ones on the pile. They’re going to want to cut, cut, cut — setting clear boundaries that give them a chance to succeed. This is an absolutely sane reaction to requirement chaos, but it’s on you to keep your team from becoming a “no” machine.
A useful measure of technical success is how often you are able to (responsibly) say “yes” to your stakeholders. But saying “yes” doesn’t mean you just do whatever random thing you’re told. It means that you’re able to tease out the real ask that’s hiding inside the request, and have created the right conditions to do that. It’s very rare that somebody asks you to do something truly stupid or unnecessary. Normal people just can’t articulate the need in a way that makes software sense. And why should they? That’s your job.
During this phase, you have to be a mediator, investigator, translator and therapist. Try to be present at every feature review, so you can hear what the business folks and users say first-hand. If you can’t be there, schedule a fast follow-up with your team to discuss any new asks while they’re still fresh. Never blind-forward requests to your team. Propose simpler alternatives and ask why they won’t (or will) work. Use a cascading decision tree:
What is the real ask? If you’re getting an indirect request through sales, ask them to replay the original conversation exactly — what words were used? If it’s coming from users, ask them to walk you through how they think the feature should work, click by click. Ask what they do now. What do other similar products do? Try to find other folks making the same ask — how do they word it?
Do we need to do something right now? Beyond just schedule, there are good reasons to delay features to “vNext” — you’ll know more once you’re live. Do we really need this for launch, or can it wait? One caveat here — be careful of people who want to be agreeable. I remember one company in particular where the users would say “it’s ok, we don’t need that,” but then go on to develop elaborate self-defeating workarounds on their own. It took a while to get everyone on the same page there!
Can we stage the feature over time? This is often the best place for things to end up. Break the request down into (at least) two parts: something simpler and easier for launch and a vNext plan for the rest. You’ll learn a ton, and very (very) often the first version turns out to be more than good enough. Just don’t blow off the vNext plan — talk it out on the whiteboard so you don’t have to rebuild from scratch or undo a bunch of work.
Is there something else we can swap for? Sometimes yes, sometimes no. And don’t turn stakeholder conversations into horse trading arguments. But costs are costs, and if you can remove or delay something else, it makes launch that much closer. Again, you’re always learning, and there’s no honor in “staying the course” if it turns out to be wrong. Be smart.
This phase is all about managing up, down and sideways. Things will get hot sometimes, and people will be frustrated. Reinforce with your stakeholders that you’re not just saying “no” — you’re trying to figure out how to say “yes.” Remind your team that you understand the quantity-quality-time dilemma and that if there’s a fall to be taken, it’s on you not them. And tell your CEO it’s going to be OK … she’ll need to hear it!
Will this death march ever end?
You might notice that, so far, I haven’t mentioned “metrics” even once. That’s because they’re pretty much useless in the early stages of a product. Sorry. Products start out with one huge issue in the database: “build v1.” That becomes two, then four, and suddenly you’re in an exponential Heather Locklear shampoo commercial. New features come and go every day. Some are visible and quantifiable, but many are not. You are standing in for metrics at first — your gut and your experience. Read up on proton pump inhibitors my friend.
But as you get closer to launch, this balance shifts. Requirement changes slow down, and issues tend to look more like bugs or tasks — which tends to make them similar in scope and therefore more comparable. There’s some real comfort in this — “when the bug count is zero, we’re ready to launch” actually means something when you can measure and start to predict a downward trend.
But things get worse before they get better, and sometimes it feels like that downward shift will never happen. This is when the most grotty bugs show up — tiny miscommunications that blow up during integration, key technology choices that don’t stand up under pressure, missing functionality discovered at the last minute. Difficult repros and marathon debugging sessions suck up endless time and energy.
The worst are the bug pumps, features that just seem to be a bundle of special-cases and regressions. I’ve talked about my personal challenge with these before — because special-cases and regressions are exactly the symptoms of poor architecture. Very quickly, I start to question the fundamentals and begin redesigning in my head. And, sometimes, that’s what it takes. But just as often during this phase, you’re simply discovering that parts of your product really are just complicated. It’s important to give new features a little time to “cook” so they can settle out before starting over. Easy to say, tough to do!
During this home stretch, you need to be a cheerleader, mom and grandpa (please excuse the stereotypes, they’re obviously flawed but useful). A cheerleader because you’re finding every shred of progress and celebrating it. A mom because you’re taking care of your team, whatever they need. Food, tools and resources, executive air cover, companionship, music — whatever. And a Grandpa because you’re a calming presence that understands the long view — this will end; it’s worth it; I’ve been there.
I can’t promise your company will succeed — history says it probably won’t. But I can promise that if you throw yourself into these roles, understand where you are in the process, stay focused, hire well and work your butt off, you’ve got a really good chance of launching something awesome. I’m not a religious guy, but I believe what makes humans special is the things we build and create — and great software counts. Go for it, and let me know if I can help.
The beach outside our Whidbey place is amazing. There’s about twenty yards of firm sand and rocks along the shore, then a broad, flat, soft expanse of sand/mud/clay for just under 100 yards, then maybe 30 yards of firm sandbars. Beyond the sandbars, the channel drops to a depth of about 500 feet or so (the first “steps” along this drop-off are the best places to drop a crab pot).
The tide sweeping in and out over this shallow area changes our back yard dramatically from hour to hour. At the highest high tide there’s no beach at all — in the Spring whales swim just a few yards away, sucking ghost shrimp out of the mud flats. During summer low-low tides, we head out to the sand bars where you can dig for horse clams and pick up crabs hiding in the eel grass (while Copper chases seagulls for miles).
I know it sounds a bit out there, but the rhythm of our days really does sync up with the water — and it’s a wonderful way to live. “What’s the tide doing today?” is the first question everybody seems to ask as they come down for coffee in the morning. And that, my friends, sounds like fodder for another fun project.
What’s the tide doing today?
NOAA publishes tide information that drives a ton of apps — I use Tides Near Me on my phone and the TideGuide skill on Alexa, and both are great. But what I really want is something that shows me exactly what the tide will look like in my back yard. For some reason I have a really hard time correlating tide numbers to actual conditions, so an image really helps. (As an aside, difficulty associating numbers with reality is a regular thing for me. I find it very curious.) For example, if you were to stand on the deck in the afternoon on September 30, what exactly would you see? Maybe this?
Those images are generated by (a) predicting what the tide and weather will be like at a point in time, and then (b) selecting a past image that best fits these parameters from a historical database generated using an exterior webcam, NOAA data and my Tempest weather station. So the pictures are real, but time-shifted into the future. Spooooky!
Actually, my ultimate goal is to create a driftwood display piece that includes a rotating version of these images together with a nice antique-style analog tide clock. But for today, let’s just focus on predictions and images.
So why do tides happen at all? A bunch of factors pile on top of each other:
Variations in pull from the Moon’s gravity on the Earth. The side facing the Moon has increased gravity, and the side opposite the Moon has slightly less. Both of these cause liquid water on the surface to “bulge” along this axis (more on the closer side, less on the far side).
The same thing happens due to the Sun’s gravity, but less so. Tides are most extreme when the sun and moon “line up” and work together; least so when they are at right angles to each other.
The Earth is spinning, which combines with orbital movement to change which parts of the Earth are being pulled/pushed the most at any given time.
The Earth is tilted, which changes the angles and magnitude of the forces as the seasons change. One consequence of this is that we tend to have daytime lows in the Summer and nighttime lows in the Winter.
Weather (short-term and seasonal) can change the amount of water in a specific location (storm surges being a dramatic example).
Local geography changes the practical impact of tides in specific locations (e.g., levels present differently over a wide flat area like my beach vs. in a narrow fjord).
All of this makes it really tough to accurately predict tide levels at a particular time in a particular place. Behavior at a given location can be described reasonably well by combining thirty-seven distinct sine waves, each defined by a unique “harmonic constituent.” NOAA reverse-engineers these constituents by dropping buoys in the ocean, measuring actual tide levels over a period of months and years, and doing the math. Our closest “harmonic” or “primary” station is across the water in Everett.
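The “sum of waves” model is easy to sketch: each constituent contributes a cosine with its own amplitude, angular speed and phase, added to the mean water level. The constants in this example are placeholders, not real NOAA constituents:

```java
public class HarmonicTide {
    // Tide level as a sum of cosine constituents:
    //   h(t) = h0 + sum_i( A_i * cos(omega_i * t + phi_i) )
    // where t is hours, amp is in feet, speed in radians/hour.
    // NOAA publishes the real 37 constituents per harmonic station.
    public static double level(double t, double meanLevel,
                               double[] amp, double[] speed, double[] phase) {
        double h = meanLevel;
        for (int i = 0; i < amp.length; i++) {
            h += amp[i] * Math.cos(speed[i] * t + phase[i]);
        }
        return h;
    }
}
```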
“Subordinate” stations (our closest is Sandy Point) have fewer historical measurements — just enough to compute differences from a primary station (Seattle in this case). But here’s the really interesting bit — most of these “stations” don’t actually have physical sensors at all! The Sandy Point buoy was only in place from February to April, 1977. In Everett, it was there for about five months in late 1995. To find an actual buoy you have to zoom all the way out to Port Townsend! This seems a bit like cheating, but I guess it works? Wild.
You can query NOAA for tide predictions at any of these stations, but unless there’s a physical buoy all you really get is high and low tide estimates. If you want to predict water level for a time between the extremes, you need to interpolate. Let’s take a look at that.
The Rule of Twelfths
It turns out that sailors have been doing this kind of estimation for a long, long time using the “Rule of Twelfths.” The RoT says that if you divide the span between extremes into six parts, 1/12 of the change happens in the first part; 2/12 in the next; then 3/12, 3/12 again, 2/12 and 1/12 to finish it out. Since the period between tides is about six hours, it’s a pretty easy mental calculation that would have been good to know when I was fifteen years old trying to gun my dad’s boat through the channel off of Ocean Point (spoiler alert: too shallow).
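The rule is easy enough to code up as well. Here’s a little sketch of my own (not the project’s NOAA.java) that estimates the level at any fraction of the way between a low and the following high:

```java
public class RuleOfTwelfths {
    // Cumulative fraction of the total tide change after each sixth of
    // the period: 1/12, then +2/12, +3/12, +3/12, +2/12, +1/12.
    private static final double[] CUMULATIVE =
        { 0, 1/12.0, 3/12.0, 6/12.0, 9/12.0, 11/12.0, 1.0 };

    // fractionOfPeriod runs 0..1 between the two extremes; we find the
    // sixth we're in and interpolate linearly within it.
    public static double estimate(double fromLevel, double toLevel,
                                  double fractionOfPeriod) {
        double sixths = fractionOfPeriod * 6.0;
        int i = (int) Math.min(5, Math.floor(sixths));
        double within = sixths - i;
        double frac = CUMULATIVE[i] + within * (CUMULATIVE[i + 1] - CUMULATIVE[i]);
        return fromLevel + frac * (toLevel - fromLevel);
    }
}
```

It works just as well for a falling tide — swap the from/to levels and the same fractions apply.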
Anyways, I use this rule together with data from NOAA and simple interpolation to predict tide levels on my beach for any given timepoint. The code is in NOAA.java and basically works like this:
Query NOAA for the high and low tide predictions around the window we care about.
Use the Rule of Twelfths to generate estimated levels at intermediate points between each pair of extremes.
The resulting list is returned to the caller as a Predictions object.
The Predictions object exposes a few methods, but the most interesting one is estimateTide, which does a binary search to find the predictions before and after the requested timepoint, then uses linear interpolation to return a best-guess water level. The resulting estimations aren’t perfect, but they are really very accurate — more than good enough for our purposes. Woo hoo!
OK, let’s back up a bit and look at the code more broadly. Tides is a web app that primarily exposes a single endpoint /predict. It’s running on my trusty Rackspace server, and as always the code is on github. To build and run it, you’ll need a JDK v11 or greater, git and mvn. The following will build up the dependencies and a fat jar with everything you need:
git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/toolbox && mvn clean package install
cd ../weather && mvn clean package install
cd ../tides && mvn clean package
To run the app you’ll need a config file — which may be challenging because it expects configuration information for a Tempest weather station and a webcam for capturing images. But if you have that stuff, go to town! Honestly I think the code would still work pretty well without any of the weather information — if you are interested in running that way let me know and I’d be happy to fix things up so it runs without crashing.
The code breaks down like this:
Camera.java is a very simple wrapper that fetches live images from the webcam.
NOAA.java fetches tide predictions, augments them with the RoT, and does interpolation as discussed previously.
The image-capture code records, alongside each webcam image, the current “day of year” (basically 1 – 365) and “minute of day” (0 – 1,439). It turns out that these two values are the most critical for finding a good match (after tide height of course) — being near the same time of day at the same time of year really defines the “look” of the ocean and sky, at least here in the Pacific Northwest.
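Both values fall straight out of java.time; a tiny sketch (my names, computed in local time since it’s local daylight that matters):

```java
import java.time.ZonedDateTime;

public class CaptureKeys {
    // "Day of year": 1-365 (366 in leap years).
    public static int dayOfYear(ZonedDateTime t) {
        return t.getDayOfYear();
    }

    // "Minute of day": 0-1,439, i.e., minutes since local midnight.
    public static int minuteOfDay(ZonedDateTime t) {
        return t.getHour() * 60 + t.getMinute();
    }
}
```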
The capture stuff runs twice hourly via cron job on a little mini pc I use for random household stuff; super-handy to have a few of these lying around! Once a day, another cron job pushes new images and a copy of the database to an Azure container — a nice backup story for all those images that also lands them in a cloud location perfect for serving beyond my home network. Stage one, complete.
Picking an Image
The code to pick an image for a set of timepoints is for sure the most interesting part of this project. My rather old-school approach starts in Tides.forecastTides, which takes a series of timepoints and returns predictions for each (as well as data about nearby extremes which I’ll talk about later). The timepoints must be presented in order, and typically are clustered pretty closely — e.g., for the /predict endpoint we generate predictions for +1, +3 and +6 hours from now, plus the next three days at noon.
By querying in stages like this, we end up with a candidate pool of images that, from a tide/time perspective, we consider “equivalently good.” Of course we may just find a single image and have to use it, but typically we’ll find a few. In the second pass, we sort the candidates by fit to the predicted weather metrics. Again we use some thresholding here — e.g., pressure values within 2mb of each other are considered equivalent.
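The staged approach might look something like this in JavaScript (field names and the specific thresholds besides the 2mb pressure bucket are illustrative; the real implementation is the Java code in the project):

```javascript
// Sketch of the two-pass selection described above. Pass 1: keep images
// whose tide/time values fall inside tolerance bands; pass 2: rank the
// survivors by weather fit, bucketing pressure so near-equal values tie.
const PRESSURE_EQUIV_MB = 2; // pressures within 2mb count as equivalent

function pickImage(candidates, target) {
  // Pass 1: build the "equivalently good" candidate pool
  const pool = candidates.filter(c =>
    Math.abs(c.tideHeight - target.tideHeight) <= 0.25 &&
    Math.abs(c.minuteOfDay - target.minuteOfDay) <= 90);

  // Pass 2: sort by weather fit using thresholded (bucketed) distance
  pool.sort((a, b) => {
    const pa = Math.round(Math.abs(a.pressure - target.pressure) / PRESSURE_EQUIV_MB);
    const pb = Math.round(Math.abs(b.pressure - target.pressure) / PRESSURE_EQUIV_MB);
    return pa - pb;
  });

  return pool[0]; // may be undefined if nothing matched at all
}

const best = pickImage(
  [{ id: 1, tideHeight: 2.1, minuteOfDay: 840, pressure: 1014 },
   { id: 2, tideHeight: 2.2, minuteOfDay: 850, pressure: 1019 },
   { id: 3, tideHeight: 5.0, minuteOfDay: 845, pressure: 1015 }],
  { tideHeight: 2.2, minuteOfDay: 845, pressure: 1015 });
console.log(best.id); // 1
```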
At the end of the day, this is futzy, heuristic stuff and it’s hard to know if all the thresholds and choices are correct. I’ve made myself feel better about it for now by building a testing endpoint that takes a full day of actual images and displays them side-by-side with the images we would have predicted without that day’s history. I’ve pasted a few results for August 30 below, but try the link for yourself, it’s fun to scroll through!
Other Ways We Could Do This: Vectors
Our approach works pretty well, even with a small (but growing!) historical database. But it’s always useful to consider other ideas. One way would be to replace my hand-tuned approach with vector-based selection. Vector distance is a compelling way to rank items by similarity across an arbitrary number of dimensions; it appeals to me because it’s pretty easy to visualize. Say you want to determine how similar other things are to a banana, using the properties “yellowness” and “mushiness” (aside: bananas are gross). You might place them on a graph like the one here.
Computing the Euclidean distance between the items gives a measure of similarity, and it kind of works! Between a papaya, strawberry and pencil, the papaya is intuitively the most similar. So that’s cool, and while in this example we’re only using two dimensions, the same approach works for “N” — it’s just harder to visualize.
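Sketched in JavaScript with made-up scores (both dimensions on a 0 to 10 scale), the distance computation is a one-liner that generalizes to any number of dimensions:

```javascript
// Euclidean distance across N dimensions; items are scored on
// [yellowness, mushiness], both 0-10 (values are made up for illustration).
function euclidean(a, b) {
  return Math.sqrt(a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0));
}

const banana = [9, 8];
const items = { papaya: [6, 7], strawberry: [2, 3], pencil: [8, 0] };

for (const [name, vec] of Object.entries(items)) {
  console.log(name, euclidean(banana, vec).toFixed(2));
}
// papaya lands closest to the banana; note the pencil beats the strawberry
```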
But things are never that simple — if you look a little more deeply, it’s hard to argue that the pencil is closer to a banana than the strawberry. So what’s going on? It turns out that a good vector metric needs to address three common pitfalls:
Are you using the right dimensions? This is obvious — mushiness and yellowness probably aren’t the be-all-end-all attributes for banana similarity.
Are your dimensions properly normalized? In my tide case, UV measurements range from 0 – 10, while humidity can range from 0 – 100. So a distance of “1” is a 10% shift in UV, but only a 1% shift in humidity. If these values aren’t normalized to a comparable scale, humidity will swamp UV — probably not what we want.
How do you deal with outliers? This is our pencil-vs-strawberry issue. A pencil is “so yellow” that even though it doesn’t remotely match the other dimension, it sneaks in there.
These are all easily fixable, but require many of the same judgment calls I was making anyway. And it’s a bit challenging to do an efficient vector sort in a SQL database — a good excuse to play with vector databases, but it didn’t seem like a big enough advantage to be worth it for this scenario.
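To make the normalization point from the list above concrete: min-max scaling maps each dimension onto a common 0 to 1 range before computing distances. A sketch using the UV and humidity ranges mentioned earlier (everything else is illustrative):

```javascript
// Min-max normalization so each dimension contributes on a comparable
// 0-1 scale; the ranges match the UV (0-10) and humidity (0-100) example.
const RANGES = { uv: [0, 10], humidity: [0, 100] };

function normalize(reading) {
  const out = {};
  for (const [dim, [lo, hi]] of Object.entries(RANGES)) {
    out[dim] = (reading[dim] - lo) / (hi - lo);
  }
  return out;
}

// A 5-point UV swing and a 50-point humidity swing now weigh the same
const a = normalize({ uv: 5, humidity: 50 });
console.log(a); // { uv: 0.5, humidity: 0.5 }
```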
Other Ways We Could Do This: AI
My friend Zach suggested this option and it’s super-intriguing. Systems like DALL-E generate images from text descriptions — surprisingly effective even in their most generic form! The image here is a response to the prompt “a photographic image of the ocean at low tide east of Whidbey Island, Washington.” That’s pretty spooky — it even includes an island that looks a lot like Hat from our place.
With a baseline like this, it should be pretty easy to use the historical database to specialty-train a model that generates “future” tide images out of thin air. This is exciting enough that I’m putting it on my list of things to try — but at the same time, there’s something just a bit distasteful about deep-faking it. More on this sometime soon!
One nice little twist — remember that I pushed the images and database to an Azure container for backup. There’s nothing in those files that needs to be secret, so I configured the container for public web access. Doing this lets me serve the images directly from Azure, rather than duplicating them on my Rackspace server.
I also forgot to mention the Extremes part of tide forecasting. It turns out that it’s not really enough to know where the water is at a point in time. You want to know whether it’s rising or falling, and when it will hit the next low or high. We just carry that along with us so we can display it properly on the web page. It’s always small things like this that make the difference between a really useful dashboard and one that falls short.
I’ll definitely tweak the UX a bit when I figure out how to put it into a fancy display piece. And maybe I’ll set it up so I can rotate predictions on my Roku in between checking the ferry cameras! But that is for another day and another post. I had a great time with this one; hope you’ve enjoyed reading about it as well. Now, off to walk the beach!
Over the last few months I’ve been working on a project with the good folks at TCP — the latest stopover on my long, painful, only-debatably-successful journey to use technology to benefit health and healthcare in the world. I’ve written about this project a few times already, and I continue to be excited about the potential for SMART Health Cards and Links to get important information in front of the right people when they need it. In this post I’m going to try to push on that “in front of the right people” bit by going into nerdtastic detail about the SMART Health Viewer application we’ve been building. The code is all MIT-licensed, so I hope you’ll pick out anything useful for your own projects. Suit up, folks!
Disclaimer #1: I 100% do not speak for TCP — I’m just volunteering my time towards this work. Anything in the below that you find objectionable is on me and not them! 🙂
Disclaimer #2: This is all pretty techy and more than a bit dry; it probably won’t be your next favorite beach read. I’ve tried to keep it moving along, but my real objective is to just dump a ton of detail to help out other folks trying to build solid implementations. Next up will be some much more entertaining woodworking stuff, I promise!
A Quick Tour
While there are a few twists and turns under the covers, the app itself is really very simple:
Read a SMART Health Card or Link using a barcode scanner, camera or copy/paste.
View the health information, including (when available) provenance data.
Save the information, using built-in copy/paste buttons or as a single document image.
The application can also run within the context of a SMART-on-FHIR enabled EHR system like Epic or Cerner. Additional features are available in this mode, for example:
The current patient record can be searched for SMART QR codes (e.g., on a scanned insurance card).
Rendered health information can be saved back into the patient record.
Potential patient mismatches are flagged (e.g., if the current patient is Bob Smith but a COVID-19 vaccine record is for Jane Doe).
The viewer is a single page application built using React and Create-React-App (I swear CRA was basically deprecated ten minutes after I learned how to use it). Most of the interface uses Material UI, which is truly a blessing for folks like me that are design-impaired. The source is available on GitHub under an MIT license. The snippets in this post reference the version tag shutdownhook.2 so they’ll stay consistent even as the app evolves. So there you go, logistics out of the way!
An SPA running exclusively in the client browser offers two key benefits: first, it makes the app super-easy to host — any static web server (or even just an AWS bucket) can do the job. More importantly, it means that sensitive health information never leaves the client. This pretty dramatically reduces the exposure to privacy breaches, a Very Good thing.
The React component tree looks something like the below. We’ll examine each in detail, but for now think of this hierarchy as a roadmap to the application:
(Note that if you want to actually edit the code for real, add -b new_branch_name to that checkout command to start a branch).
The app will start up with a self-signed certificate (that you’ll have to approve) at https://localhost:3000/. If you have a SMART COVID vaccine card, scan it by clicking the “Take Photo” tab and holding it up to your camera. Or use the “Scan Card” tab and paste in the contents of a demo patient summary or insurance card. Pretty neat!
To deploy a version of the site, just run a build with npm run build, then copy the entirety of the “build” directory to any static web server. I use an Azure blob container for live testing, and keep it up to date using AzCopy like this (remember to azcopy login first!):
A provider using the viewer may run it as a “SMART on FHIR Provider Launch” application. You can read a ton about provider launch apps elsewhere on my blog, but in a nutshell it means that the app runs in an iframe (or similar) within the EHR interface, inheriting its user and patient context. The embedded app is granted authorization to make read and write calls against data in the EHR on behalf of the logged in user. It’s a nice setup, albeit with some pretty inconsistent implementations.
In any case, you can try this out by running the viewer in the SMART Launcher, which simulates an EHR and provides some test data. On the page, make sure “Simulate launch within the EHR UI” is checked, enter https://localhost:3000/launch.html?client=abc into the “App’s Launch URL” box, and click the “Launch” button. You’ll be asked to “sign in” as a provider, select a patient, and then you’ll be right back in our familiar viewer interface — surrounded by EHR goodness and with a few new options available.
The code starts to look like a typical web app in App.js. A MUI Tabs component handles top-level navigation between the content panels swapped into the next div: an about page, controls for capturing data, and one for displaying it. At the bottom is an optional footer specific to TCP, since we’ll be hosting a version of the app for use in the real world.
If there’s a SHX to display, it’s held in React state as scannedSHX. The Scan, Photo and Search components call back up to the App component using the viewData function to set it, and it’s pushed down to the Data component for rendering.
You’ll also see a bunch of config calls in this code (and throughout the project). Defaults are in (duh) defaults.js — domain-based overrides layer on top of a base set of options. Any config value can also be overridden in the query string (a common use for this last is “initialTab” which can be used to drop the user directly onto one of the scanning modes rather than starting with About).
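The layering might be sketched like this (illustrative only; the real lookup and the exact precedence live in the project's config code):

```javascript
// Layered config lookup: query string beats domain overrides beats defaults.
// makeConfig and these option names are illustrative, not the viewer's API.
function makeConfig(defaults, domainOverrides, queryString) {
  const query = Object.fromEntries(new URLSearchParams(queryString));
  return (key) => {
    if (key in query) return query[key];           // per-request override
    if (key in domainOverrides) return domainOverrides[key]; // per-domain
    return defaults[key];                          // baseline (defaults.js)
  };
}

const config = makeConfig(
  { initialTab: 'about', footer: false },
  { footer: true },
  '?initialTab=scan');

console.log(config('initialTab')); // 'scan' (query string wins)
console.log(config('footer'));     // true (domain override)
```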
Once a SHX has been scanned, control passes to the Data tab. Gross parsing errors can show up here, as can a request for a passcode if one is needed. More typically, the data will be parsed and result in one or more FHIR “bundles” — collections of signed or unsigned FHIR resources that work together. If multiple bundles are present, a dropdown allows the user to select between them.
Scanning with a handheld scanner (like this one I use) isn’t particularly interesting — scanners just send keystrokes, so really you just need a textbox. Just remember to set the focus correctly and auto-submit if the user hits (or the scanner sends) a “return” at the end of the code. A bonus of the textbox is that copy/pasting codes is super-handy during development.
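The auto-submit trick reduces to a few lines. This is an illustrative handler, not the viewer's actual code:

```javascript
// Illustrative "auto-submit" key handler: barcode scanners just send
// keystrokes into the focused textbox and typically finish with Enter.
function makeKeyHandler(submit) {
  return (evt) => {
    if (evt.key === 'Enter') {
      if (evt.preventDefault) evt.preventDefault(); // don't submit a form
      submit(evt.target.value.trim());
    }
  };
}

// Simulate a scanner finishing a code with Enter
let scanned;
const handler = makeKeyHandler((text) => { scanned = text; });
handler({ key: 'Enter', target: { value: 'shc:/5676290952432060346029243740' } });
console.log(scanned);
```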
Scanning with a camera is much more interesting. We use qr-scanner, a solid and reliable module for picking barcodes out of camera feeds or static images. In its simplest form, you just instantiate the module, call start, and wait for it to find a QR code. Ah, but of course it’s never quite that easy.
First of all, we may not be able to instantiate a camera at all. Remember that the viewer is built to be embedded within an EHR, which often happens inside an iframe that is subject to a number of security restrictions. One of these (unless explicitly permitted with an “allow” policy) is access to connected cameras. If the viewer detects this error case, it replaces the scanner element with a button that pops up a capture window. This stand-alone window (captureQR.html) can access the camera (still subject to user approval, of course) and passes detected QR codes back to the iframe before closing itself. It’s maybe a little hokey, but gets the job done pretty well.
Picking the right camera can also be a challenge. The viewer is meant to be usable both on mobile and laptop/desktop systems, which can have very different camera setups. The browser allows cameras to be selected by “facing mode” (primarily “user” or “environment”) or by an internal ID that isn’t necessarily correlated with a user-recognizable label. The viewer tries to balance all of these with the following approach, implemented mostly in switchCamera.js.
If there’s only one camera in the system, don’t show any of this at all!
Another twist for this logic is that it needs to be usable in both the React component (Photo.js) and the pop-up simple HTML version (captureQR.html) we discussed earlier. This turns out to be more challenging than I expected, but is accomplished by including switchCamera.js as a script tag way up at the top of the React hierarchy in index.html. The most interesting thing about this code — other than some reusable bits for iterating cameras and such — is the double-click detection, which somehow is still a complicated thing to do in 2023.
Last is the code that pauses the camera after a configurable timeout. In one of our early demos, there seemed to be a steady memory leak that persisted as long as the camera was active. Typically the camera is only visible for a short time and this doesn’t matter, but if for some reason the page is left open, it can eventually crash the browser. The leak doesn’t seem to happen on all browsers or platforms, so more research to do. But to be safe, we just shut the camera off if it doesn’t find a QR code within this timeout.
Scanning Stuff: Searching the EHR
This is really an exploratory feature, but I’m betting something like it will be useful in some workflows. Primarily for SMART Health Insurance Cards, it works when physical cards are scanned into the EHR during check-in. If payers start printing SHX QR Codes on their cards, it could be useful to pull the structured FHIR data out of those QRs based on the scanned images.
When executed in EHR context, the code in Search.js digs around in the patient record looking for scanned files that might have QR codes on them. This happens in two passes: first a search for candidate documents, then a scan of each document image for QR codes.
Separate from the FHIR-specific stuff, there’s a nice React pattern in here too. Searching and scanning can take some time, so it happens async as part of an Effect. Nothing special about that; the neat part is that we run through the effect multiple times — once for the search and then one for each document searched. On each pass the UX is updated with information about the step — if a QR is found we route directly to the Data tab, and if not we report back and allow the user to search again if appropriate. In a world where it often seems like I’m wedging what I want to do into the React lifecycle, I was pleasantly surprised with how well this matched up.
“Scanning” Stuff: Viewer Prefix
OK this isn’t really scanning at all, but there is one more way that SHX data can make it into the viewer. If you’ve dug into the SMART Health Links spec, you’ll have encountered the viewer URL that can be prepended to the shlink:/ data itself. Our viewer supports that model via shlink.html. No muss, no fuss!
Reading the SHX: resolveSHX
No matter how they’re wrapped and packaged, the endgame for SMART Health Cards and Links is one or more FHIR bundles holding actual health information. The code in SHX.js and resources.js is responsible for sorting all of that out and building up a set of consistent structures that are (more or less) easy to render. This starts with verifySHX, which receives the scanned code — either a shc:/ or shlink:/ string, with the SHL possibly hiding behind a viewer URL hash prefix.
This function returns its work to the caller as a “status” object. The only thing guaranteed to be in the object is a shxStatus code which provides the overall result of the operation. It’s important to keep in mind that “ok” here doesn’t necessarily mean we found usable data — it just means that we were able to resolve the SHX and we were at least able to parse bundles out of it. The status of each bundle is its own thing, as we’ll see later.
The first thing verifySHX does (after setting up some exception handling) is to call resolveSHX. This dude’s job is to normalize the SHX down to two lists: one containing signed FHIR bundles (aka “verifiable credentials”) and one for bundles that are unsigned. For SMART Health Cards, this is easy — the input is a verifiable credential, so we just add it to the list and get out of dodge. (Note we’re taking advantage of the fact that our SHC verification library accepts VC values in a number of formats, including the shc:/ strings read out of QR codes.)
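The top-level sorting can be sketched as simple string dispatch (illustrative names; the project's real resolveSHX also handles manifests, decryption, and the verification library):

```javascript
// Sketch of how an input string gets classified before landing in the
// signed (verifiable credential) vs unsigned buckets. Illustrative only.
function classifySHX(shx) {
  // A SMART Health Link can hide behind a viewer URL, payload after '#'
  const hashIndex = shx.indexOf('#shlink:/');
  if (hashIndex >= 0) shx = shx.substring(hashIndex + 1);

  if (shx.startsWith('shc:/')) return { kind: 'shc', payload: shx };
  if (shx.startsWith('shlink:/')) return { kind: 'shl', payload: shx };
  return { kind: 'unknown', payload: shx };
}

console.log(classifySHX('shc:/5676290952432060346029243740').kind);        // 'shc'
console.log(classifySHX('https://viewer.example/#shlink:/eyJ1cmwi').kind); // 'shl'
```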
SMART Health Links are a more complicated story; for those we drop down into resolveSHL, which starts with decodeSHL — just a bit of fancy base64-decoding that gets us the payload so that we can throw exceptions if the payload requires a passcode or has expired. Note both of these are just hints to support our user experience — it’s up to the SHL hoster to actually enforce them. So you’ll see similar exceptions thrown later, when we actually request the manifest…
…which happens in fetchSHLManifest. In most cases this is just a simple POST with a few parameters. The one exception is for SHLs with the “U” flag, used when the SHL contains only one file and bypasses a formal manifest. When our code detects this, it fakes up a manifest so that the rest of the code can proceed normally.
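For a sense of what that decoding step is working with: per the SMART Health Links spec, the text after shlink:/ is base64url-encoded JSON containing (among other things) the manifest URL, the decryption key, and the optional exp and flag values. A hedged sketch, with the field handling simplified relative to the project's actual decodeSHL:

```javascript
// Simplified SHL payload decoding. Remember these checks are UX hints
// only; the SHL host is responsible for actually enforcing them.
function decodeSHL(shlink) {
  const b64 = shlink.replace(/^shlink:\//, '');
  const payload = JSON.parse(Buffer.from(b64, 'base64url').toString('utf8'));

  if (payload.exp && payload.exp * 1000 < Date.now()) {
    throw new Error('SHL expired');
  }
  const needsPasscode = (payload.flag || '').includes('P');
  return { payload, needsPasscode };
}

// Round-trip a made-up payload to show the shape
const raw = { url: 'https://shl.example/manifest', key: 'abc', flag: 'P' };
const link = 'shlink:/' + Buffer.from(JSON.stringify(raw)).toString('base64url');
console.log(decodeSHL(link).needsPasscode); // true
```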
The rest of resolveSHX loops over each file in the manifest, downloading and decrypting the content and populating the verifiableCredentials and rawBundles arrays as appropriate. The one interesting thing here is resources that aren’t bundles at all — which is fine, we just cons up a bundle-of-one so they’re consistent for the rest of the code.
OK, take a breath. At this point we’ve taken the input SHX and turned it into two arrays — one with verifiable credentials and one with unsigned (“raw”) bundles. The next step is to turn those into a single list of bundles (statusObj.bundles) with consistent format and metadata. Let’s go.
Reading the SHX: Bundles and Organization
First we iterate over each verifiable credential we’ve collected and use smart-health-card-decoder to verify its signature and content. The directories we trust are set by configuration; be careful if you’re going to deploy any of this to production! Note that as of this writing, some of the FHIR validation rules in the decoder are a bit over-harsh; they were built for a first-generation of SMART Health Cards and need to be updated. The newest branch of the viewer actually supports a “permissive” configuration that skips some of these, but I’m going to see about a PR for the decoder soon as well.
Next we iterate over the raw bundles and add them to the list; this is obviously much simpler because there’s not much to verify.
When all is said and done, we have a bundles array that contains the original FHIR object, any bundle-specific errors, a “certStatus” field and signature metadata (if present). The last step here is to call organizeResources on each bundle. Organization has two purposes:
Create structures that make it easy to work with the bundle and resolve references.
Identify the bundle “type” which we’ll use to pick a renderer later on.
For #1, we create a simple array containing every resource, and two maps, one keyed by resource type and one by id (actually double-keyed by fullUrl and the resource ID itself, which provides some resiliency across different implementations). There’s a lot of redundancy here of course, but it prevents renderers from re-implementing loops and lookups over and over and over.
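A minimal sketch of that organization step (the structure shown here is illustrative; the real code is organizeResources in resources.js):

```javascript
// Build the three lookup structures described above: a flat array,
// a map keyed by resource type, and an id map double-keyed by
// fullUrl and "ResourceType/id".
function organizeResources(bundle) {
  const all = [], byType = {}, byId = {};
  for (const entry of bundle.entry || []) {
    const res = entry.resource;
    all.push(res);
    if (!byType[res.resourceType]) byType[res.resourceType] = [];
    byType[res.resourceType].push(res);
    if (entry.fullUrl) byId[entry.fullUrl] = res;
    if (res.id) byId[`${res.resourceType}/${res.id}`] = res;
  }
  return { all, byType, byId };
}

const org = organizeResources({
  entry: [
    { fullUrl: 'urn:uuid:p1', resource: { resourceType: 'Patient', id: 'p1' } },
    { resource: { resourceType: 'Immunization', id: 'imm1' } }]
});
console.log(org.byType.Patient.length, org.byId['Patient/p1'].id); // 1 p1
```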
#2 is driven by findTypeInfo, a rather grotty set of functions that dig around in the bundles to figure out “what” they are. For example, tryTypeInfoPatientSummary looks for a Composition resource coded with LOINC 60591-5. If you’re looking to add a new type to the viewer, this is where to start.
These routines also supply human-readable labels that can be used in a dropdown, and a list of resources that represent the “subject” of the data. This “subject” list drives the behavior of WrongPatientWarning.js when in EHR context — more on that later.
Another breath — now we have a nice, clean, typed list of bundles in our SHX — time to render them.
Rendering the SHX: Error cases and metadata
Way back in Data.js, the status object with all its goodness is stored in React state. Based on the shxStatus, we do one of four things:
The viewer renders one bundle at a time. If the SHX contains multiple bundles, renderBundleChooser displays a dropdown allowing the user to navigate between them.
If the bundle is verifiable, ValidationInfo.js displays details about the signature in a banner above the data. There’s definitely some usability work to do on this; communicating signature information in a way that humans can actually comprehend is a tricky business.
I’ll call out a few interesting implementation details of these in later sections — but in general rendering FHIR data well is just a slog through the mud. As I’ve said before: Healthcare data sucks, and FHIR is no exception. Everything can be null, everything can be a list … don’t get me started again. The methods in fhirUtil.js and fhirTables.js try to create some reusable sanity around it all, but really it just is what it is. If you want to write production-caliber FHIR display code, it’s going to be ugly and filled with defensive checks. Just learn to love the pain.
HOWTO: Add a New Type Renderer
Giving this its own section just to make it as clear as possible. The code is built so that as more types of data are shared using SMART Health Cards and Links, the viewer can be easily updated to understand and display them. This has already happened once when the good folks at Docket added the Immunization record renderer — I hope there are many more to come!
Add a new BTYPE constant and tryTypeInfoXXX function in resources.js. The tryTypeInfoXXX function should return undefined if the bundle is not a match, otherwise an object with its BTYPE constant, a human-readable label for the bundle, and a list of the resources that identify the subject of the bundle (if any).
Add your new rendering component to the switch statement in Data.js. Your component will receive the organized resource info and a deferred code renderer (“dcr” … see the terminologies section later for details) as input. Feel free to use or ignore fhirUtil and fhirTables — whatever works for your data type!
And that’s it. Eventually we might abstract things out even a bit more, but it’s a good start. If you have any trouble, ping me and I’ll be happy to help.
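To make the shape of step one concrete, a hypothetical tryTypeInfoXXX function might look like this. The LOINC code mirrors the patient-summary example from earlier, but all names here are my assumptions, not the actual contents of resources.js:

```javascript
// Hypothetical type-matcher following the contract described above:
// return undefined for a non-match, else { btype, label, subjects }.
const BTYPE_PATIENT_SUMMARY = 'patientSummary';

function tryTypeInfoPatientSummary(organized) {
  // Match on a Composition coded with LOINC 60591-5
  const comp = (organized.byType.Composition || [])[0];
  const codings = comp ? ((comp.type || {}).coding || []) : [];
  const isMatch = codings.some(c =>
    c.system === 'http://loinc.org' && c.code === '60591-5');
  if (!isMatch) return undefined;

  return {
    btype: BTYPE_PATIENT_SUMMARY,             // constant used by the switch
    label: 'Patient Summary',                 // human-readable dropdown label
    subjects: organized.byType.Patient || []  // drives wrong-patient warning
  };
}

const info = tryTypeInfoPatientSummary({
  byType: {
    Composition: [{ type: { coding: [{ system: 'http://loinc.org', code: '60591-5' }] } }],
    Patient: [{ resourceType: 'Patient', id: 'p1' }]
  }
});
console.log(info.btype); // 'patientSummary'
```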
Terminologies: Deferred Code Rendering
FHIR relies heavily on codes to describe things. These codes may be relatively simple (like Marital Status, which currently includes just eleven values), or they may be mind-numbingly complex (like LOINC, commonly used for lab results and observations, which includes more than 50,000 multi-part codes).
Codes are a key tool in the attempt to make data interoperable — useful not just to the person who created it, but to anyone who receives it. A medical record that indicates “Resfriado Común” may have limited use outside of the Spanish-speaking world, but SNOMED code 82272006 means “a common cold” (or “rhume” or “verkoudheid” or whatever) no matter where it’s received. Codes also help avoid mistakes due to typos, make it easier to do robust research and use computers to work with information, and basically are just kind of great.
But when all you want to do is display a human-readable version of the code, they’re kind of annoying. There are basically an infinite number of coding systems, and as we’ve seen they can get pretty big. Some are used a lot, some very rarely. Some are published online in easily-readable formats, others are not. But everybody has to deal with them.
codes.js attempts to wrangle all of this in a way that works well within the constraints of the React client-side, synchronous rendering model. Basically it works like this:
The dcr is passed as a property to each rendering component (e.g., here).
Whenever a component needs to display human-readable text for a code, it calls dcr.safeCodeDisplay or dcr.safeCodingDisplay. If the text can be rendered synchronously, it does that. Otherwise, it queues up the codeset for asynchronous download and returns a placeholder.
Back in useEffect, if any codesets were queued for download, a re-render is triggered which inserts the final, downloaded values.
If you’re writing a rendering component — just use the dcr methods and ignore the rest.
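The core of the dcr idea fits in a few lines. The method name echoes the post, but everything else here is a sketch rather than the actual codes.js:

```javascript
// Deferred code rendering in miniature: answer synchronously from a cache
// when possible; otherwise queue the codeset for download and return the
// raw code as a placeholder until the re-render pass fills it in.
function makeDcr(cache, fetchQueue) {
  return {
    safeCodeDisplay(system, code) {
      const codeset = cache[system];
      if (codeset && codeset[code]) return codeset[code]; // synchronous hit
      fetchQueue.add(system); // fetched later (e.g., in useEffect)
      return code;            // placeholder shown until re-render
    }
  };
}

const queue = new Set();
const dcr = makeDcr(
  { 'http://loinc.org': { '60591-5': 'Patient summary Document' } },
  queue);

console.log(dcr.safeCodeDisplay('http://loinc.org', '60591-5'));        // cached text
console.log(dcr.safeCodeDisplay('http://snomed.info/sct', '82272006')); // placeholder
console.log(queue.has('http://snomed.info/sct'));                       // true
```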
The list of known systems is at the top of codes.js. Each is keyed with its canonical URL, corresponding to the “system” value in FHIR Coding and CodeableConcept structures. url points to the machine-readable code set (expected to be fetched with a simple GET), and the type (default “fhir”) indicates how that source data is to be parsed into a simple code-to-text dictionary. The current list captures most codes needed for the current use cases, but will surely need to be expanded in the future.
Transformed code sets are also cached in browser-local storage (TTL and other settings are in config) — the end result being that most renderings can be completed fully-synchronously. All this is a lot of work, but the rendering developer experience is super-clean, which I’m kind of proud of.
Saving Rendered Views
The nice thing about the viewer, especially when used with a SHL viewer prefix, is that it “just works” — providers don’t need any fancy software or IT work to receive the data in a SHX. It can easily be copy/pasted (more on this in a bit!) or printed out to be incorporated into a chart. This is really good stuff and not to be underestimated.
However — the long-term endgame for interoperability is to save the data in structured form back into an EHR or other system. Unfortunately while FHIR “read” has become more-or-less expected functionality, “write” is still a little sketchy. And even when that capability is well-implemented, it’s not altogether obvious “where” to save the data. For example, we probably don’t want to fully inter-mingle patient-reported data with a condition list curated by a long-time primary care provider.
There’s a lot to figure out here, but we’ve tried to push things just one small step forward by enabling rendered views to be saved as images, either as a downloaded file or directly in the EHR. We create the image using html2canvas, a really impressive package that does exactly what we need, wrapped up inside of the divToImage function in saveDiv.js. It would probably feel more natural to do this as a PDF rather than a JPEG, but I had a really tough time getting reliable PDF rendering on the client — and since JPEG is well-supported for scanned documents, I just stuck with that.
If you scan a SMART Health Insurance Card SHL (e.g., the demo one here) you’ll see a number of “copy” icons placed next to important values. Of course anybody can copy/paste anything, but the idea here is to make it easy to grab the bits and pieces that are useful … for example, if you’re a provider and need to enter insurance member and group numbers into an intake form.
This is implemented in Copyable.js, a standalone React component that accepts two props — the copyable text and, optionally, JSX that represents a more complex rendered view of the data. You can see this at work here (also in the picture), where only the plan number should be copied, but we want to display the name as well.
The hits keep coming, this time in the PatientSummarySection.js component. IPS bundles use a Composition resource to describe how the other resources (medications, observations, etc.) in the IPS should be grouped and displayed. The document is organized into sections — the content of each section is either a (structured) set of resources, an (unstructured) Narrative block of XHTML, or both. (Actually a section can also include a set of sub-sections, but let’s ignore that for today.)
The interplay of structured resources and unstructured narrative here is pretty tricky. IPS generators have great freedom as to how they’re used — for example:
This IPS has a “Plan of Treatment” section with only narrative xhtml.
This one has exclusively structured data in all sections.
This one has both narrative and structured data for all sections.
It even gets a bit weirder, because when both narrative and structured data are present they are “generally” considered to be equivalents. But the Narrative element includes a status, for which acceptable values include “additional” (i.e., the narrative has MORE information than the structured data) or “extensions” (i.e., the narrative includes content from extension elements that a structured rendering might not know how to represent). I swear every one of these standards meetings must end with a round of “but can we make it just a bit more complicated?”
Anyways, the viewer needs to deal with two problems here: (1) Which data do we display if both are present, and for extra fun (2) How can we safely display XHTML that we receive from an external, possibly malicious source? Awesome!
The second problem is a little hairier, because injecting untrusted XHTML into the browser is just a really, really sketchy thing to do. I honestly can’t quite believe that the standard allows this. But it is what it is, and there’s no real option to just ignore what could be critically important clinical information. So OK.
Our answer is to sanitize the XHTML with DOMPurify before injecting it. Unfortunately there are still browsers out there that aren’t supported by DOMPurify. My guess is that the intersection of these browsers with those that can use the viewer is basically zero, but you never know. So we have a fallback solution that inserts an IFrameSandbox.js component. The content is loaded into an embedded iframe with the minimal “sandbox” attribute we can use while still reasonably integrating the content into our layout. This solution isn’t great — but it’s better than nothing!
And that’s a Wrap! (for now)
There’s plenty of work left to do on the viewer, and even as I write this the “develop” branch of the code has started to move beyond what I’ve described here. But it should be (maybe more than) enough to understand what’s going on, and hopefully to save other implementers time and angst figuring out how they want their SHX receivers to work. I’m always happy to chat about this kind of thing too, so please just hit me up using the contact form or on LinkedIn or whatever.
Next job, rendering Provenance resources. So much nerd …
Remote monitoring of a community water tank for under $500, that works kilometers away from wifi or cell service, incurs no monthly fees, and uses a battery that lasts up to ten years? The future is here! I’m super-impressed with LoRaWAN, The Things Network and my Milesight Sensor. Read on for all the nerdy goodness.
Southern Whidbey Island, geologically speaking, is a big pile of clay covered by a big pile of sand. As I (barely) understand it, when glaciers moved in from the North, they plowed heavy clay sediment in front of them, which got trapped in lake beds formed when north-flowing rivers were blocked by those same glaciers. These big blobs of clay (in particular the Lawton Formation) sprung upwards as the glaciers retreated, the same way a pool float does when you climb off, creating the island. The retreat also left a bunch of looser stuff (sand and gravel) on top of the clay. Since then, tides and waves have been continually carving away the sides of the island, leaving us with beautiful high bluffs and frequent landslides. These UW field trip notes go into more and surely more accurate detail, but I think I’ve got the high points right.
Anyway, I’m lucky enough to live at the bottom of one of those bluffs. How our property came to “be” is a great story but one for another time — ask me sometime when we’re hanging out. For today, what’s important is that groundwater collects along the top of the impermeable clay layer in “aquicludes,” what a great word. And that’s where we collect our drinking water. It’s a pretty cool setup — three four-inch pipes jammed into the hillside draw water that’s been filtered through tons of sand and gravel before hitting the clay. The water is collected in a staging tank, then pumped into two holding tanks. A smaller 500 gallon one sits at house-level, and a bigger 2,000 gallon one is most of the way up the bluff.
It’s a bit janky, but gets the job done. Until it doesn’t. Like last July 2nd, two days before 30+ family and friends were to show up for the holiday weekend. The tanks went completely dry and it took us both of those days to figure out the “root” cause. See, I put quotes around the word “root” because it turns out that there were TWENTY-FIVE FEET OF TREE ROOTS growing through the pipes. Completely blocked. Clearing them out was quite a chore, but we got it done and July 4th was enjoyed by all, complete with flushing toilets and non-metered showers. All of which is just background leading to my topic for today.
LoRa / LoRaWAN
Our July 4th saga prompted me to set up a monitoring solution that would give us some advance warning if the water supply starts getting low. The obvious place to do this is the 2,000 gallon upper holding tank, because it’s the first place that goes dry as water drains down to our homes. The tank shed is too far from my house to pick up wifi, though, and while there is some cell coverage, I wasn’t psyched about paying for a monthly data plan. What to do?
It turns out that there is an amazingly cool technology called LoRa (aptly, for “Long Range”) that is tailor-made for situations just like mine. There’s a lot of terminology here and it can be tough to sort out, but in short:
LoRa is a physical protocol for sending low-bandwidth messages with very little power over very long distances. It’s actually a proprietary technique with the patent owned by Semtech, so they control the chip market. Kind of unsettling for something that is otherwise so open, but they don’t seem to be particularly evil about it.
LoRaWAN is a networking layer that sits on top of LoRa and the Internet, bridging messages end-to-end between devices in the field and applications (e.g., dashboards or alerting systems) that do something useful with device data.
A bunch of different players coordinate within these two layers to make the magic happen. There’s a great walkthrough of it all on the LoRa Alliance site; I’m going to crib their diagram and try to simplify the story a bit for those of us that aren’t huge radio nerds:
End Devices sit in the field, broadcasting messages out into the world without a target — just signals saying “HEY EVERYBODY IT’S 100 DEGREES HERE RIGHT NOW” or whatever.
Gateways harvest these messages from the air and forward them over TCP/IP to a pre-configured…
Network Server (LNS) that typically lives on the Internet. Network servers are the traffic cops of this game. They queue messages, send acknowledgements, delegate “join” messages to a Join Server and device messages to an Application Server, etc.
Join Servers hold the inventory of end devices and applications within the larger network, and know which devices are supposed to be talking to which applications. Join Servers also manage and distribute encryption keys to ensure minimal information disclosure. I won’t dive into the encryption details here, because yawn.
Application Servers receive device data and get them to the right Application.
Applications are logical endpoints for specific end device data. This is a bit tricky because a LoRaWAN application is different from an end-user application. There is often a 1:1 relationship, but the LRW application accepts and normalizes device data, then makes it available to end-user applications.
End-User Applications (not an official LRW term, just one I made up) actually “do stuff” with device data — create dashboards and other user experiences, send alerts, that kind of thing. End-user applications typically receive device data through a message queue or webhook or other similar vehicle.
The most common LoRaWAN use case is “uplink” (devices send info to apps), but there are also plenty of uses for “downlink” where apps send to devices: configuration updates, proactive requests for device information, whatever. A neat fun-fact about downlinks is that the network server is responsible for picking the best gateway to use to reach the targeted device; it does this by keeping track of signal strength and reliability for the uplinks it sees along the way. Pretty smart.
Picking a Network
Despite the nifty encryption model, many enterprises that use LoRaWAN for mission-critical stuff set up their own private network — which really just means running their own Servers (I’m just going to call the combo of Network/Join/Application servers a logical “Server” going forward). AWS and companies like The Things Industries offer hosted solutions, and a quick Google search pops up a ton of open source options for running your own. There are also quite a few “public networks” which, kind of like the public cloud providers, share logically-segmented infrastructure across many customers.
More interesting to me is the pretty amazing community-level innovation happening out there. The Things Stack “Community Edition” was one of the first — anybody can set up devices, gateways and applications here. It so happens that our outpost on Whidbey Island didn’t have great TTN coverage, so I bought my own gateway — but with more than 21,000 connected gateways out there, in most metro locations you won’t even have to do that. The gateway I bought grows the community too, and is now there for anybody else to use. Sweet!
Side note: I actually bought my gateway almost two years ago (part of a different project that never made it over the finish line), so it was there and waiting for me this time. But if I was starting today I might (even as a crypto skeptic, and appreciating its already checkered past) take a look at Helium instead. They basically incent folks to run gateways by rewarding them with tokens (“HNT”) which can be exchanged for credits on the network (or for USD or whatever). Last year they expanded this system (only in Miami, for now) into cell service. I dunno if these folks will make a go of it, but I do love the idea of a “people’s network” … so hopefully somebody will!
Here’s my gateway running on The Things Network:
Picking a Device
Measuring the amount of liquid in a tank is an interesting problem. We use a standard float switch to toggle the pump that feeds the tank, turning it on whenever the level drops below about 1,800 gallons. This works great for the pump, but not for my new use case — it only knows “above” or “below” its threshold. I want to track specific water volume every few minutes, so we can identify trends and usage patterns over time.
A crude option would be to just use a bunch of these binary sensors, each set at a different height (it’s about six feet tall, so say one every foot or so). But that’s a lot of parts and a lot to go wrong — there are plenty of better options that deliver better measurements with less complexity:
Capacitive measurement uses two vertical capacitive plates with an open gap between them (typically along the insides of a PVC pipe open at both ends). As liquid rises inside the pipe, capacitance changes and can be correlated to liquid levels.
Ultrasonic measurement is basically like radar — the unit mounts at the top of the tank pointing down at the liquid. A pulse is sent downwards, bounces off the water and is sensed on its return. The amount of time for that round trip can be correlated to height in the tank. The same approach can be used from the bottom of the tank pointing up — apparently if the transducer is attached to the bottom of the tank, the signal won’t reflect until it hits the top of the liquid-air boundary. Amazing!
Hydrostatic pressure sensors are placed on the inside floor of the tank and the relative pressure of water above the sensor correlates with depth.
A number of variations on the above and/or float-based approaches.
After a bunch of research, I settled on a hydrostatic unit — the EM500-SWL built by Milesight. Built for LoRaWAN, fully sealed, 10 year battery life, and a relative steal at less than $350. I was a bit worried that our tank would be too small for accurate measurements, but Asuna at Milesight assured me it’d work fine, and connected me with their US sales partner Choovio to get it ordered. They were both great to work with — five stars!
Setup at the tank was a breeze. Connect the sensor to the transceiver, drop the sensor into the tank, hang the transceiver on the shed wall and hit the power button. Configuration is done with a mobile app that connects to the unit by NFC; kind of magic to just hold them together and see stuff start to pop! By the time I walked down the hill to my house, the gateway was already receiving uplinks. Woo hoo!
Setting up the Application
OK, so at this point the sensor was broadcasting measurements, they were being received by the gateway, and the gateway was pushing them up to the Things Network Server. Pretty close! But before I could actually do anything with the readings, it was back to the Network Server console to set up an Application and “activate” the device. Doing this required three key pieces of information, all collected over that NFC link:
DevEUI: a unique identifier for the specific device
JoinEUI: a unique identifier for the Join Server (the default in my device was, happily, for The Things Network)
AppKey: the key used for end-to-end encryption between the device and application
Applications can also assign “payload formatters” for incoming messages. These are small device-specific scripts that translate binary uplink payloads into something usable. Milesight provides a ready-to-go formatter, and with that hooked up, “water_level” (in centimeters) started appearing in each message. Woot!
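To make the idea concrete, here’s a simplified sketch of what a TTN payload formatter looks like — this is NOT Milesight’s actual decoder (they publish their own), and the two-byte little-endian layout below is a made-up example:

```javascript
// Simplified sketch of a TTN "payload formatter" — NOT Milesight's
// real decoder. Assumes a fabricated payload layout: two bytes,
// little-endian, water level in centimeters.
function decodeUplink(input) {
  const cm = input.bytes[0] | (input.bytes[1] << 8);
  return {
    data: { water_level: cm },  // surfaces in each parsed uplink
    warnings: [],
    errors: []
  };
}

// Example: bytes 0x2C 0x01 decode to 300 cm
console.log(decodeUplink({ bytes: [0x2c, 0x01], fPort: 85 }).data.water_level); // 300
```

The `decodeUplink(input) → { data }` shape is the convention The Things Stack expects; everything inside is device-specific.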
Finally, I set up a “WebHook” integration so that every parsed uplink from the device is sent to a site hosted on my trusty old Rackspace server, secured with basic authentication over https. There are a ton of integration choices, but it’s hard to beat a good old URL.
And Actually Tracking the Data
At last, we can do something useful with the data! But as excited as I am about my monitoring app, I’m not going to go too deep into it here. The code is all open sourced on github if you’d like to check it out (or use it for something) — basically just a little web server with a super-simple Sqlite database underneath. The endpoints include:
/witterhook is the webhook endpoint, accepting and storing uplinks.
/witterdata provides the JSON data underlying the chart.
/wittercheck returns a parseable string to drive alerts when the levels go low (3.5 feet) or critical (2 feet).
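The low/critical thresholds behind /wittercheck can be sketched as a pure function — the function name and return strings here are my guesses, not the actual app’s, but the 3.5-foot and 2-foot cutoffs come straight from the description above (the sensor reports centimeters):

```javascript
// Sketch of the /wittercheck decision logic. Names and return
// strings are assumptions; thresholds are from the post.
const CM_PER_FOOT = 30.48;

function checkLevel(waterLevelCm) {
  const feet = waterLevelCm / CM_PER_FOOT;
  if (feet <= 2.0) return "CRITICAL"; // 2 ft or less
  if (feet <= 3.5) return "LOW";      // 3.5 ft or less
  return "OK";
}

console.log(checkLevel(152)); // ~5.0 ft => "OK"
console.log(checkLevel(100)); // ~3.3 ft => "LOW"
console.log(checkLevel(55));  // ~1.8 ft => "CRITICAL"
```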
For the alerting, I’m just using a free account at Site24x7 to ping /wittercheck every half hour and send email alerts if things aren’t as they should be.
So there you go. There are already obvious patterns in the data — the “sawtooth” is so consistent that there must be a steady, small leak somewhere in the system below the upper tank. Our supply is keeping up with it no problem at the moment, but definitely something to find and fix! It’s also clear that overnight sprinklers are by far our biggest water hogs, but I guess that’s not a shocker.
Now I just have to figure out how to auger out the rest of that root mass. Always another project at the homestead!
Baseball, specifically Red Sox Baseball, was a Big Deal for kids growing up in the 1970s in suburban Boston. We all collected and traded cards, prayed to be put on the Red Sox for little league, kept score watching on channel 38, played home run derby on the neighborhood tennis court, and established infinitely complex rules to manage “ghost runners” for wiffleball games with only four players. I can neither confirm nor deny reports of ritual burnings of Yankee cards.
Anyways, there was a dice-based baseball game we used to play when we couldn’t be outside. Not the exquisitely-complicated version created by some kid in Quebec in 1979 (I was 10 and that one was a little mind-numbing), but a really simple version that just associates rolls with outcomes. I’d totally forgotten about this until it popped up on my Pinterest feed a couple of months ago. Seemed like a fun and nostalgic Glowforge project, so I started playing with designs in Inkscape. It took awhile, but I’m quite pleased with the end result, which included not just standard cutting and etching but some cool magnets and an online component as well.
Note: You may notice that the engravings are all Mariners images, not the Red Sox. My kids grew up Mariners fans, and over the last 30 years that’s made me one too. Julio!!!
The game consists of a playing field inside a finger-joint box, magnetic bases and tokens, dice (purchased!) and a mobile scoreboard app. Most of the pieces are quarter-inch two-sided white oak veneer MDF from Craft Closet (a great source BTW, they even recommend GF settings for their materials). A QR code on the pitcher’s mound opens the scoreboard app on a phone, which sits in landscape mode in center field. Each roll of the dice corresponds to one at-bat, according to rules etched into the bottom of the lid.
Tokens are used to represent the batter and baserunners. Outs and runs are recorded via touches on the mobile app, and honestly that’s about it. The game is simple, 100% luck-based, yet kind of entertaining. And a ton of fun to put together; I love projects that combine multiple techniques.
The Field & Tokens
The field is 10×10 inches and made up of a few different insets — cut separately but all together in this SVG file. Creating the infield shape took forever, intersecting and unioning arcs and lines and circles in Inkscape. I am (at best) an Inkscape hack, but am continually amazed at what a stellar job it does with really complex (for me) stuff. Of course each inset needed to deal with kerf width, which I described here and won’t belabor again.
Unfortunately the magnets didn’t quite fit into the holes underneath first, second and third bases. I was able to snip off enough using a wire cutter, but the material is really brittle and I wrecked quite a few before I was finished. Luckily they’re cheap and, since they were hidden under the board, looks didn’t matter. A few dabs of two-part epoxy held them in place great. The pitcher’s mound didn’t have magnets underneath, so I padded the extra space with a bit of cork sheet I had lying around.
The tokens are simple circles cut from more white oak for the home team (34, 24, 11 and 51) and mahogany for the visitors (33, 15, 10 and 20). I’ll leave it as an exercise for you to figure out which numbers correspond to which of our favorite Mariners and Red Sox players. 😉 I used the drill press to very very carefully create a recess for magnets in the bottom of each one — careful to get the polarity right so that the tokens stick to the bases rather than jumping away from them!
The field is glued into a lidded box, which is handy for storage and keeps the dice from bouncing off the table during play. Having never built a laser-cut box before, I tried the “boxes.py” SVG generator and really can’t say enough good things about it. Choose your style, set your measurements and you’re ready to go. And because its output is a clean SVG file, it was super-easy to add engravings for the top and the dice combos.
Once again our old friend kerf is super-important to ensure a good fit, and it was a little tough to get right with thicker pieces. But my second try was a success and didn’t require a lick of glue to stay solid (the first attempt is now holding my supply of Unicorn Spit). The inset lid even snugs perfectly into place. My box was pretty simple, but there are tons of options to choose from. What a stellar resource.
Three coats of a clear satin spray polyurethane to protect the surfaces and the physical game was good to go. Now, on to the virtual!
Some versions of the game use a cribbage-like setup with pegs to keep track of runs and outs. I played with that, and with manually-operated counter wheels, but really didn’t like either one. Instead I decided to build a mobile website, optimized for a landscape-mode phone, that could sit right in the box in center field. I added a little brace behind second base that should fit pretty much any phone. The pitcher’s mound has a QR code (I have QR Codes on the brain these days) that opens up the scoreboard app, so there’s no stupid URL to remember or lose. Just scan the code and place the phone into its nook. Works great!
There’s not too much to say about the app itself. Diceball.js holds the game state and logic, which is passed down to three controls: Scoreboard.js drives the aggregate and per-inning run display (tracking extra innings if necessary), OutsDisplay.js shows current outs in the inning, and ButtonBar.js handles game updates. Game state is persisted into local storage so you can resume games in progress, and a full undo chain lets you fix touch errors like double-tapping the “out” button by mistake. Because there’s no server-side processing, I was able to host it in my family Azure account simply by copying the files up to a storage account with web access enabled. Nice.
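The undo-chain idea is simple enough to sketch in a few lines — the names below are mine, not Diceball.js’s, and real persistence would go through local storage rather than memory:

```javascript
// Minimal sketch of an undo chain: every mutation pushes a snapshot,
// and undo pops back to the previous state. (Class and method names
// are assumptions, not Diceball.js's actual API.)
class GameState {
  constructor() {
    this.state = { inning: 1, outs: 0, runs: [0, 0] };
    this.undoStack = [];
  }
  mutate(fn) {
    // deep-copy snapshot before each change so undo can restore it;
    // in the browser, this.state could also be saved to localStorage here
    this.undoStack.push(JSON.parse(JSON.stringify(this.state)));
    fn(this.state);
  }
  recordOut() { this.mutate(s => { s.outs += 1; }); }
  undo() {
    if (this.undoStack.length) this.state = this.undoStack.pop();
  }
}

const g = new GameState();
g.recordOut();
g.recordOut(); // oops, double-tapped the "out" button
g.undo();      // fix it
console.log(g.state.outs); // 1
```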
And that’s it! A lot of fun to make and to share. Until next time!
This is article three of a series of three. The first two are here and here.
Last time here on the big show, we dug into SMART Health Cards — little bundles of health information that can be provably verified and easily shared using files or QR codes. SHCs are great technology and a building block for some fantastic use cases. But we also called out a few limitations, most urgently a ceiling on QR code size that makes it impractical to share anything but pretty basic stuff. Never fear, there’s a related technology that takes care of that, and adds some great additional features at the same time: SMART Health Links. Let’s check them out.
The Big Picture
Just like SMART Health Cards (SHCs) are represented by encoded strings prefixed with shc:/, SMART Health Links (SHLs) are encoded strings prefixed with shlink:/ — but that’s pretty much where the similarity ends. A SHC is health information; a SHL packages health information in a format that can be securely shared. This can be a bit confusing, because often a SHL holds exactly one SHC, so we get sloppy and talk about them interchangeably, but they are very different things.
The string behind a shlink:/ (the “payload”) is a base64url-encoded JSON object. We’ll dive in way deeper than this, but the view from 10,000 feet is:
The payload contains (a) an HTTPS link to an unencrypted manifest file and (b) a key that will be used later to decrypt stuff.
The manifest contains a list of files that make up the SHL contents. Each file can be a SHC, a FHIR resource, or an access token that can be used to make live FHIR requests. We’ll talk about this last one later, but for now just think of a manifest as a list of files.
Each file can be decrypted using the key from the original SHL payload.
There’s a lot going on here! And this is just the base case; there are a bunch of different options and obligations. But if you remember the basics (shlink:/, payload, manifest, content) you’ll be able to keep your bearings as we get into the details.
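Those basics can be sketched in just a few lines of Node — the link and field values below are entirely fabricated (a real key is a base64url-encoded 256-bit value), but the decoding steps are the real ones:

```javascript
// Sketch: decoding a SMART Health Link into its payload JSON.
// All sample values below are fabricated.
function decodeShlinkPayload(link) {
  // A SHL may be prefixed with a viewer URL; the payload starts
  // right after the "shlink:/" protocol tag.
  const tag = "shlink:/";
  const start = link.indexOf(tag);
  if (start < 0) throw new Error("not a SMART Health Link");
  const encoded = link.substring(start + tag.length);
  // The payload is base64url-encoded JSON
  return JSON.parse(Buffer.from(encoded, "base64url").toString("utf8"));
}

// Round-trip a fake payload to show the shape
const payload = { url: "https://example.org/manifest/abc", key: "fake-key", flags: "P" };
const link = "https://viewer.example.org/shlink.html#shlink:/" +
  Buffer.from(JSON.stringify(payload)).toString("base64url");

const decoded = decodeShlinkPayload(link);
console.log(decoded.url); // the manifest URL to POST to next
```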
Privacy and Security
In that first diagram, nothing limits who can see the manifest and encrypted content — they’re basically open on the web. But all of that is meaningless without access to the decryption key from the payload, so don’t panic. It just means that, exactly like a SHC, security in the base case is up to the person that’s holding the SHL itself (in the form of a QR Code or whatever). And often that’s perfectly fine.
Except sometimes it’s not, so SHLs support added protection using an optional passcode that gates access to the manifest:
A user receiving a SHL is also given a passcode. The passcode is not found anywhere in the SHL itself (although a “P” flag is added to the payload as a UX hint).
When presenting the SHL, the user also (separately) provides the passcode.
The receiving system sends the passcode along with the manifest request, which succeeds only if the passcode matches correctly.
Simple but effective. It remains to be seen which use cases will rally around a passcode requirement — but it’s a handy arrow to have in the quiver.
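The passcode flow boils down to a tiny request body sent with the manifest POST — a sketch, with the recipient string borrowed from the example later in this article and the passcode value made up:

```javascript
// Sketch of the body a receiving system POSTs to the manifest URL.
// "recipient" and "passcode" are the fields the protocol defines;
// the values here are illustrative.
function buildManifestRequest(recipient, passcode) {
  const body = { recipient };
  if (passcode !== undefined) body.passcode = passcode; // only for "P"-flagged SHLs
  return body;
}

const body = buildManifestRequest(
  "Overlake Hospital, Bellevue WA, registration desk",
  "fancy-passcode");

console.log(JSON.stringify(body));
```

The hoster compares the passcode server-side and refuses the manifest (and counts the failure) on a mismatch.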
The SHL protocol also defines a bunch of additional requirements to help mitigate the risk of all these (albeit encrypted and/or otherwise protected) files floating around:
Manifest URLs are required to include 256 bits of entropy — that is, they can’t be guessable.
Manifests with passcodes are required to maintain and enforce a lifetime cap on the number of times an invalid passcode is provided before the SHL is disabled.
Content URLs are required to expire (at most) one hour after generation.
These all make sense … but they do make publishing and hosting SHLs kind of complicated. While content files can be served from “simple” services like AWS buckets or Azure containers, manifests really need to be managed dynamically with a stateful store to keep track of things like passcodes and failed attempts. Don’t think this is going to be a one night project!
SMART Health Links in Action
Let’s look at some real code. First we’ll run a quick end-to-end to get the lay of the land. SHLServer is a standalone, Java-based web server that knows how to create SHLs and serve them up. Build and run it yourself like this (you’ll need a system with mvn and a JDK installed):
git clone https://github.com/seanno/shutdownhook.git
mvn clean package install
mvn clean package
./run-demo.sh # or use run-demo.cmd on Windows
This will start your server running on https://localhost:7071 … hope it worked! Next open up a new shell in the same directory and run node create-link.js (you’ll want node v18+). You’ll see an annoying cert warning (sorry, the demo is using a self-signed cert) and then a big fat URL. That’s your SHL, woo hoo! Select the whole thing and then paste it into a browser. If you peek into create-link.js you’ll see the parameters we used to create the SHL, including the passcode “fancy-passcode”. Type that into the box that comes up and … magic! You should see something very much like the image below. The link we created has both a SHC and a raw FHIR bundle; you can flip between them with the dropdown that says “Health Information”.
So what happened here? When we ran create-link.js, it posted a JSON body to the server’s /createLink endpoint. The JSON set a passcode and an expiration time for the link, and most importantly included our SHC and FHIR files as base64url-encoded strings. SHLServer generated an encryption key, encrypted the files, stored a bunch of metadata in a SQLite database, and generated a SHL “payload” — which looks something like this:
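Here’s a representative payload — every value below is fabricated, but the fields match the anatomy described later in this article (url, key, label, plus the server’s extra _manifestId):

```javascript
// Representative SHL payload. All values are fabricated for
// illustration; real ones come back from the /createLink call.
const payload = {
  url: "https://localhost:7071/manifest/fake-manifest-id",
  key: "fake-base64url-encoded-256-bit-symmetric-key",
  label: "Demo SHL with a SHC and a FHIR bundle",
  v: 1,
  _manifestId: "fake-manifest-id" // SHLServer extension, not in the spec
};
console.log(JSON.stringify(payload, null, 2));
```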
(You can make one of these for yourself by running create.js rather than create-link.js.) Finally, that JSON is encoded with base64url, the shlink:/ protocol tag is added to the front, and then a configured “viewer URL” is added to the front of that.
The viewer URL is optional — apps that know what SHLs are will work correctly with just the shlink:/… part, but by adding that prefix anybody can simply click the link to get a default browser experience. In our case we’ve configured it with https://shcwork.z22.web.core.windows.net/shlink.html, which opens up a generic viewer we’re building at TCP. That URL is just my development server, so handy for demo purposes, but please don’t use it for anything in production!
Anyways, whichever viewer receives the SHL, it decodes the payload back to JSON, issues a POST to fetch the manifest URL it finds inside, pulls the file contents out of that response either directly (.embedded) or indirectly (.location), decrypts it using the key from the payload, and renders the final results. You can see all of this at work in the TCP viewer app. Woot!
A Quick Tour of SHLServer
OK, time for some code. SHLServer is actually a pretty complete implementation of the specification, and could probably even perform pretty reasonably at scale. It’s MIT-licensed code, so feel free to take it and use it as-is or as part of your own solutions however you like, no attribution required. But I really wrote it to help folks understand the nuances of the spec, so let’s take a quick tour.
Because the manifest format doesn’t include a way to identify specific files, the admin methods expect the caller to provide a “manifestUniqueName” for each one. This can be used later to delete or update files — as the name implies, they only need to be unique within each SHL instance, not globally.
The last interesting feature of the class is that it can operate in either “trusted” or “untrusted” mode. That is, the caller can either provide the files as cleartext and ask the server to allocate a key and encrypt them, or it can pre-encrypt them prior to upload. Using the second option means that the server never has access to keys or personal information, which has obvious benefits. But it does mean the caller has to know how to encrypt stuff and “fix up” the payloads it gets back from the server.
The bottom layer of code is SHLStore.java, which just ferries data in semi-ORM style between a Sqlite database and file store. Not much exciting there, although I do have a soft spot for Sqlite and the functional interface I built a year or so ago in SqlStore.java. Enough said.
Anatomy of a Payload
OK, let’s look a little more closely at the payload format that is base64url-encoded to make up the shlink:/ itself. As always it’s just a bit of JSON, with the following fields:
url identifies the manifest URL which holds the list of SHL files. Because they’re burned into the payload, manifest URLs are expected to be stable, but include some randomness to prevent them from being guessable. Our server implements a “makeId” function for this that we use in a few different places.
key is the shared symmetric key used to encrypt and decrypt the content files listed in the manifest. The same key is used for every file in the SHL.
label is a short string that describes the contents of the SHL at a high level. This is just a UX hint as well.
v is a version number, assumed to be “1” if not present.
flags is a string of optional upper-case characters that define additional behavior:
“P” indicates that access to the SHL requires a passcode. The passcode itself is kept with the SHL hoster, not the SHL itself. It is communicated to the SHL holder and from the holder to a recipient out of band (e.g., verbally). The flag itself is just another UX hint; the SHL hoster is responsible for enforcement.
“L” indicates that this SHL is intended for long-term use, and the contents of the files inside of it may change over time. For example, a SHL that represents a vaccination history might use this flag and update the contents each time a new vaccine is administered. The flag indicates that it’s acceptable to poll for new data periodically; the spec describes use of the Retry-After header to help in this back-and-forth.
One last flag (“U”) supports the narrow but common use case in which a single file (typically a SHC) is being transferred without a passcode, but the data itself is too large for a usable QR code. In this case the url field is interpreted not as a manifest file but as a single encrypted content file. This option simplifies hosting — the encrypted files can be served by any open, static web server with no dynamic manifest code involved. The TCP viewer supports the U flag, but SHLServer doesn’t generate them.
Note that if you’re paying attention, you’ll see that SHLServer returns another field in the payload: _manifestId. This is not part of the spec, but it’s legal because the spec requires consumers to expect and ignore fields they do not understand. Adding it to the payload simply makes it easier for users of the administration API to refer to the new manifest later (e.g., in a call to upsertFile).
Working with the Manifest
After a viewer decodes the payload, the next step is to issue a POST request for the URL found inside. POST is used so that additional data can be sent without leaking information into server logs:
recipientis a string representing the viewer making the call. For example, this might be something like “Overlake Hospital, Bellevue WA, registration desk.” It is required, but need not be machine-understandable. Just something that can be logged to get a sense of where SHLs are being used.
The manifest response lists the SHL’s files; each file’s content can be delivered in one of two ways:
Directly, using an embedded field within the manifest JSON.
Indirectly, as referenced by a location field within the manifest JSON.
This is where embeddedLengthMax comes into play. It’s kind of a hassle and I’m not sure it’s worth it, but not my call. Basically, if embeddedLengthMax is not present OR if the size of a file is <= its value, the embedded option may be used. Otherwise, a new, short-lived, unprotected URL representing the content should be allocated and placed into location. Location URLs must expire after no more than one hour, and may be disabled after a single fetch. The intended end result is that the manifest and its files are considered a single unit, even if they’re downloaded independently. All good, but it does make for some non-trivial implementation complexity (SHLServer uses a “urls” table to keep track; cloud-native implementations can use pre-signed URLs with expiration timestamps).
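The embedded-vs-location decision itself is a one-liner — a sketch, with function and parameter names of my own invention:

```javascript
// Sketch of the embeddedLengthMax rule: embedded delivery is allowed
// when no max was given, or when the file fits under it. Names here
// are assumptions, not SHLServer's actual identifiers.
function deliveryMode(fileLengthBytes, embeddedLengthMax) {
  if (embeddedLengthMax === undefined || fileLengthBytes <= embeddedLengthMax) {
    return "embedded";
  }
  // otherwise: allocate a short-lived URL (<= 1 hour) for "location"
  return "location";
}

console.log(deliveryMode(1024, undefined)); // "embedded"
console.log(deliveryMode(1024, 500));       // "location"
console.log(deliveryMode(400, 500));        // "embedded"
```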
SMART API Access
OK I’ve put this off long enough — it’s a super-cool feature, but messes with my narrative a bit, so I’ve saved it for its own section.
In addition to static or periodically-updated data files, SHLs support the ability to share “live” authenticated FHIR connections. For example, say I’m travelling to an out-of-state hospital for a procedure, and my primary care provider wants to monitor my recovery. The hospital could issue me a SHL that permits the bearer to make live queries into my record. There are of course other ways to do this, but the convenience of sharing access using a simple link or QR code might be super-handy.
A SHL supports this by including an encrypted file with the content type application/smart-api-access. The file itself is a SMART Access Token Response with an additional aud element that identifies the FHIR endpoint (and possibly some hints about useful / authorized queries). No muss, no fuss.
The spec talks about some other types of “dynamic” exchange using SHLs as well. They’re all credible and potentially useful, but frankly a bit speculative. IMNSHO, let’s lock down the more simple file-sharing scenarios before we get too far out over our skis here.
And that’s it!
OK, that’s a wrap on our little journey through the emerging world of SMART Health Cards and Links. I hope it’s been useful — please take the code, make it your own, and let me know if (when) you find bugs or have ideas to make it better. Maybe this time we’ll actually make a dent in the health information exchange clown show!
Contains some health information (e.g., information about vaccines administered to an individual);
Is provably vouched for by some known issuer (e.g., WA DOH confirms that the individual actually received the vaccines);
Can be shared in many ways (e.g., through a printed QR Code or in a mobile wallet); and
May be invalidated/retracted if the issuer no longer believes the claim.
The best way to think about security and privacy for a SHC is to imagine it as a physical piece of paper containing the same information (which, sometimes, it is!):
An issuer gives a SHC to humans they believe should have it. (E.g., WAVerify will give you a COVID vaccine SHC if you prove you have access to the email address or mobile phone that was recorded when you physically received the vaccine.)
It’s up to the holder to “protect” a SHC by not losing it or allowing it to be stolen. (E.g., by keeping a printed version safely in your wallet or behind a PIN on your mobile phone). SHCs are not encrypted!
The holder can share the SHC with whomever they like. (E.g., by allowing the bouncer at your favorite dive to scan the printout).
A really important implication of this approach is that a SHC is not a proof of identity. If I show you a SHC with Sean’s vaccine information, it doesn’t mean that I am Sean — it just means that I have access to this information about Sean (maybe legitimately, maybe not). Additional work is required to prove that the person with the SHC is the same person that is in the SHC. Typically this is done by looking at an actual ID (e.g., a drivers license) and verifying that the name/birthdate/etc. match what’s in the SHC.
OK, that’s basically what we’re working with here. Now let’s see how they’re actually put together and used.
A SHC can contain any health information that can be represented in JSON using FHIR resources, which in practice means anything. For example, here’s one of my vaccine records as a FHIR “Immunization” resource:
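(An illustrative sketch rather than my literal record; the date, performer, and lot number below are placeholders.)

```javascript
// A FHIR Immunization resource along these lines. CVX code 208 is the
// original Pfizer-BioNTech COVID-19 vaccine.
const immunization = {
  resourceType: "Immunization",
  status: "completed",
  vaccineCode: {
    coding: [{ system: "http://hl7.org/fhir/sid/cvx", code: "208" }]
  },
  patient: { reference: "resource:0" },   // points into the containing bundle
  occurrenceDateTime: "2021-01-29",       // placeholder date
  lotNumber: "REDACTED",                  // you really don't need to see this
  performer: [{ actor: { display: "Example Pharmacy" } }]
};
```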
Most of this is pretty self-explanatory (which is one of the advantages of FHIR over other formats). Code “208” in the CVX system represents the original Pfizer-BioNTech vaccine, and I’ve redacted the actual lot number because, well, I don’t really know why I care about you seeing that but whatever.
The patient “reference” element is interesting — what does “resource:0” mean? Because information in a SHC is usually composed of not just one but a few related resources, my actual COVID SHC also includes a “Patient” resource that contains my demographic information. Referencing that resource links the two together to form a complete record. Related resources are packaged together into a collection called a “Bundle” (another FHIR type). Here’s a complete COVID bundle (cut to include just my first vaccination):
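(Again illustrative; names and dates are placeholders.)

```javascript
// A minimal SHC-style Bundle: a Patient plus one Immunization, tied
// together with "resource:#" fullUrl references.
const bundle = {
  resourceType: "Bundle",
  type: "collection",
  entry: [
    {
      fullUrl: "resource:0",
      resource: {
        resourceType: "Patient",
        name: [{ family: "Example", given: ["Pat"] }],
        birthDate: "1970-01-01"
      }
    },
    {
      fullUrl: "resource:1",
      resource: {
        resourceType: "Immunization",
        status: "completed",
        vaccineCode: {
          coding: [{ system: "http://hl7.org/fhir/sid/cvx", code: "208" }]
        },
        patient: { reference: "resource:0" }, // points at the Patient above
        occurrenceDateTime: "2021-01-29"
      }
    }
  ]
};
```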
The “entry” array in the bundle is just an array of resources, each one keyed with a “fullUrl” value that can be used as a reference. Typically fullUrl values are actual unique identifiers (often REST-style URLs), but the SHC specification prescribes the use of “resource:#” values instead. Primarily this is to help minimize resource size, but it’s also a nice reminder that the resources in a SHC should stand alone without dependencies outside of their bundle.
Actually, to be really precise, the specification only requires “resource:#” identifiers when the SHC is destined to be shared using a QR code. We’ll talk a lot more about this later, but QR codes are quite limited in the amount of data they can reasonably contain. One way the specification tries to keep the size down is with a set of minimization steps which result in still-valid but terse resources.
Obviously there’s a lot more to FHIR and we could keep going for a long time — but I think that’s enough for our purposes. The important takeaway is just that the actual goodies in the magic inner core of a SHC are just FHIR resources that can represent any useful health-related information.
Signatures and Verifiable Credentials
One problem with a paper vaccine card is that it’s trivial to forge — and the same goes for a random digital bundle of FHIR resources. Now this may not always be the end of the world (smart folks like Bruce Schneier have argued that it can actually be a “feature”), but often it’s a real problem. One of the coolest things about SHCs is that they require a very small incremental investment vs. paper to be made strongly “verifiable” in a way that doesn’t require sensitive centralized databases or other privacy-threatening technology.
Next, the FHIR bundle needs to be wrapped up in a structure that’s almost but not quite a W3C Verifiable Credential (a fully “legit” VC can be directly derived from the SHC format; where they differ it’s basically another attempt to reduce data size for the QR representation). In practice that means using JSON like this:
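(A sketch of the shape; the nbf timestamp is a placeholder, and the empty bundle stands in for a real one.)

```javascript
// The not-quite-W3C-VC wrapper. "iss" is Washington's real ISS value from
// the text; the other values are placeholders.
const vcPayload = {
  iss: "https://waverify.doh.wa.gov/creds", // ISS_URL, no trailing slash!
  nbf: 1620000000,                          // "not before", epoch seconds
  vc: {
    type: [
      "https://smarthealth.cards#health-card",
      "https://smarthealth.cards#immunization",
      "https://smarthealth.cards#covid19"
    ],
    credentialSubject: {
      fhirVersion: "4.0.1",
      fhirBundle: { resourceType: "Bundle", type: "collection", entry: [] }
    }
  }
};
```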
ISS_URL tells readers where to find the public key that will ultimately verify that the issuer signed the bundle. Note this is a prefix of the JWKS URL we saw above (e.g., for Washington the ISS value is https://waverify.doh.wa.gov/creds). “.well-known/jwks.json” is part of the specification to be appended by the reader. Also, the ISS value cannot have a trailing slash — it is regularly used in exact string comparisons, so the protocol can’t afford to be as sloppy as we usually are about that.
Finally, we can sign this bad boy using a JSON Web Signature. JWS (or if you’re picky, JWS using compact serialization) is one of the two flavors of JSON Web Token — a format that makes it easy to move signed and/or encrypted data around the web. The issuer first compresses the data with DEFLATE (that size issue again!), then uses its private key to sign it, using code something like this:
*** Don’t miss the deflateRawSync call above! “raw” means that the result won’t include a zip header. This can trip up implementations that expect a header by default. One such example is the Java Inflater class; passing true to its constructor will do the trick.
And here’s our resulting JWS (in SHC conversations often referred to as the VC):
That big garbled mess is actually hiding three different base64url-encoded sections, each separated by dots (feel free to skip the Where’s Waldo game on this): a header, the payload and the signature itself. Decoding the header is a useful exercise:
First is the “zip” field that we added ourselves — this is actually a non-standard header for JWS, but totally allowed and a useful hint/reminder that our payload is compressed. More important are the algorithm and key identifier used to sign the payload. So to verify a SHC, the receiver:
Decodes the three sections with base64url.
Uses the ISS value in the payload to get the list of valid keys for the issuer.
Uses the “kid” field in the header to pick the right one from that list.
Uses that key to verify the signature over the header and payload sections.
And then magic happens. We’re really close now — just need to cover a couple of side issues and then look at sharing (including using QR codes). Hang in there!
Trusted Issuers and Directories
Protocols are all well and good, but at some point the rubber has to meet the road. Sure you can trust any issuer that puts a JWKS file up on the web, but should you? Probably not — but how to decide? Typically folks delegate the work to some centralized organization (a trust network or trust registry) that serves as the gatekeeper for inclusion. The trust network publishes a directory of vetted participants at a secure, well-known location that everyone can start from.
Unfortunately, this practical necessity is often the last thing that gets implemented in trust-dependent projects, because it’s so easy to fake up in demos and tests (including mine). And while I wasn’t on the ground in 2021 when folks were putting it together, it sure seems like the COVID-initiated SHC network was no exception. This is the situation as it appears to be today, at least in the USA and other jurisdictions without a strong central network of their own:
The Commons Project is responsible for actually vetting issuers against these rules and maintaining technical infrastructure for the network.
Machine-readable lists of issuers, their ISS values and cached public keys are available in the VCI Directory on GitHub.
One of the nicest things that VCI does is to take a daily “snapshot” of all the JWKS files in the network and aggregate them into a single file here. This means that any application that wants to verify SHCs can just download this one file and use it to find keys, rather than having to download each JWKS on-demand. Since network calls are always slow and unreliable, the fewer the better. Good stuff!
Despite its near-impenetrable web of trademarks, VCI/CommonTrust has so far used a pretty common-sense approach to inclusion. Issuers are required to be one of the following (from their site):
Clinical health system or hospital providing patient care
National or regional pharmacy chain
National or regional laboratory diagnostics provider
National or regional health insurance payor
Government or governmental agency
It’s pretty easy to verify whether an applicant does or does not fit into one of these categories; there doesn’t seem to be much controversy about that. What is up for grabs, however, is whether this same trust network should be used for all SHCs. For example, would it be better to have a separate trust network for insurance cards? There are good arguments pro and con; the nerds among us who just want to know what URL to use will just have to wait and see how it plays out!
If you were really paying attention back when we talked about the SHC privacy and security model, you might have realized that since people hold SHCs themselves, without any “live” link back to the issuer, the issuer might have trouble retracting a credential when it needs to (for example, if a bug caused it to put the wrong plan identifier on an insurance card). “Revocation” is not a common part of the SHC lifecycle (e.g., your insurance card from last year doesn’t need to be revoked — it’s still a valid insurance card, the data just shows that the plan is expired) — but it can happen.
In the case of a major sh*tshow — say somebody hacks into an issuer’s internal systems — public keys can simply be removed from the trust network and all SHCs signed by that issuer will immediately become invalid. But that’s a pretty big hammer, sure to cause a lot of chaos and annoyance throughout the system. More typically, a SHC issuer creates and publishes a card revocation list (CRL) that contains opaque identifiers for individual cards that should not be trusted.
The issuer adds an “rid” field to the signed SHC VC. Because it may be posted publicly, the rid must be a random, opaque value.
The issuer maintains a CRL containing revoked rids for each of its public keys at “ISS_URL/.well-known/KID.json”, where KID is a key identifier from its JWKS file (that .well-known URL construct is really quite handy).
Verifiers download the CRL regularly and use it to reject individual cards that show up on the list. “Regularly” isn’t very well-defined, but daily seems to be a generally-accepted frequency.
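A minimal check along those lines might look like this (per my reading of the spec the rid lives inside the “vc” object and the CRL carries a “rids” array; treat both as assumptions to double-check against the spec):

```javascript
// Sketch of a revocation check. "crl" is the JSON fetched from
// ISS_URL + "/.well-known/" + KID + ".json".
function isRevoked(vcPayload, crl) {
  const rid = vcPayload.vc && vcPayload.vc.rid; // rid lives inside "vc" (assumed)
  if (!rid) return false;                       // no rid, nothing to check
  return crl.rids.includes(rid);
}
```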
There are some more details that enable time-based revocation, but I’m calling it good there. TLDR, there’s a pretty good way for issuers to retract a card that was issued in error.
Sharing by File
OK, we’ve got a SHC in JWS/VC format, and we want to move it around — how to do that? We’ll cover the less-commonly-used method first, which is just sticking a SHC into a file. The specification defines an extension “.smart-health-card” and MIME type “application/smart-health-card” for these files.
It’s pretty simple, except for one thing. The file format actually can contain multiple cards — it’s a JSON object with a “verifiableCredential” array, each entry of which is a JWS/VC string. I can see where they were going with this concept … it might be nice to have a file that wraps up multiple cards from different issuers. But with the appearance of SMART Health Links I’m thinking this capability may be very, very lightly used. Ah well, there’s always a little leftover detritus when you’re figuring out something new.
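For example, a minimal .smart-health-card file is just this (the JWS string is a truncated placeholder):

```javascript
// The ".smart-health-card" file format: a JSON object whose
// "verifiableCredential" array holds one or more JWS strings.
const fileContents = JSON.stringify({
  verifiableCredential: [
    "eyJ6aXAiOiJERUYiLCJhbGciOiJFUzI1NiJ9.payload.signature" // placeholder
  ]
});

// Reading the cards back out:
const cards = JSON.parse(fileContents).verifiableCredential;
```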
Presumably sharing SHCs around this way will become more common as they gain traction. For example, it’d be really nice to just tap a link with the right MIME type and have the SHC drop directly into your Google or Apple wallet. That will likely happen, but we’re just early, so the process using files is a bit clunky.
One nice thing about storing a SHC in a file — unlike with QR codes, there’s no practical size restriction. It turns out that this is a pretty big deal as the use cases expand, and is a key reason that SMART Health Links are playing an increasingly important role. We’ll talk about those next time.
Sharing by QR Code
Finally! The first experience everyone has with SMART Health Cards is the QR Code — but it’s taken more than 2,500 words to get there. With that background, at least it’s pretty easy to understand what’s going on inside. A SHC QR Code simply contains the JWS/VC string we created and signed earlier. It’s not a link to that data, it’s the actual data itself.
Space really is at a premium with QR Codes. It depends on things like the image size and error correction level, but in practice SHC data in JWS/VC form has to be 1,191 bytes or smaller. The original specification had a vehicle for “chunking” larger data into multiple codes (imagine a page with four individual QRs on it, scanned one after another) but for obvious reasons that was quickly deprecated by the community.
So everything about encoding QRs is about squeezing things down. There are a few different “modes” that can be used for QR data; the SMART folks figured out that numeric was the best option for SHC data. The spec goes into gory detail here, so I’m just going to cheat and pop in a little code using the qrcode npm package:
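The conversion itself is simple: every JWS character (all of which sit between char codes 45 and 122) becomes a zero-padded two-digit number, charCode minus 45. Here’s a dependency-free sketch of that step, with the qrcode call shown only as a comment since I haven’t verified its exact segment API here:

```javascript
// Convert a JWS string into the SHC numeric-mode payload: each character
// becomes a zero-padded two-digit value of (charCode - 45).
function toNumericChunk(jws) {
  return jws
    .split("")
    .map((c) => String(c.charCodeAt(0) - 45).padStart(2, "0"))
    .join("");
}

// The QR itself pairs a byte-mode "shc:/" prefix with the numeric chunk,
// along the lines of (using the qrcode npm package):
//
//   QRCode.toFile("card.png", [
//     { data: "shc:/", mode: "byte" },
//     { data: toNumericChunk(jws), mode: "numeric" },
//   ]);

const chunk = toNumericChunk("abc"); // 'a'=97 -> "52", 'b' -> "53", 'c' -> "54"
```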
… and we made it. Woo hoo! There are actually a few other ways to move SHC data around using FHIR APIs, but I’m going to call it here.
I hope all of this noise proves useful, both to help understand what’s going on under the covers and give you a bit of a head start on actually working with SHCs in code. Next time we’ll dive into SMART Health Links, which address some of the challenges with SHCs and add some neat stuff of their own. I’d love to hear your thoughts or corrections, just hit me up on Twitter or LinkedIn or whatever.
This is article one of a series of three. Two is here; three is here!
A few weeks ago I helped out with a demo at UCSD that showed patients checking into the doctor using a QR code. It was pretty cool and worked well, excepting some glare issues on the kiosk camera. But why, you might ask, did this retired old guy choose to spend time writing code to support, of all things, workflow automation between providers and insurance companies? Well since you clearly did ask, I will be happy to explain!
I spent a lot of my career trying to make it easier for individuals to get the informed care they need to be healthy and safe. And while I’ll always be proud of those efforts, the reality is that we just weren’t able to change things very much. Especially here in the US, where the system is driven far more by dollars than by need. But I’m still a believer — longitudinal care that travels with the individual is the only way to fix all of this — and despite my exit from the daily commute, I’m always on the lookout for ideas that will push that ball forward.
COVID and the birth of SMART Health Cards
Flash back to COVID year two (bear with me here). We had vaccines and they worked really well, and folks were chomping at the bit to DO STUFF again. One way we tried to open the world back up was by requiring proof of vaccination for entry to movies and bars and such. And because healthcare still thinks it’s 1950, this “proof” was typically a piece of paper. Seriously. Anyways, a few folks who live in the current millennium came up with a better idea they called SMART Health Cards — a fancy way of using QR codes and phones to share information (like vaccine status) that can be digitally verified. It was a lot better than paper — The Commons Project even made a free mobile app that venues could use to quickly and easily scan cards. More than thirty states adopted the standard within months — a track record that will make any health tech wonk stand up and take notice.
Of course the problem with any new technology is that adoption takes time — most folks still just showed up at the bar with a piece of paper. But with SMART Health Cards, that’s fine! Paper records could easily include a SHC QR code and the system still worked great. I found this bridge between the paper and digital worlds super-compelling … it just smelled like maybe there was something going on here that was really different. Hmm.
As it turns out, the pandemic began to largely burn itself out just as all of this was building up steam. That’s a good thing of course, but it kind of put the brakes on SHC adoption for a bit.
Enter SMART Health Insurance Cards
One reason the states were so quick to adopt SHCs is because it was fundamentally simple:
Host a certificate (ok, a “signing key”).
Sign and print a QR code on your existing paper records.
That’s the whole thing. Everything that worked before keeps working. All you need to do to get the digital benefits is to put a QR code on whatever document or card or app you already have. This is pretty neat. Of course, other pieces of the ecosystem like the verifier and “trust network” of issuers took a bit more work, but for the folks in the business of issuing proof, it’s really easy.
It’s pretty clear that this technology could be used in other ways as well. Extending vaccine cards to include more history for camps and schools is an obvious one. Folks are working on an “International Patient Summary” to help people move more seamlessly between health systems. And, finally getting back to the point of this post, it seems like there is a real opportunity to improve the experience of patient check-in for care — we all have insurance cards in our wallet, why not make them digital and use QR codes to simplify the process?
This idea gets me excited because, if you play it out, there appears to be a chance to move that “individual health record” ball forward. First, there is real business momentum behind the idea of improving check-in. 22% of claim denials are due to typos and other errors entering registration information. We’ve learned it takes six weeks to train front desk staff to interpret the thousands of different insurance card formats — and it’s a high-turnover job, so folks are running that training again and again. And even when the data makes it to the right place in the end, it’s only after a super-annoying process of form-filling and xeroxing that nobody feels positive about.
All of this together means there are a lot of people who are honestly psyched about the potential financial / experience benefits of digital check-in. Especially when the lift is simply “put a QR code on your existing cards.” It’s kind of a no-brainer and I think that’s why more than 70 different organizations were represented at the UCSD demonstration. It was pretty neat.
More than insurance
Cool, so there’s motivation to actually deploy these things and begin to transition check-in to the modern world. (I should acknowledge that this is not the first or only initiative in this space; for example Phreesia has been working the problem for years and does a fantastic job. SMART Health Cards are additive to these workflow solutions and will just make them all get traction quicker.)
But the other thing that gets me excited here is that the “payload” in a SMART Health Card can carry way more than just insurance data. That same card — especially since it’s coming from an insurance company that knows a lot about your health — could include information on your allergies, medications, recent procedures, and much more. All of the stuff that you have to fill out on forms every time you show up anyways, and that can make or break the quality of care you receive. You can even imagine using this connection to set up authorization for the provider to update your personal record after the visit.
Woo hoo! At the end of the day, I see this initiative as one that has the potential to improve coordination of care through individuals in a way that will actually be deployed and sustained, because it has immediate and obvious business benefit too. And with the ability of SHCs to bridge paper to digital, we may be looking at a real winner. Still a ton of work to do on the provider integration side, but that just makes it interesting.
Oops one problem (and a solution)
It turns out that there is one big technical issue with SHC QR codes that makes a lot of what I’ve been gushing about kind of, well, impossible. The numbers bounce around depending on the physical size of the QR image, but basically you can only cram about 1,200 bytes of data into the QR itself. That’s enough for a really terse list of vaccines, but it just doesn’t work for larger payloads. Insurance data alone using the proposed CARIN FHIR format seems to average about 15k. Hmm.
No problem — Josh and his merry band of collaborators come to the rescue again with the concept of SMART Health Links. A SHL creates an indirection between the QR Code and a package of data of basically unlimited size that can contain multiple SMART Health Cards, other collections of health data, and even those authorization links I mentioned earlier. The data in the QR code is just a pointer to that package, encrypted at a URL somewhere. The standard defines how that encryption works, defines ways to add additional security, and so on. It’s great stuff.
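To make that concrete, here’s a sketch of the payload hiding inside a SHL (field names follow the SHL spec as I read it; the URL, key, and label values are invented):

```javascript
// What's behind a SMART Health Link: the part after "shlink:/" is just
// base64url-encoded JSON like this. All values below are placeholders.
const shlPayload = {
  url: "https://shl.example.org/manifests/abc123",    // where the package lives
  key: "rxTgYlOaKJPFtcEd0qcceN8wEU4p94SqAwIWQe6uX7Q", // decryption key (placeholder)
  flag: "P",                                          // e.g. passcode required
  label: "My insurance card"
};
const shlink = "shlink:/" +
  Buffer.from(JSON.stringify(shlPayload)).toString("base64url");
```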
The workflow we demonstrated at UCSD uses payor-issued SMART Health Cards wrapped up inside SMART Health Links. If a person has multiple insurance cards (or even potentially drivers licenses and other good stuff) they could combine them all into a “Super-Link” while keeping intact the ability to verify each one back to the company or state or whatever that issued it. Ka-ching!
And if you’re a nerd like me, over the next week or so I’m going to write two techy posts about the details — one for SMART Health Cards and one for SMART Health Links. Hopefully they will serve to get folks more comfortable with what it will take for issuers and consumers to get moving with real, production deployments quickly. If you’d like to get notified when those go up, just follow me on LinkedIn or Twitter or whatever.
It’s a good fight and hopefully this one will get us closer to great care. Just Keep Swimming!
Appropriate for Memorial Day and the reflection it deserves. A Woman of No Importance by Sonia Purnell tells the story of SOE, OSS and CIA agent Virginia Hall, winner of the Distinguished Service Cross, the French Croix de Guerre, and the Most Excellent Order of the British Empire (man does that sound British) for her awe-inspiring work in France during WW2. While she survived the war (and actually continued to serve well into the Cold War), she very clearly carried the physical and mental after-effects of a time where so many of her colleagues and charges did not come home.
That she survived at all is amazing. Initially inserted into Lyon in Vichy France by the SOE, she recruited, trained, armed and led independent resistance cells on both sides of the demarcation line. She coordinated safe houses and safe passage for agents and downed pilots. She set up radio operators with secure locations, moving them constantly as Milice and Gestapo mobile vans triangulated the signals. She was so central that in 1942 the Gestapo called her “the most dangerous of all allied spies” and none other than Klaus Barbie was focused on tracking her down.
After Operation Torch in Northern Africa, Hitler gave up the fiction of Vichy independence and came rolling south. Hall had to escape by trekking more than twenty miles of snowy Pyrenees passes into Spain. On foot. WITH A FAKE LEG. No joke — she lost her lower leg in a hunting accident, and accomplished all of this with a 1940s-era prosthetic strapped on with leather and buckles.
And then she went back. Now as an OSS agent and disguised as an old lady to elude the still-active Nazi campaign to find her, she set up shop in the south-central high plains of Haute-Loire, quickly taking control of and unifying resistance cells there. When the time came, the Diane Irregulars liberated their department on their own, even before uniformed Allied troops showed up.
Of course through it all, this incredible woman was treated largely like crap by the fragile male egos of the developing security services. Again and again she was made to report to less qualified agents. Back in the States after the war she was given administrative and low-level positions that made no sense. And in her performance reviews you see the same code words we see for women even today: “too direct” … “overly independent” … you get the idea. So dumb.
I’m glad to have stumbled upon her story, and that at last Virginia Hall is getting at least some of the credit she deserves. She was part of an amazing generation that did amazing things to defeat a terrible evil in the world. I’m grateful, and hope that our and future generations prove worthy of their sacrifices. Memorial Day indeed.