Nerdsplaining: SMART Health Links

This is article three of a series of three. The first two are here and here.

Last time here on the big show, we dug into SMART Health Cards — little bundles of health information that can be provably verified and easily shared using files or QR codes. SHCs are great technology and a building block for some fantastic use cases. But we also called out a few limitations, most urgently a ceiling on QR code size that makes it impractical to share anything but pretty basic stuff. Never fear, there’s a related technology that takes care of that, and adds some great additional features at the same time: SMART Health Links. Let’s check them out.

The Big Picture

Just like SMART Health Cards (SHCs) are represented by encoded strings prefixed with shc:/, SMART Health Links (SHLs) are encoded strings prefixed with shlink:/ — but that’s pretty much where the similarity ends. A SHC is health information; a SHL packages health information in a format that can be securely shared. This can be a bit confusing, because often a SHL holds exactly one SHC, so we get sloppy and talk about them interchangeably, but they are very different things.

The encoded string behind a shlink:/ (the “payload”) is a base64url-encoded JSON object. We’ll dive in way deeper than this, but the view from 10,000 feet is:

  1. The payload contains (a) an HTTPS link to an unencrypted manifest file and (b) a key that will be used later to decrypt stuff.
  2. The manifest contains a list of files that make up the SHL contents. Each file can be a SHC, a FHIR resource, or an access token that can be used to make live FHIR requests. We’ll talk about this last one later, but for now just think of a manifest as a list of files.
  3. Each file can be decrypted using the key from the original SHL payload.

There’s a lot going on here! And this is just the base case; there are a bunch of different options and obligations. But if you remember the basics (shlink:/, payload, manifest, content) you’ll be able to keep your bearings as we get into the details.
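To make that concrete, here’s a minimal sketch (Node 18+, plain JavaScript like the demo scripts we’ll use below; the function name is just for illustration) of turning a SHL string back into its payload JSON:

function decodeShlinkPayload(link) {
  // works for bare "shlink:/..." strings and viewer-prefixed "https://viewer#shlink:/..." forms
  const idx = link.indexOf("shlink:/");
  if (idx < 0) throw new Error("not a SMART Health Link");
  const b64 = link.substring(idx + "shlink:/".length);
  return JSON.parse(Buffer.from(b64, "base64url").toString("utf8"));
}

// decodeShlinkPayload(theLink).url is the manifest URL; .key is the decryption key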

Privacy and Security

In the basic flow above, nothing limits who can see the manifest and encrypted content — they’re essentially open on the web. But all of that is meaningless without the decryption key from the payload, so don’t panic. It just means that, exactly like a SHC, security in the base case is up to the person holding the SHL itself (in the form of a QR code or whatever). And often that’s perfectly fine.

Except sometimes it’s not, so SHLs support added protection using an optional passcode that gates access to the manifest:

  1. A user receiving a SHL is also given a passcode. The passcode is not found anywhere in the SHL itself (although a “P” flag is added to the payload as a UX hint).
  2. When presenting the SHL, the user also (separately) provides the passcode.
  3. The receiving system sends the passcode along with the manifest request, which succeeds only if the passcode matches.

Simple but effective. It remains to be seen which use cases will rally around a passcode requirement — but it’s a handy arrow to have in the quiver.

The SHL protocol also defines a bunch of additional requirements to help mitigate the risk of all these (albeit encrypted and/or otherwise protected) files floating around:

  • Manifest URLs are required to include 256 bits of entropy — that is, they can’t be guessable.
  • Manifests with passcodes are required to maintain and enforce a lifetime cap on the number of times an invalid passcode is provided before the SHL is disabled.
  • Content URLs are required to expire (at most) one hour after generation.
  • (Optionally) SHLs can be set to expire, with a hint to this expiration time available in the payload.

These all make sense … but they do make publishing and hosting SHLs kind of complicated. While content files can be served from “simple” services like AWS buckets or Azure containers, manifests really need to be managed dynamically with a stateful store to keep track of things like passcodes and failed attempts. Don’t think this is going to be a one-night project!
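To give a feel for that statefulness, here’s a conceptual sketch in JavaScript of the checks a manifest endpoint has to make. This is not the SHLServer logic described below; the limits, field names, and in-memory map are all illustrative:

import crypto from "node:crypto";

const MAX_PASSCODE_FAILURES = 100;   // lifetime cap; the actual limit is up to the hoster
const shls = new Map();              // manifestId -> { passcode, failures, exp, disabled, files }

// 256 bits of entropy = 32 random bytes, base64url-encoded, so manifest URLs aren't guessable
function makeManifestId() {
  return crypto.randomBytes(32).toString("base64url");
}

function checkManifestRequest(manifestId, passcodeFromRequest) {
  const shl = shls.get(manifestId);
  if (!shl || shl.disabled) return { status: 404 };
  if (shl.exp && shl.exp < Date.now() / 1000) return { status: 404 };   // expired
  if (shl.passcode && shl.passcode !== passcodeFromRequest) {
    shl.failures += 1;
    if (shl.failures >= MAX_PASSCODE_FAILURES) shl.disabled = true;     // lifetime cap reached
    return { status: 401 };
  }
  return { status: 200, files: shl.files };  // caller builds the manifest JSON from these
}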

SMART Health Links in Action

Let’s look at some real code. First we’ll run a quick end-to-end to get the lay of the land. SHLServer is a standalone, Java-based web server that knows how to create SHLs and serve them up. Build and run it yourself like this (you’ll need a system with mvn and a JDK installed):

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/toolbox
mvn clean package install
cd ../shl
mvn clean package
cd demo
./run-demo.sh # or use run-demo.cmd on Windows

This will start your server running on https://localhost:7071 … hope it worked! Next, open up a new shell in the same directory and run node create-link.js (you’ll want node v18+). You’ll see an annoying cert warning (sorry, the demo is using a self-signed cert) and then a big fat URL. That’s your SHL, woo hoo! Select the whole thing and then paste it into a browser. If you peek into create-link.js you’ll see the parameters we used to create the SHL, including the passcode “fancy-passcode”. Type that into the box that comes up and … magic! You should see something very much like the image below. The link we created has both a SHC and a raw FHIR bundle; you can flip between them with the dropdown that says “Health Information”.

So what happened here? When we ran create-link.js, it posted a JSON body to the server’s /createLink endpoint. The JSON set a passcode and an expiration time for the link, and most importantly included our SHC and FHIR files as base64url-encoded strings. SHLServer generated an encryption key, encrypted the files, stored a bunch of metadata in a SQLite database, and generated a SHL “payload” — which looks something like this:

{
  "url": "https://localhost:7071/manifest/XruV__8k1Zn68NK1lsLH05ZmONtaUC85jmAW4zEHoTA",
  "key": "OesjgV2JUpvk-E9wu9grzRySuMuzN4HpcP-LZ4xD8hc",
  "exp": 1687405491,
  "flag": "P",
  "label": "Fancy Label",
  "_manifestId": "XruV__8k1Zn68NK1lsLH05ZmONtaUC85jmAW4zEHoTA"
}

(You can make one of these for yourself by running create.js rather than create-link.js.) Finally, that JSON is encoded with base64url, the shlink:/ protocol tag is added to the front, and then a configured “viewer URL” is added to the front of that.

The viewer URL is optional — apps that know what SHLs are will work correctly with just the shlink:/… part, but by adding that prefix anybody can simply click the link to get a default browser experience. In our case we’ve configured it with https://shcwork.z22.web.core.windows.net/shlink.html, which opens up a generic viewer we’re building at TCP. That URL is just my development server, so handy for demo purposes, but please don’t use it for anything in production!
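In code, that last step is about as simple as it sounds. A quick sketch (the spec joins the optional viewer URL to the shlink:/ string with a “#”):

function buildShlink(payload, viewerUrl) {
  const encoded = Buffer.from(JSON.stringify(payload)).toString("base64url");
  const bare = "shlink:/" + encoded;
  return viewerUrl ? viewerUrl + "#" + bare : bare;
}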

Anyways, whichever viewer receives the SHL, it decodes the payload back to JSON, issues a POST to fetch the manifest URL it finds inside, pulls each file’s contents out of that response either directly (.embedded) or indirectly (.location), decrypts them using the key from the payload, and renders the final results. You can see all of this at work in the TCP viewer app. Woot!

A Quick Tour of SHLServer

OK, time for some code. SHLServer is actually a pretty complete implementation of the specification, and could probably even perform pretty reasonably at scale. It’s MIT-licensed code, so feel free to take it and use it as-is or as part of your own solutions however you like, no attribution required. But I really wrote it to help folks understand the nuances of the spec, so let’s take a quick tour.

The app follows a pretty classic three-tier model. At the top is SHLServer.java, a class that uses the built-in Java HttpServer to publish seven CORS-enabled endpoints: one for the manifest, one for location URLs, and five for various SHL creation and maintenance tasks. For the admin side of things, parameters are accepted as JSON POST bodies and a custom header carries an authorization token.

SHLServer relies on the domain class SHL.java. Most of the important stuff happens here; for example the manifest method:

  • Verifies that the requested SHL exists and isn’t expired,
  • Rejects requests for disabled (too many passcode failures) SHLs,
  • Verifies the passcode if present, keeping a count of failed attempts,
  • Sets a header indicating how frequently to re-pull a long-lived (“L” flag) SHL, and
  • Generates the response JSON, embedding file contents or allocating short-lived location links based on the embeddedLengthMax parameter.

The admin methods use parameter interfaces that try to simplify things a bit; mostly they just do what their names suggest.

Because the manifest format doesn’t include a way to identify specific files, the admin methods expect the caller to provide a “manifestUniqueName” for each one. This can be used later to delete or update files — as the name implies, they only need to be unique within each SHL instance, not globally.

The last interesting feature of the class is that it can operate in either “trusted” or “untrusted” mode. That is, the caller can either provide the files as cleartext and ask the server to allocate a key and encrypt them, or it can pre-encrypt them prior to upload. Using the second option means that the server never has access to keys or personal information, which has obvious benefits. But it does mean the caller has to know how to encrypt stuff and “fix up” the payloads it gets back from the server.

The bottom layer of code is SHLStore.java, which just ferries data in semi-ORM style between a SQLite database and file store. Not much exciting there, although I do have a soft spot for SQLite and the functional interface I built a year or so ago in SqlStore.java. Enough said.

Anatomy of a Payload

OK, let’s look a little more closely at the payload format that is base64url-encoded to make up the shlink:/ itself. As always it’s just a bit of JSON, with the following fields:

  • url identifies the manifest URL which holds the list of SHL files. Because they’re burned into the payload, manifest URLs are expected to be stable, but include some randomness to prevent them from being guessable. Our server implements a “makeId” function for this that we use in a few different places.
  • key is the shared symmetric key used to encrypt and decrypt the content files listed in the manifest. The same key is used for every file in the SHL.
  • exp is an optional timestamp (expressed as an epoch second). This is just a hint for viewers so they can short-circuit a failed call; the SHL hoster needs to actually enforce the expiration.
  • label is a short string that describes the contents of the SHL at a high level. This is just a UX hint as well.
  • v is a version number, assumed to be “1” if not present.
  • flag is a string of optional upper-case characters that define additional behavior:
    • “P” indicates that access to the SHL requires a passcode. The passcode itself is kept with the SHL hoster, not the SHL itself. It is communicated to the SHL holder and from the holder to a recipient out of band (e.g., verbally). The flag itself is just another UX hint; the SHL hoster is responsible for enforcement.
    • “L” indicates that this SHL is intended for long-term use, and the contents of the files inside of it may change over time. For example, a SHL that represents a vaccination history might use this flag and update the contents each time a new vaccine is administered. The flag indicates that it’s acceptable to poll for new data periodically; the spec describes use of the Retry-After header to help in this back-and-forth.

One last flag (“U”) supports the narrow but common use case in which a single file (typically a SHC) is being transferred without a passcode, but the data itself is too large for a usable QR code. In this case the url field is interpreted not as a manifest file but as a single encrypted content file. This option simplifies hosting — the encrypted files can be served by any open, static web server with no dynamic manifest code involved. The TCP viewer supports the U flag, but SHLServer doesn’t generate them.
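Handling the “U” case is correspondingly simple; a hedged sketch, assuming the response body is the compact JWE itself:

async function fetchDirectFile(payload) {
  // "U" flag: payload.url is the encrypted file itself, fetched with a plain GET (no manifest POST)
  const resp = await fetch(payload.url);   // Node 18+ global fetch
  if (!resp.ok) throw new Error("fetch failed: " + resp.status);
  return await resp.text();                // a compact JWE string; decrypt with payload.key as shown later
}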

If you’re paying attention, you’ll notice that SHLServer returns another field in the payload: _manifestId. This is not part of the spec, but it’s legal because the spec requires consumers to expect and ignore fields they do not understand. Adding it to the payload simply makes it easier for users of the administration API to refer to the new manifest later (e.g., in a call to upsertFile).

Working with the Manifest

After a viewer decodes the payload, the next step is to issue a POST request for the URL found inside. POST is used so that additional data can be sent without leaking information into server logs:

  • recipient is a string representing the viewer making the call. For example, this might be something like “Overlake Hospital, Bellevue WA, registration desk.” It is required, but need not be machine-understandable. Just something that can be logged to get a sense of where SHLs are being used.
  • passcode is (if the P flag is present) the passcode as received out-of-band from the SHL holder.
  • embeddedLengthMax is an optional value indicating the maximum size a file can be for direct inclusion in the manifest. More on this in a second.
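Putting that together, the viewer’s manifest request looks roughly like this (Node 18+ fetch; the recipient string and size limit are just illustrative values):

async function fetchManifest(payload, passcode) {
  const resp = await fetch(payload.url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      recipient: "Example Clinic, registration desk",  // required, free text
      passcode,                                        // only when the "P" flag is present
      embeddedLengthMax: 16384                         // optional; more on this below
    })
  });
  if (!resp.ok) throw new Error("manifest request failed: " + resp.status);
  return await resp.json();                            // { "files": [ ... ] }
}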

The SHL hoster uses the incoming manifest request URL to find the appropriate manifest (e.g., in our case https://localhost:7071/manifest/XruV__8k1Zn68NK1lsLH05ZmONtaUC85jmAW4zEHoTA), then puts together a JSON object listing the content files that make up the SHL. The object contains a single “files” array, each element of which contains:

  • contentType, typically one of application/smart-health-card for a SHC or application/fhir+json for a FHIR resource (I promise we’ll cover application/smart-api-access before we’re done).
  • The encrypted file contents, packaged as a JSON Web Encryption (JWE) token using compact serialization. The content can be delivered in one of two ways:
    • Directly, using an embedded field within the manifest JSON.
    • Indirectly, as referenced by a location field within the manifest JSON.

This is where embeddedLengthMax comes into play. It’s kind of a hassle and I’m not sure it’s worth it, but not my call. Basically, if embeddedLengthMax is not present OR if the size of a file is <= its value, the embedded option may be used. Otherwise, a new, short-lived, unprotected URL representing the content should be allocated and placed into location. Location URLs must expire after no more than one hour, and may be disabled after a single fetch. The intended end result is that the manifest and its files are considered a single unit, even if they’re downloaded independently. All good, but it does make for some non-trivial implementation complexity (SHLServer uses a “urls” table to keep track; cloud-native implementations can use pre-signed URLs with expiration timestamps).
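So an illustrative manifest for our demo SHL (placeholder values, not real output) might look like:

{
  "files": [
    {
      "contentType": "application/smart-health-card",
      "embedded": "<compact JWE containing the encrypted SHC>"
    },
    {
      "contentType": "application/fhir+json",
      "location": "<short-lived URL allocated by the hoster>"
    }
  ]
}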

In any case, with JWEs in hand the viewer can finally decrypt them using the key from the original payload — and we’re done. Whew!
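Here’s what that decryption can look like with the javascript jose package (a sketch of typical usage, and my assumption rather than exactly what the TCP viewer does):

import { compactDecrypt } from "jose";

async function decryptFile(jweCompact, payloadKeyB64u) {
  const key = Buffer.from(payloadKeyB64u, "base64url");         // 32 raw bytes for A256GCM
  const { plaintext } = await compactDecrypt(jweCompact, key);  // direct ("dir") encryption with the shared key
  return new TextDecoder().decode(plaintext);                   // the SHC or FHIR JSON text
}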

* Note: I have run into compatibility issues with encryption/decryption. In particular, the specification requires direct encryption using A256GCM, which seems simple enough. But A256GCM requires a 12-byte initialization vector, and there are libraries (like python-jose at the time of this writing) that mistakenly use 16 bytes. Which might seem OK because it “works”, but some compliant libraries (like javascript jose) error out when they see the longer IV and won’t proceed. Ah, compatibility.

SMART API Access

OK I’ve put this off long enough — it’s a super-cool feature, but messes with my narrative a bit, so I’ve saved it for its own section.

In addition to static or periodically-updated data files, SHLs support the ability to share “live” authenticated FHIR connections. For example, say I’m travelling to an out-of-state hospital for a procedure, and my primary care provider wants to monitor my recovery. The hospital could issue me a SHL that permits the bearer to make live queries into my record. There are of course other ways to do this, but the convenience of sharing access using a simple link or QR code might be super-handy.

A SHL supports this by including an encrypted file with the content type application/smart-api-access. The file itself is a SMART Access Token Response with an additional aud element that identifies the FHIR endpoint (and possibly some hints about useful / authorized queries). No muss, no fuss.
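Roughly speaking, such a file looks like a normal OAuth token response plus the aud hint; every value below is illustrative:

{
  "access_token": "<token issued by the hospital's FHIR server>",
  "token_type": "Bearer",
  "expires_in": 3600,
  "scope": "patient/*.read",
  "aud": "https://fhir.example-hospital.org/r4"
}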

The spec talks about some other types of “dynamic” exchange using SHLs as well. They’re all credible and potentially useful, but frankly a bit speculative. IMNSHO, let’s lock down the simpler file-sharing scenarios before we get too far out over our skis here.

And that’s it!

OK, that’s a wrap on our little journey through the emerging world of SMART Health Cards and Links. I hope it’s been useful — please take the code, make it your own, and let me know if (when) you find bugs or have ideas to make it better. Maybe this time we’ll actually make a dent in the health information exchange clown show!

SMART does it again with Health Cards and Links

This is article one of a series of three. Two is here; three is here!

A few weeks ago I helped out with a demo at UCSD that showed patients checking into the doctor using a QR code. It was pretty cool and worked well, excepting some glare issues on the kiosk camera. But why, you might ask, did this retired old guy choose to spend time writing code to support, of all things, workflow automation between providers and insurance companies? Well since you clearly did ask, I will be happy to explain!

I spent a lot of my career trying to make it easier for individuals to get the informed care they need to be healthy and safe. And while I’ll always be proud of those efforts, the reality is that we just weren’t able to change things very much. Especially here in the US, where the system is driven far more by dollars than by need. But I’m still a believer — longitudinal care that travels with the individual is the only way to fix all of this — and despite my exit from the daily commute, I’m always on the lookout for ideas that will push that ball forward.

COVID and the birth of SMART Health Cards

Flash back to COVID year two (bear with me here). We had vaccines and they worked really well, and folks were chomping at the bit to DO STUFF again. One way we tried to open the world back up was by requiring proof of vaccination for entry to movies and bars and such. And because healthcare still thinks it’s 1950, this “proof” was typically a piece of paper. Seriously. Anyways, a few folks who live in the current millennium came up with a better idea they called SMART Health Cards — a fancy way of using QR codes and phones to share information (like vaccine status) that can be digitally verified. It was a lot better than paper — The Commons Project even made a free mobile app that venues could use to quickly and easily scan cards. More than thirty states adopted the standard within months — a track record that will make any health tech wonk stand up and take notice.

Of course the problem with any new technology is that adoption takes time — most folks still just showed up at the bar with a piece of paper. But with SMART Health Cards, that’s fine! Paper records could easily include a SHC QR code and the system still worked great. I found this bridge between the paper and digital worlds super-compelling … it just smelled like maybe there was something going on here that was really different. Hmm.

As it turns out, the pandemic began to largely burn itself out just as all of this was building up steam. That’s a good thing of course, but it kind of put the brakes on SHC adoption for a bit.

Enter SMART Health Insurance Cards

One reason the states were so quick to adopt SHCs is that the approach is fundamentally simple:

  • Host a certificate (ok, a “signing key”).
  • Sign and print a QR code on your existing paper records.

That’s the whole thing. Everything that worked before keeps working. All you need to do to get the digital benefits is to put a QR code on whatever document or card or app you already have. This is pretty neat. Of course, other pieces of the ecosystem like the verifier and “trust network” of issuers took a bit more work, but for the folks in the business of issuing proof, it’s really easy.

It’s pretty clear that this technology could be used in other ways as well. Extending vaccine cards to include more history for camps and schools is an obvious one. Folks are working on an “International Patient Summary” to help people move more seamlessly between health systems. And, finally getting back to the point of this post, it seems like there is a real opportunity to improve the experience of patient check-in for care — we all have insurance cards in our wallet, why not make them digital and use QR codes to simplify the process?

This idea gets me excited because, if you play it out, there appears to be a chance to move that “individual health record” ball forward. First, there is real business momentum behind the idea of improving check-in. 22% of claim denials are due to typos and other errors entering registration information. We’ve learned it takes six weeks to train front desk staff to interpret the thousands of different insurance card formats — and it’s a high-turnover job, so folks are running that training again and again. And even when the data makes it to the right place in the end, it’s only after a super-annoying process of form-filling and xeroxing that nobody feels positive about.

All of this together means there are a lot of people who are honestly psyched about the potential financial / experience benefits of digital check-in. Especially when the lift is simply “put a QR code on your existing cards.” It’s kind of a no-brainer and I think that’s why more than 70 different organizations were represented at the UCSD demonstration. It was pretty neat.

More than insurance

Cool, so there’s motivation to actually deploy these things and begin to transition check-in to the modern world. (I should acknowledge that this is not the first or only initiative in this space; for example Phreesia has been working the problem for years and does a fantastic job. SMART Health Cards are additive to these workflow solutions and will just make them all get traction quicker.)

But the other thing that gets me excited here is that the “payload” in a SMART Health Card can carry way more than just insurance data. That same card — especially since it’s coming from an insurance company that knows a lot about your health — could include information on your allergies, medications, recent procedures, and much more. All of the stuff that you have to fill out on forms every time you show up anyways, and that can make or break the quality of care you receive. You can even imagine using this connection to set up authorization for the provider to update your personal record after the visit.

Woo hoo! At the end of the day, I see this initiative as one that has the potential to improve coordination of care through individuals in a way that will actually be deployed and sustained, because it has immediate and obvious business benefit too. And with the ability of SHCs to bridge paper to digital, we may be looking at a real winner. Still a ton of work to do on the provider integration side, but that just makes it interesting.

Oops one problem (and a solution)

It turns out that there is one big technical issue with SHC QR codes that makes a lot of what I’ve been gushing about kind of, well, impossible. The numbers bounce around depending on the physical size of the QR image, but basically you can only cram about 1,200 bytes of data into the QR itself. That’s enough for a really terse list of vaccines, but it just doesn’t work for larger payloads. Insurance data alone using the proposed CARIN FHIR format seems to average about 15k. Hmm.

No problem — Josh and his merry band of collaborators come to the rescue again with the concept of SMART Health Links. A SHL creates an indirection between the QR Code and a package of data of basically unlimited size that can contain multiple SMART Health Cards, other collections of health data, and even those authorization links I mentioned earlier. The data in the QR code is just a pointer to that package, encrypted at a URL somewhere. The standard defines how that encryption works, defines ways to add additional security, and so on. It’s great stuff.

The workflow we demonstrated at UCSD uses payor-issued SMART Health Cards wrapped up inside SMART Health Links. If a person has multiple insurance cards (or even potentially driver’s licenses and other good stuff) they could combine them all into a “Super-Link” while keeping intact the ability to verify each one back to the company or state or whatever issued it. Ka-ching!

If you’re interested in all of this, I’d invite you to join/follow the “SMART Health Insurance Card Initiative” on LinkedIn so you can watch it evolve and, hopefully, scale up.

And if you’re a nerd like me, over the next week or so I’m going to write two techy posts about the details — one for SMART Health Cards and one for SMART Health Links. Hopefully they will serve to get folks more comfortable with what it will take for issuers and consumers to get moving with real, production deployments quickly. If you’d like to get notified when those go up, just follow me on LinkedIn or Twitter or whatever.

It’s a good fight and hopefully this one will get us closer to great care. Just Keep Swimming!

Looking back at Azyxxi… er, Amalga.

Just a few months after the Great Gunshot Search incident of 2005, I found myself at Washington Hospital Center while Dr. Craig Feied showed us list after list on a huge (for the time) monitor. Real-time patient rosters for the ER and ICU, sure, but that was just the warmup. Rooms that needed cleaning. Patients who needed ventilation tubes replaced. Insurance companies with elevated rates of rejected claims. Patients eligible for actively-recruiting complex trials. He just kept going, like a fireworks show where every time you think you just saw the finale they start up again. Incredible stuff. Anyways, cut to a few months later and we (Microsoft) announced the acquisition of Azyxxi — adding an enterprise solution to our growing portfolio in Health Solutions.

Sadly — and despite a ton of work — we were never really able to scale that incredible solution at WHC into a product that realized the same value at other institutions. That’s not to say there weren’t some great successes, because there absolutely were. But at the end of the day, there was just something about Azyxxi that we couldn’t put into a box. And while it’s tempting to just say that it was a timing problem, I don’t think that was it. Even today I don’t see anything that delivers the magic we saw in Dr. Craig’s office — just flashy “innovation” videos and presentations that never quite make it to the floor in real life.

So what was the problem? Anything we can do about it? I dunno — let’s talk it out.

Oh, and just to get it out of the way early, “Azyxxi” doesn’t mean anything — it’s just a made-up word engineered to be really easy to find with Google. We renamed it “Amalga” at Microsoft, which does actually have some meaning behind it but in retrospect sounds a bit like some kind of scary semi-sentient goo. Moving on.

Just what was it?

A correct but only semi-helpful description of Azyxxi is that it was a data analysis and application platform for healthcare. Three parts to that: (a) data analysis, like a big data warehouse; (b) an application platform so insights gained from analysis could be put into on-the-floor solutions; (c) made for healthcare, which means there was functionality built-in that targeted weirdnesses endemic to the business of providing care. This is of course a mouthful, and one of the reasons it was hard to pitch the product outside of specific use cases. A better and more concrete way of looking at the product is to break it down into five key activities:

1. Get the Data

Healthcare data is incredibly diverse and notoriously messy — images, free text notes, lab results, insurance documents, etc., etc. The first rule of the Azyxxi Way (yes, we actually referred to it like that) was to “get the data, all of it, without trying to clean it up.” Which is to say, it was a Data Lake before Data Lakes were cool (or even a term). In 2006 the conventional wisdom for data warehousing was “Extract, Transform, Load.” ETL pipelines extract data out of source systems, transform it into (usually) a “star schema” optimized for analysis, and load it into a target database. In this model an enormous amount of upfront thought goes into deciding what data is important and transforming/normalizing it into a shape that can efficiently answer a set of predefined questions.

Azyxxi’s insight was that ETL prework is always wrong, and leaves you with a brittle data warehouse unable to answer novel questions as they inevitably arise. Instead they talked about “ELT” — loading everything just as it was in the source systems and figuring out the “transform” part later. This seems obvious now, but we all used to worry a ton about performance. Azyxxi used SQL Server, and the founders were constantly pushing its boundaries, typically with great success. Sure, some queries were really slow — but you could at least ask the question!

2. Ask Novel Questions

Which leads us to the first user-driven Azyxxi experience — exploration. Using an Excel-like grid display, users had the ability to query source tables individually or via pre-configured “joins” that linked records. Sort, filter, etc. — all the standard stuff was there. Of course there was access control, but this was a care-focused tool in a care-delivery setting — by default users could see a LOT. And as noted above they could get themselves into “trouble” by running queries that took hours or days, but SQL Server is smart and it was mostly just fine.

The key is that there was a culture at the early Azyxxi sites, developed over many years, of asking questions and self-serving the answers. This is not typical! Most nurses and doctors ask themselves data-driven questions in passing, but never follow them up. Working with the IT department to run a report, combine data from multiple sources, get approval to make a change — it just isn’t worth the hassle. So great ideas just die on the vine every day. Azyxxi users knew they had a way to answer their questions themselves — and so they did.

3. Bring Insights to the Floor

It’s awesome to be able to ask questions. But it’s only really impactful when you can use the answers to effect change in real life. With Azyxxi, one-off queries could be saved together with all of their settings — including automatic refresh and kiosk-style display — and shared with other users or departments.

If you’ve been a hospital inpatient or visitor lately, almost certainly you’ve seen the patient roster grid at the central nurse’s station. At my recent colectomy the surgical unit had a live status board that helped my wife keep track of my progress through the building. Great stuff, but every one of these dashboards is an IT project, and no IT project is trivial. With Azyxxi, more than a decade ago, users could create and deploy them by themselves.

But hold on. I’ve already said twice that novel queries against source data could be really slow — a “real-time” dashboard that takes an hour to load isn’t going to get very far, and end users don’t have the skills or tools to fix it. What to do?

Azyxxi empowered the IT folks to run behind user innovation and keep things humming. Each user-created list was driven by an automatically generated SQL query — and anyone who has written interfaces like this knows that they can become very inefficient very quickly. Slow queries were addressed using a sliding scale of intervention:

  1. Hand-code the query. SQL experts on the Azyxxi team were great at re-writing queries for performance. The new query could be inserted behind the user grid transparently and without downtime — it just looked like magic to the end users.
  2. Pre-calculate joins or derived data. When hand-coding queries wasn’t enough, the team could hook into the “EL” part of data acquisition and start doing a little “T” with code. For example, data from real-time monitors might be aggregated into hourly statistics. Or logic to group disease codes into higher-level buckets could be applied ahead of time. These were the same kind of “transforms” done in every data warehouse — but only done after a particular use case proved necessary and helpful.
  3. Fully-materialize user grids. An extreme version of pre-calculation, sometimes code would be written to build an entire user grid as its own table. Querying these tables was lightning fast, but creating them of course took the most IT effort.

The refrain here was just-in-time optimization. The software made it easy for the Azyxxi IT team to see which queries were active, and to assess which approach would lead to acceptable performance. That is, they optimized scarce IT expertise to only do work that was already known to have real clinical value. Compare this to the torturous processes of up-front prioritization and resource allocation in most of the world.

Azyxxi also made these transforms sustainable by strictly enforcing one-way data dependency. Only one “parser” (not really a parser in the CS sense, just Azyxxi terminology for ELT code) could write to one target (typically a table or set of tables), and then optionally trigger additional downstream parsers to perform further transformation into other targets. This “forward-only-write” approach provided a ton of benefit — most importantly automatic error and disaster recovery. At any time, parsers at any level of the hierarchy could be re-run from scratch, trigger their downstream dependencies, and end up with an exact copy of what existed before the recovery event.

Even these dependencies could become complicated, and nobody loved the idea of a “full re-parse” — but it was an invaluable backup plan. One we took advantage of more often than you’d expect!

4. Close the Loop

Because data acquisition was near-real-time, most grids didn’t require additional user input to be useful. New lab results arriving for a patient naturally caused them to fall off of the “patients awaiting lab results” grid. It’s kind of amazing how many problems fit this simple pattern — auto-refreshing grids on a kiosk screen prove to be transformative.

But sometimes there was no “source system” to provide updates — e.g., a list that alerted facilities to newly-vacated rooms that needed to be cleaned. The “newly-vacated” part came from the external EHR system, but cleaning times did not. Azyxxi included user-editable fields and forms for this purpose — never changing ingested data, just adding new data to the system. A facilities employee could simply click a row after taking care of a room, and the grid updated automatically.

Users could create pretty complex forms and such in the system — but honestly they rarely did. Usually it was simply checking an item off of a list, maybe with a bit of extra context about the activity. Simple stuff that created beautifully elegant solutions for a ton of different situations.

5. Improve the data

There are a bunch of challenges specific to healthcare data. Take for example the humble patient identifier — by law we have no federal patient identification number in the United States. The amount of time and money spent making sure records are assigned to the right human is absolutely shocking, but there it is. Especially in high-stress hospital admission settings, recorded demographics are often wrong or missing — every significant health care information system has to deal with this.

Privacy rules are another one. Providers in a care setting have very few restrictions on the data they can see, but the same isn’t true for all employees, and certainly not for visitors walking by kiosk displays in a hallway. There are specific rules around how data needs to be anonymized and what data elements can appear together — more work for users trying to build usable queries.

Even simply figuring out why a patient is in the hospital can be tough. Different systems use different “coding systems”, or sometimes no coding at all. A huge federal project called the “Unified Medical Language System” is an attempt to help navigate all of this, but it’s pretty hairy stuff and not in any way “user ready.”

Azyxxi’s “one way” parsing system made it relatively easy to help create “augmented” tables to handle these things once rather than many times. My favorite example of this was the “PHI filter” parser, which would take a table and automatically create a version that masked or otherwise anonymized relevant fields. The user interface could then be directed at the original or “safe” version of the table, depending on the rights of the logged-on user.

This all sounds great, so what happened?

If you’ve read along this far, you probably already have a sense of the challenges we were about to face as Azyxxi v1 became Amalga v2. We spent a lot of time upgrading and hardening the software, modernizing UX, etc. — and that all went fine, albeit with some inevitable cultural churn. And despite a non-trivial problem with “channel conflict,” our nascent sales team was getting a positive response to the story we were telling. I mean, a simple slide show of awesome use cases at WHC and other Azyxxi sites was pretty compelling.

Side note: channel conflict is a tough thing at Microsoft! The sales team is used to co-selling with third parties that build solutions on top of Microsoft platforms like Windows and SQL Server (and now Azure). So they were best buddies with a whole bunch of healthcare data analytics companies that were in direct competition with Amalga … oops! This problem is a hassle for every vertical solution at Microsoft, and they’ve never really figured out how to deal with it. I don’t think it played a primary role in Amalga’s market woes, but it sure didn’t help.

So the software was OK — but right away, early implementations just weren’t making it into production use on schedule. What the heck?

Oops, IT Culture

First, it turned out that we had a significant problem fighting IT culture at our target customers. The Azyxxi team at WHC and its sister organizations were also the Azyxxi developers. For them, the counter-conventional-wisdom practices of Azyxxi were the whole point, and they knew how to turn every knob and dial to achieve just-in-time optimization. But your typical health system IT department — even those run by really competent folks — just doesn’t think that way. They are a cost center with an endless list of projects and requests, often driven more by risk avoidance than innovation. Most of these shops also already had some sort of data analytics solution; while they invariably sucked, they existed and were a sunk cost that the team knew how to use.

The Amalga team walked in and just started breaking eggs left and right. We asked for a very large up-front investment, using weird new techniques — all for a few smallish initial use cases that had captured the eye of some annoying but influential doctor or the Chief Medical Officer. We told them to “just get the data, don’t worry about what you think you need.” We told them that SQL Server was fine for things that made their SQL experts faint on the spot. We told them to give broad access to users rather than assigning rights on a “need to know” basis.

In short, we told them to do everything differently, using coding skills they didn’t even have. Not surprisingly, that didn’t work out. In almost every case we became bogged down by “prioritization” and “project planning” that stopped implementations cold. And even when we finally were able to eke out an MVP implementation, we almost always ran straight into our second stumbling block.

Oops, User Culture

The Amalga team used to talk a lot about “democratizing” access to data. And to be sure, nobody has better insight into day-to-day problems than nurses and docs and the others doing the actual work of providing care. But as it turns out, not a lot of these folks have the skills, motivation or time to dig in and create the kind of self-reinforcing flywheel of improvements that Amalga was designed for.

At least, that’s the way it is in most healthcare systems. The IT department and leadership push technology down onto the working staff, and they just have to deal with it. Sometimes it’s great, sometimes it’s awful, but either way it typically isn’t something they are rewarded for getting involved with. Executives and maybe department heads ask the IT department to prepare “reports” that typically show very high-level, lagging indicators of quality or financial performance. But technology-driven workflow changes? It’s usually a pretty small bunch making those calls.

This was a challenge at the early Azyxxi sites, too. But a combination of (a) sustained evangelist outreach from the Azyxxi team itself, and (b) successful users becoming evangelists themselves, created the right environment to bring more and more users into the game. Almost every department had at least one active Azyxxi user who would work with their colleagues to leverage the tools. But at new Amalga sites, where the IT team was often reluctant to begin with, with no established pattern of users self-serving their own solutions, and only a few small use cases deployed — starting the flywheel was a tall order indeed.

It’s tough to establish a system when you’re fighting culture wars on both the supply and demand fronts!

The good fight: Amalga v3

With a pretty clear set of problems in front of us, the Amalga team set out strategies to fix them. I’m really proud of this time in HSG — the team came together in one of those moments of shared purpose that is both rare and exhilarating. Some of the software we built would be state of the art even today. Bryan, Mehul, Kishore, Noel, Adeel, Sohail, Sumeet, Mahmood, Puneet, Vikas, Imran, Matt, Linda, Shawna, Manish, Gopal, Pierre, Jay, Bei-Jing, many many more … it was just a ton of fun.

Goal #1: Easier for IT

The biggest knock on Amalga v2 from IT was that it was just too slow. Of course, having been on this journey with me you know that this misses the point. Amalga was designed for just-in-time optimization — if important queries were “slow” they just needed to be optimized by hand-coding, pre-computing key values, or fully materializing tables. Simple! Unless of course your IT team doesn’t have advanced coding or SQL skills. Which was, unfortunately, most customers.

We took on a bunch of features to better automate JIT optimization, but the biggest by far was automatic materialization. Based on a list query created either in the Amalga user interface or by hand, Amalga v3 could automatically create and maintain a flat, simple table representing the results, with maximally-efficient inserts and updates at parse time. This meant that every grid could be made performant simply by checking a box to turn on materialization. OK, maybe not that easy — but pretty close.

We also made initial data acquisition simpler by introducing a “super parser” that could be driven by configuration rather than by code. We put together a sophisticated install and patch system that enabled upgrades without disturbing user customizations. We extended our custom client with SharePoint integration, making it easier to combine Amalga and other corporate content, and reduced the burden of user and group management. And much more.

Goal #2: Shorter Time-to-Value for Users

If users weren’t creating their own apps, we’d bring the apps to them!

On top of the new SharePoint integration, we created a configuration framework for describing data requirements for detailed, targeted use cases. Deploying an app was simply a matter of identifying the source for each data element required — a “checklist” kind of task that was easy to explain and easy to do. And while installing the first app or two would certainly require new parsing and data extraction work, at critical mass users were mostly reusing existing data elements, making it far easier to demonstrate the value of building a “data asset” over time.

And then we went mining for apps. We dug up every Azyxxi use case and convinced early Amalga customers to share theirs. Even better, we created a developer program, both for consultants who helped customers build their own apps (e.g., Avanade) and third-party developers that created their own (e.g., CitiusTech). Classic Microsoft playbook — and a great way to recapture Dr. Craig’s fireworks-that-never-end sales experience.

Goal #3: Kickstart Evangelism

Lastly, we dropped our own people into customer sites, to be the early evangelists they needed. I was the executive sponsor for Seattle Children’s Hospital and was there at least once a week in person to help the IT team solve problems, meet with docs and nurses to develop lists and apps, take feedback and get yelled at, whatever it took. I learned a ton, and was able to bring that learning back to the team. I’ll always appreciate the time I spent there with Drex and Ted — even if it wasn’t always fun.

Honestly, I’ve never seen another organization commit to its customers so hard. Every single person on the team was assigned to at least one site — execs, sales, engineers, everyone. And our customers’ success was part of our annual review. If we just couldn’t get somebody over the hump, it sure wasn’t for a lack of sweat equity. In fact I forgot about this, but you can still find demos made by yours truly more than a decade ago! Here’s one inspired by Atul Gawande’s Checklist Manifesto.

And then came Caradigm (and Satya)

Update: Originally I dated the below as 2014 and Renato corrected me — the Caradigm JV was formed in 2012, two years before Satya’s official start date and my ultimate departure from the company. Those two years were quite chaotic between the two CEOs and I’m afraid my brain conflated some things — thanks for setting me straight!

By 2012 we’d been in a long, pitched battle — making progress, but still tough. Then again, that had pretty much been the plan we set with Steve back in 2006; it was going to take a long time for Microsoft to really get established in a vertical industry like healthcare. I have always admired Steve for his willingness to commit and stick with a plan — people love to whinge, but he was great for Microsoft for a long time.

But companies are not families, and shareholders and the market were clearly ready for new strategies and new blood as CEO. And where Steve’s approach was to go broad, Satya’s was (is) to go deep on just a few things — and clearly he was on the rise. Don’t get me wrong, it has clearly been a winning strategy for Azure and the business; a big part of my portfolio is still in Microsoft and my retirement is thankful for his approach! But it did shine a very, very bright spotlight on ventures like Health Solutions that weren’t core to the platform business and weren’t making any real money (yet). Totally fair.

So we had to find another path for Amalga.

During the last few years, it had become clear that a key use case for Amalga was population management — the idea that with a more comprehensive, long-term view of an individual we could help them stay healthy rather than just treat them when they’re sick. This is the driving force behind “value-based” care initiatives like Medicare Advantage, and why you see these plans promoting healthy lifestyle options like weight loss and smoking cessation — small early investments that can make a big difference in costs (and health) later in life.

But to do this well you need to know more about an individual than just when they show up at the hospital. It turns out that Amalga was very well-suited to this task — able to pull in data from all kinds of diverse sources and, well, amalgamate it into a comprehensive view (I had to do that at least once, right?). In fact, Amalga apps related to population health were typically our most successful.

It turned out that GE HealthCare was also interested in this space, building on their existing hardware and consulting businesses. Thus was born Caradigm, a joint venture that set out with partners like Geisinger Health to build population health management tools on top of Amalga. The new company took some employees from Microsoft but was more new than old, and fought the good fight for a few years until they were ultimately bought by Imprivata and frankly I’ve lost the thread from there.

TLDR; What to make of it all?

In retrospect, I think it’s pretty clear that Amalga’s problems were really just Healthcare’s problems. Not technology — Amalga v3 was certainly more sophisticated than Azyxxi v1, but both of them could do the job. Data and workflows in healthcare are just so fragmented and so diverse that a successful data-driven enterprise requires the problem-solving skills of people at least as much as technology. More specifically, two types of people:

  1. Developers that can quickly build and maintain site-specific code.
  2. Evangelists that can bring potential to life for “regular” users.

Of course a certain level of technology is required just to house and present the data. And great tech can be an enabler and an accelerant. But without real people in the mix it’s hard for me to imagine a breakout product that changes the game on its own. Bummah.

But let me end with two “maybes” that just might provide some hope for the future:

MAYBE all of the layoffs in pure tech will change the game a bit. As somebody who has built teams in both “tech-first” and “industry-first” companies, I know how tough it is to attract really top talent into industry. Tech has always paid more and had way more nerd cred. I find that annoying because it can be incredibly rewarding to do something real and concrete — as much as I loved Microsoft, nothing I ever did there matched the impact of collaboration with clinicians and patients at Adaptive Biotechnologies. If we can get more talent into these companies, maybe it’ll pay off with a few more Azyxxi-like solutions.

Or MAYBE ChatGPT-like models will be able to start filling in those gaps — they can already write code pretty well, and I wouldn’t be shocked to see a model create high-impact dashboards using historical performance data as a prompt. This one may be a little more out there, but if AI could create an 80% solution, that might just be enough to get users excited about the possibilities.

Who knows? I just hope folks find some interesting nuggets in this very long post — and if nothing else I had a great time walking myself down memory lane. I will leave you with this video, made after the acquisition but sadly before I was spending day-to-day time on the product. We do get knocked down, and 100% we get up again!

Health IT: More I, less T

“USCDI vs. USCDI+ vs. EHI vs. HL7 FHIR US Core vs. IPA. Definitions, similarities, and differences as you understand them. Go!” —Anonymous, Twitter

I spent about a decade working in “Health Information Technology” — an industry that builds solutions for managing the flow of healthcare information. It’s a big tent that boasts one of the largest trade shows in the world and dozens of specialized venture funds. And it’s quite diverse, including electronic health records, consumer products, billing and cost management, image management, AI and analytics of every flavor you can imagine, and more. The money is huge, and the energy is huger.

Real world progress, on the other hand, is tough to come by. I’m not talking about health care generally. The tools of actual care keep rocketing forward; the rate limiter on tests and treatments seems to be only our ability to assess efficacy and safety fast enough. But in the HIT world, it’s mostly a lot of noise. The “best” exits are mostly acquisitions by huge insurance companies willing to try anything to squeak out a bit more margin.

That’s not to say there’s zero success. Pockets of awesome actually happen quite often, they just rarely make the jump from “promising pilot” to actual daily use at scale. There are many reasons for this, but primarily it comes down to workflow and economics. In our system today, nobody is incented to keep you well or to increase true efficiency. Providers get paid when they treat you, and insurance companies don’t know you long enough to really care about your long-term health. Crappy information management in healthcare simply isn’t a technology problem. But it’s an easy and fun hammer to keep pounding the table with. So we do.

But I’m certainly not the first genius to recognize this, and the world doesn’t need another cynical naysayer, so what am I doing here? After watching another stream of HIT technobabble clog up my Twitter feed this morning, I thought it might be helpful to call out four technologies that have actually made a real difference over the last few years. Perhaps we’ll see something in there that will help others find their way to a positive outcome. Or maybe not. Let’s give it a try.

A. Patient Portals

Everyone loves to hate on patient portals. I sure did during the time I spent trying to make HealthVault go. After all, most of us interact with at least a half dozen different providers and we’re supposed to just create accounts at all of them? And figure out which one to use when? And deal with their circa 1995 interfaces? Really?

Well, yeah. That’s pretty much how business works on the web. Businesses host websites where I can view my transaction history, pay bills, and contact customer support. A few folks might use aggregation services to create a single view of their finances or whatever, but most of us just muddle through, more-or-less happily, using a gaggle of different websites that don’t much talk to each other.

There were three big problems with patient portals a few years ago:

  1. They didn’t exist. Most providers had some third-party billing site where you could pay because, money. But that was it.
  2. When they did exist, they were hard to log into. You usually had to request an “activation code” at the front desk in person, and they rarely knew what you were talking about.
  3. When they did exist and you could log in, the staff didn’t use them. So secure messaging, for example, was pretty much a black hole.

Regulation fixed #1; time fixed #2; the pandemic fixed #3. And it turns out that patient portals today are pretty handy tools for interacting with your providers. Sure, they don’t provide a universal comprehensive view of our health. And sure, the interfaces seem to belong to a long ago era. But they’re there, they work, and they have made it demonstrably easier for us to manage care.

Takeaway: Sometimes, healthcare is more like other businesses than we care to admit.

B. Epic Community Connect & Care Everywhere

Epic is a boogeyman in the industry — an EHR juggernaut. Despite a multitude of options, fully a third of hospitals use Epic, and that percentage is much larger if you look at the biggest health systems in the country. It’s kind of insane.

It can easily cost hundreds of millions of dollars to install Epic. Institutions often have Epic consultants on site full time. And nobody loves the interface. So what is going on here? Well, mostly Epic is just really good at making life bearable for CIOs and their IT departments. They take care of everything, as long as you just keep sending them checks. They are extremely paternalistic about how their software can be used, and as upside-down as that seems, healthcare loves it. Great for Epic. Less so for providers and patients, except for two things:

“Community Connect” is an Epic program that allows customers to “sublet” seats in their Epic installation to smaller providers. Since docs are basically required to have an EHR now (thanks, regulation), this ends up being a no-brainer value proposition for folks that don’t have the IT savvy (or interest) to buy and deploy something themselves. Plus it helps the original customer offset their own cost a bit.

Because providers are using the same system here, data sharing becomes the default versus the exception. It’s harder not to share! And even non-affiliated Epic users can connect by enabling “Care Everywhere,” a global network run by Epic just for Epic customers. Thanks to these two things, if you’re served by the 33%+ of the industry that uses Epic, sharing images and labs and history is just happening like magic. Today.

Takeaway: Data sharing works great in a monopoly.

C. Open Notes

OpenNotes is one of those things that gives you a bit of optimism at a time when optimism can be tough to come by. Way back in 2010, three institutions (Beth Israel in MA, Geisinger in PA, and Harborview in WA) started a long-running experiment that gave patients completely unfettered access to their medical records. All the doctor’s notes, verbatim, with virtually no exception. This was considered incredibly radical at the time: patients wouldn’t understand the notes; they’d get scared and create more work for the providers; providers fearing lawsuits would self-censor important information; you name it.

But at the end of the study, none of that bad stuff happened. Instead, patients felt more informed and greatly preferred the primary data over generic “patient education” and dumbed-down summaries. Providers reported no extra work or legal challenges. It took a long time, but this wisdom finally made it into federal regulation last year. Patients now must be granted full access to their providers’ notes in electronic form at no charge.

In the last twelve months my wife had a significant knee surgery and my mom had a major operation on her lungs. In both cases, the provider’s notes were extraordinarily useful as we worked through recovery and assessed future risk. We are so much better educated than we would otherwise have been. An order of magnitude better than ever before.

Takeaway: Information already being generated by providers can power better care.

D. Telemedicine

It’s hard to admit anything good could have come out of a global pandemic, but I’m going to try. The adoption of telemedicine as part of standard care has been simply transformational. Urgent care options like Teladoc and Doctor on Demand (I’ve used both) make simple care for infections and viruses easy and non-disruptive. For years insurance providers refused “equal pay” for this type of encounter; it seems that they’ve finally decided that it can help their own bottom line.

Just as impactful, most “regular” docs and specialists have continued to provide virtual visits as an option alongside traditional face-to-face sessions. Consistent communication between patients and providers can make all the difference, especially in chronic care management. I’ve had more and better access to my GI specialists in the last few years than ever before.

It’s only quite recently that audio and video quality have gotten good enough to make telemedicine feel like “real” medicine. Thanks for making us push the envelope, COVID.

Takeaway: Better care and efficiency don’t have to be mutually exclusive.

So there we go. There are ways to make things better with technology, but you have to work within the context of reality, and they ain’t always that sexy. We don’t need more JSON or more standards or more jargon; we need more information and thoughtful integration. Just keep swimming!