SMART Part 4: Healthcare data sucks, and FHIR is no exception

Welcome to the finale! While the full codebase has been available from the beginning, parts 1, 2, and 3 of the series have focused mostly on the logistics of getting a SMART app registered, authorized and running within the various EHR environments. After all that noise, you’d think you could just write your app in peace, but no. Why? Because healthcare data freaking sucks.

There are a few reasons for this, some legit and others not so much. It’s useful to understand the whys, so we’ll talk about them first. But if you’re going to write real apps, not just connectathon demos, you need a strategy for muddling through it, so we’ll do that too. With real code.

Health records are stories first

Consider your own health history for a moment. Even for the healthiest among us, there are childhood and adult vaccinations, a few broken bones, allergies that wax and wane, that weird skin thing you were embarrassed to ask the doc about, colonoscopies or mammograms… it’s a long list. If you’re part of the 60% of the population that manages a chronic condition, the list gets a lot longer. And it’s all important. The care you need today is a direct result of the care you’ve received in the past.

Now let’s make it concrete: tell me when you had your last tetanus shot. Maybe you have a patient portal somewhere with that data, if you remember which one and if you remember your username and if you had an hour to look for it. But in the ER today with a rusty nail stuck in your foot? Pick one: “When I was a kid;” “Not sure if I ever have had one;” “Two years ago when I was on vacation.”

At least currently, a booster is recommended if your last shot was more than five years ago (or ten if it’s a clean wound). So even with those vague answers, a doc can judge whether or not you need a shot — probably yes for the first two, and no for the last. Plus they can enter that into your record so that there is better information available for your next ER visit.

The point is that even “simple” medicine is hard, and any information is better than none when trying to make care decisions. Getting an extra tetanus booster is no big deal — but for other complex, difficult-to-diagnose conditions docs face every day, the stakes are much higher.

This is why medical dramas are always popular on TV. Every case is a detective story, piecing together life history and lab results and images and Medline searches until a consistent picture finally emerges. So while “got a rash from the flu vaccine” or “travelled to the rainforest as a teen” may not seem important in isolation, as clues to something larger they can be pivotal.

As I used to say a lot in my HealthVault days, comprehensive messy data trumps spotty clean data any day. This is uncomfortable for folks that create data models and apps. We like clean data. “As a kid” may be super-helpful to a doc wondering when you got the measles, but it’s (nearly) useless for a model trying to predict the optimal age to get a shingles vaccine.

We make it worse

It’s tough to build data models (and APIs) that support the narrative, messy data needed to deliver comprehensive care and serve as the basis for compute-based solutions, that’s for sure — and I certainly don’t have a silver bullet to offer. But again and again, the formats we create just make things worse by punting the issue. Everything can be null, everything can be a list, controlled vocabularies are just recommendations, you get the idea. Ironically given this series, I was and remain pretty blasé about FHIR itself because it is exactly the same. Sure it’s easier to parse and the documentation is excellent, but at the end of the day, pretty much the entire burden of interpreting the data gets dumped on the app developer.

To wit: pretty much every app in the world needs to display the patient’s name. These apps aren’t name experts. They just want the freaking name to show on a web page. But in the FHIR patient record:

  • There is a list of names (and it’s nullable).
  • Each HumanName entry in the list has a nullable NameUse constant, but the standard doesn’t have explicit priority rules for picking the best.
  • Each entry has a nullable Period, which if present might indicate that the name is no longer in use, after some more parsing work and simple but finicky computation.
  • The structure has a “text” element intended for display … but it’s nullable.
  • The family name used to be a nullable array, but in later versions is a nullable singleton.

You might say that it’s the job of a client library to wrap up this complexity and make it usable. And to be sure, that’s exactly what I’ve tried to do with SmartEhr (more on this in a minute!). But that’s just a band-aid over a lazy API. And because APIs typically support multiple client libraries across languages, you’re pretty much guaranteed that decisions about how to interpret the data are going to diverge between them. What a mess. Left to a developer looking to get through to the “real work” of their application, it can get ugly fast:

String name =;

This is real code I’ve seen out there. It might look good at the connectathon with fake data, but that silly little line has two potential NullPointerExceptions and one potential IndexOutOfBoundsException, and as a kicker may have picked the wrong entry. Sweet.

At this point in the program I’d like to remind you that we’re just talking about getting the patient’s name.

Anyways, I can complain all I want — but the API ain’t changing, so our only option is to build in some of this developer-friendly functionality at a higher level. Doing this well is quite tricky, because by definition the code is, to use the popular jargon, opinionated. Patient.bestName makes choices:

  • We interpret a null period to mean the name is active.
  • We “prefer” names in the order they appear in the HumanNameUse enum. (Even looking at this just now I’m wondering if I should have moved “anonymous” to the top of the HumanNameUse enum, since it’s intended to be privacy-protecting.)
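
To make those two choices concrete, here is a minimal sketch of what a helper like this could look like. It is not the actual SmartEhr source: the class names, fields, constructor shapes and the order of the NameUse constants below are all stand-ins I’ve assumed for illustration (the enum order simply follows the order the codes are listed in the FHIR value set).

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the real types; names, fields and enum order are assumptions.
enum NameUse { USUAL, OFFICIAL, TEMP, NICKNAME, ANONYMOUS, OLD, MAIDEN }

class Period {
    LocalDate start; // nullable
    LocalDate end;   // nullable

    Period(LocalDate start, LocalDate end) {
        this.start = start;
        this.end = end;
    }

    // True when 'today' is not before start and not after end; a missing bound never excludes.
    boolean current(LocalDate today) {
        if (start != null && today.isBefore(start)) return false;
        if (end != null && today.isAfter(end)) return false;
        return true;
    }
}

class HumanName {
    NameUse use;   // nullable
    Period period; // nullable
    String text;   // nullable

    HumanName(NameUse use, Period period, String text) {
        this.use = use;
        this.period = period;
        this.text = text;
    }
}

class NameChooser {
    // Returns the preferred active name, or null if the list is null or has no active entry.
    static HumanName bestName(List<HumanName> names, LocalDate today) {
        if (names == null) return null;
        HumanName best = null;
        for (HumanName n : names) {
            if (n == null) continue;
            // Opinion 1: a null period means the name is active.
            if (n.period != null && !n.period.current(today)) continue;
            // Opinion 2: earlier enum constants win; ties keep the first entry seen.
            if (best == null || rank(n) < rank(best)) best = n;
        }
        return best;
    }

    // Names with no "use" at all sort after every name that has one.
    private static int rank(HumanName n) {
        return n.use == null ? Integer.MAX_VALUE : n.use.ordinal();
    }
}
```

Note that the caller still has to handle a null result and a null text field; the helper narrows the problem, it doesn’t make it disappear.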


Making these helpers as clean and maintainable as possible is the job of SmartTypes, a collection of structures that parse raw FHIR resources from JSON and then selectively add helpers to improve the developer experience. This breaks down into three main bits:

  1. Defining the structures based on the specification.
  2. Adding FHIR-specific parsing logic in the form of JsonDeserializers.
  3. Designing and writing opinionated helpers.

Unlike comprehensive FHIR mappings like HAPI, SmartEhr is parsimonious about which resources it codifies and which fields within those resources are parsed out. This hurts a little, because I’d like to be helpful to everyone. But it takes careful thought in the context of a use case to do a good job, so I decided it was better to leave expansion to the future. If you add a resource or helper, please send me a pull request, and/or let me know if I can help think it through.

Defining the structures is relatively simple. Using the excellent FHIR reference material, just create a Java object with public fields that you care about associated with the types you want those fields to be. Sometimes the right type is obvious and sometimes it’s a little more complicated — the key is to pick something that is going to be the most useful to a developer. This is a hierarchical exercise, as you may have to define new “primitive” structures along the way.

The FHIR standard has evolved over the years. But because the changes from DSTU2 to R4 have been pretty subtle, I’ve been able to mostly abstract them away for developers. This provides nice flexibility if an EHR hasn’t stayed current, but may end up being too much of a hassle to be worth it long-term — we’ll see.

When it comes to parsing the JSON into these objects, I can’t say enough good things about the Gson engine; SmartEhr uses it extensively throughout. Each top-level structure is given a static “fromJson” method that basically asks Gson to do the work (e.g., Patient.fromJson). We add two bits of goodness to this process ourselves.

First, we do a bit of post-processing on the parsed object to make it easier to use. Primarily we use this to sort lists in order of descending priority (where “priority” is something I’ve just done my best to assess where the standard isn’t forthcoming). You could imagine taking this post-processing further, for example adding dummy elements to empty lists to avoid the need for annoying null checks, but I haven’t gone there.
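
As a rough illustration of that post-processing step, the sketch below sorts a parsed address list so the highest-priority entry comes first. Everything here is hypothetical: PatientSketch, Address, the postProcess method and especially the order of the AddressUse constants are my assumptions for the example, not the real SmartEhr definitions, and the Gson parsing that would run before this is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical types; the order of these constants is an assumed priority, not the standard's.
enum AddressUse { HOME, WORK, TEMP, BILLING, OLD }

class Address {
    AddressUse use; // nullable
    String city;

    Address(AddressUse use, String city) {
        this.use = use;
        this.city = city;
    }
}

class PatientSketch {
    List<Address> address; // nullable, as in the raw resource

    // Meant to run once, right after parsing: best entries first, entries with no "use" last.
    void postProcess() {
        if (address == null) return;
        address.sort(Comparator.comparing(
            (Address a) -> a.use,
            Comparator.nullsLast(Comparator.naturalOrder())));
    }
}
```

Because List.sort is stable, entries that share the same use keep the order they arrived in, which seems like the least surprising behavior when the standard doesn’t say anything.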

We also add parsing logic to patch some FHIR quirks and novel types. This logic is implemented as a set of JsonDeserializer classes attached to our Gson parser:

  • The “date” and “dateTime” FHIR types are parsed into LocalDate and ZonedDateTime respectively. This is a classic optionality situation; each FHIR type allows for varying precision (e.g., YYYY, YYYY-MM or YYYY-MM-DD for dates) based on what is known. Since developers most often want to compare dates, this is hard to work with, so these deserializers default missing values to get to something reasonable. Be careful here though! “1990” isn’t really the same as “1990-01-01” … use case matters.
  • For the Condition type, it’s easier to deal with ClinicalStatusCode and VerificationStatusCode as enums rather than structures, so we clean them up. Similarly, we generate a clean set of ConditionCategoryCodes to reduce developer friction.
  • In a few places, string arrays have been changed to strings between versions. The LaxStringDeserializer makes that go away by space-concatenating arrays if needed.
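
Here is a minimal sketch of the core logic behind two of those deserializers, written with only the standard library so it stands alone. In practice this kind of code would sit inside a Gson JsonDeserializer; the class and method names below are hypothetical, and defaulting a missing month or day to 1 is my assumption about what “something reasonable” means, not a statement of what SmartEhr actually does.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

class FhirValueSketch {
    // Accepts "YYYY", "YYYY-MM" or "YYYY-MM-DD".
    // Assumption: a missing month or day defaults to 1 (the start of the known period).
    static LocalDate parsePartialDate(String s) {
        if (s == null || s.isEmpty()) return null;
        String[] parts = s.split("-");
        int year = Integer.parseInt(parts[0]);
        int month = parts.length > 1 ? Integer.parseInt(parts[1]) : 1;
        int day = parts.length > 2 ? Integer.parseInt(parts[2]) : 1;
        return LocalDate.of(year, month, day);
    }

    // Collapses a value that may arrive as an array in one FHIR version and a
    // single string in another into one space-separated string.
    static String laxString(List<String> values) {
        if (values == null || values.isEmpty()) return null;
        return String.join(" ", values);
    }
}
```

The trade-off is exactly the one flagged above: once “1990” has become 1990-01-01, the caller can no longer tell that only the year was known. If your use case cares, you would need to carry the original precision alongside the parsed value.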

These steps get us pretty far — the objects are clean and validated and in many cases can be used as-is. But as we discussed earlier, there’s still a ton of room for interpretation. The helpers we’ve added to the structures wrap this up, and so far fall into a few buckets:

  • Picking the “best” entry from a nullable list of items, such as in Patient.bestAddress or bestName. This is generally a combination of ordering the “use” constants and checking for valid periods (Period.current).
  • Condition.bestGuessOnset is a particularly interesting one, and my least favorite helper implementation. Onset may be recorded as a dateTime, but the standard also allows it to be given as an Age, Period, Range or free-text String. I really should be at least trying to use those alternate types, but as yet have not.
  • Weeding through obsolete or erroneous entries, as in Condition.validAndActive. Of particular note here is the verification status: if something is “entered_in_error” it is typically not removed from the record, just marked as such. These can easily lead a naïve developer astray if not handled.
  • Sometimes it’s just nice to have an easy way to display items onscreen; helpers like Address.display and HumanName.displayName take care of this.
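
To show the “weeding” bucket in code, here is a small sketch of a validAndActive-style check. The enum constants mirror the FHIR clinical and verification status codes, but ConditionSketch, its fields, the problemList method and the exact rules (which statuses count as active, and keeping entries with a missing clinical status) are assumptions of mine rather than the real SmartEhr logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

enum ClinicalStatus { ACTIVE, RECURRENCE, RELAPSE, INACTIVE, REMISSION, RESOLVED }

enum VerificationStatus { UNCONFIRMED, PROVISIONAL, DIFFERENTIAL, CONFIRMED, REFUTED, ENTERED_IN_ERROR }

// Hypothetical stand-in for a parsed Condition resource.
class ConditionSketch {
    String display;
    ClinicalStatus clinicalStatus;         // nullable
    VerificationStatus verificationStatus; // nullable

    ConditionSketch(String display, ClinicalStatus clinical, VerificationStatus verification) {
        this.display = display;
        this.clinicalStatus = clinical;
        this.verificationStatus = verification;
    }

    boolean validAndActive() {
        // Entries marked as mistakes or ruled out stay in the record, so drop them here.
        if (verificationStatus == VerificationStatus.ENTERED_IN_ERROR
                || verificationStatus == VerificationStatus.REFUTED) {
            return false;
        }
        // Assumption: a missing clinical status is kept, on the theory that
        // messy information beats none.
        if (clinicalStatus == null) return true;
        return clinicalStatus == ClinicalStatus.ACTIVE
                || clinicalStatus == ClinicalStatus.RECURRENCE
                || clinicalStatus == ClinicalStatus.RELAPSE;
    }

    // Filters a (possibly null) list down to the entries worth showing as current problems.
    static List<ConditionSketch> problemList(List<ConditionSketch> all) {
        List<ConditionSketch> result = new ArrayList<>();
        if (all == null) return result;
        for (ConditionSketch c : all) {
            if (c != null && c.validAndActive()) result.add(c);
        }
        return result;
    }
}
```

Whether a refuted or status-less condition belongs on the screen really does depend on the app, which is the whole reason these helpers have to be opinionated.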

SmartTypes is definitely just the seed of a complete developer-friendly data model for this stuff. But I hope that the general pattern will make it easy for myself and others to grow it as needed.

Enough Said

Well, I think that about does it. These posts have been dense and wonky, but I’ve tried to include enough color that readers come away with two uber-points:

  1. SMART on FHIR (not just FHIR) is a transformative technology for clinical care. There is no better way to get your software innovation into clinical practice.
  2. SmartEhr and SmartServer are simple, production-class, almost dependency-free, license-free libraries that can help you build your SMART on FHIR app quickly and with a minimum of hassle.

Please let me know how I can help. Send me bugs (they are surely there) and pull requests. I would be super-excited to see apps based on these libraries make it into the EHR app stores. Go do that!

*** A last note and request (this will show up at the bottom of each article in the series). I’ve spent a lot of time in this industry, and the systemic impediments to progress and innovation can make even good folks feel hopeless sometimes. I really, truly believe that SMART is one of those rare technologies that has matured at exactly the right time to change the game. But there’s no guarantee — not enough folks know about it, and it’s too hard to use. If you swim in this pool, please help me fix that:

  1. Share these articles with folks that use and implement EHRs. Tell them to look at the “app store” for their system and add an app to their test system. Tell them to ask vendors if they have a SMART interface to their solution.
  2. Share these articles with folks that build care delivery solutions. Explain how they can use SMART to add functionality for customers without a custom login and without having to do an integration project with custom IT teams.
  3. Contact me if I can help. There’s a form here on the website, and I’m @seanno on Twitter, or use LinkedIn, or whatever. I’m happy to answer questions, make some connections, and heck I might even write some code for you if it makes a difference.