Regulated software for software people

If you’ve built software at any scale, you know how the game works. You get requirements from somewhere — usually they’re wrong or at best incomplete. You do your best to implement and test them, and you ship. Users vote with their clicks on which features work and which don’t — i.e., they refine the requirements for you — and then you repeat the process. Eventually you converge on a set of features that work, then you do it all over again with a new set of requirements.

If your cycle time is long, that’s called “waterfall” and folks judge you for it, which is sometimes fair but not always. If your cycle time is short, it’s called “agile.” Agile does have some advantages: user feedback gets incorporated more quickly, and doing things in smaller chunks generally results in fewer bugs. A lot of people have written a lot of boring religious articles about the differences here, but in reality most folks fall somewhere in the middle, and it’s usually fine. If you’re building the next Tinder or Candy Crush or whatever, that’s pretty much all the “process” you need to know.

But what if you’re building something for healthcare or another industry where the software is “regulated”? Oh my. “Regulation” is scary and mysterious, and people keep talking about going to jail. There’s a whole industry that, as far as I can tell, is built around paying protection money to consultants. So it’s not surprising that “regulated” in a software job description is a turn-off for lots of people. Still, it’s the cost of doing business for a lot of important things in the world, so let’s take a look at what’s really going on there.

An important caveat: I have never worked for the FDA or any regulatory agency. I’m not a lawyer. I’m just a guy who has written a bunch of regulated software and believes the advice I’ve received from the “regulatory industry” has been almost entirely crap (a very few people break that mold; you know who you are). My hope here is to give you a straightforward, clear-cut introduction so that you can enter into the process with enough confidence to avoid being bullied by silly hyperbole about going to jail or other craziness. The actual regulators I’ve known just expect you to do your best to build safe, reliable products, and they get that it’s a hard job. Understanding a few key things will not only keep you compliant, it will help you build better stuff. Honest.

The Big Secret

Most of my big-time regulatory work has been with FDA Class I and II medical devices. “Classification” is a risk stratification based on your “intended use” — a super-important bit of text that precisely defines what your device is supposed to do and how it’s supposed to be used. Nailing these down can be a fraught and expensive process all on its own.

But we’re getting ahead of ourselves here — a common problem in this space! In this piece I’m not going to go deep into the details of any specific regulation — because (cover your ears, regulatory folks) they don’t really matter to you. OK, maybe they matter if you’re in a startup and you’re the one who has to actually do the filings … but that’s probably a horrible idea anyways. From a software perspective, pretty much all safety-focused regulation looks exactly the same and can be satisfied by adhering to a set of relatively small and, dare I say, pretty reasonable requirements.

This is a bigger secret than you might think. Thanks to the impenetrable nature of regulatory jargon and text, folks that can claim actual experience are in high demand. It’s to their advantage and job security to make it seem super complicated — nobody wants to go to jail, so we all just keep paying for vague explanations and double-talk. Of course that’s a broad brush — but it’s more right than wrong.

Software first, not Regulation first

This is my biggest regret from my last regulated gig. I had stepped into a higher risk class of device, and we had hired a set of regulatory folks with no direct software experience. The company was very focused on agency approval, so there was a ton of pressure to get it “right.” Not good excuses, but it is what it is — I let our initial software processes be driven by regulation first. That’s not to say we didn’t build excellent software, because we did — but we did it with a very high burden on the team that honestly was mostly just wasted time and energy. I was lucky to lose only a couple of good people to the noise before we got things more-or-less straightened out.

Your job is to build great software. Finish reading this article, understand what you really need to be able to prove and document, talk to folks you trust, and then use your own software-focused best practices to meet the regs. It is the job of your regulatory team to take what you produce and “package” it into the right form for filings and auditors. This packaging takes work and a depth of understanding not many folks in the industry bring, so you’re probably going to get pushback. Stand your ground. Over time, it is for sure worth adding automation to generate different forms of documentation (this is where we finally ended up) and that’s great — but be confident that, if you execute well, you need not be bound by crazy redundant busywork.

Safety-based regulation in three bullets

Software regulation isn’t really intended to protect against bad or fraudulent actors — there are other laws for that. Instead, the point is to ensure that the specific risks and benefits associated with a product are understood and visible. From a software perspective, that means three things:

  1. You know what it’s supposed to do.
  2. It does what you expect.
  3. You’ve considered the risks.

The first two of these should look pretty familiar. #1 just says that you have a correct and detailed specification. #2 says you tested the software against that specification. These might need to be a bit more formal than you’re used to, but if you don’t have a good starting point, your product probably sucks anyways. You likely already use JIRA or some other system to track features/stories and bugs, so if you can generate the following reports you’re basically done with these first two requirements:

  • A list of features, each with sufficient detail to be implemented.
  • For each feature, a list of test cases that cover the feature.
  • For each test case, a record of each time it was executed and passed or failed.

Formal documentation of test cases and results — and especially links back to the specific features they exercise — can be pretty thin at many companies, where dedicated QA resources are hard to come by. If your tests are automated, a great start is just to log feature IDs along with each test case you run. Together with code coverage reports, that gets you a long way towards compliance. For manual test cases, you will need some way to keep track of things — I’ve used the Zephyr plugin for JIRA with good success, but there are tons of options out there.
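If your stack happens to include pytest, here’s a minimal sketch of what “log feature IDs along with each test case” can look like. The feature marker, the CSV file name, and the JIRA-style keys are all inventions for illustration, not a standard; swap in whatever your own tracker and test runner actually use.

    # conftest.py: record (timestamp, feature ID, test, outcome) for every test execution.
    # The "feature" marker and the CSV trace log are illustrative conventions, not a standard.
    import csv
    import datetime
    import pytest

    TRACE_LOG = "test_trace.csv"

    def pytest_configure(config):
        config.addinivalue_line("markers", "feature(id): link a test to a feature/story ID")

    @pytest.hookimpl(hookwrapper=True)
    def pytest_runtest_makereport(item, call):
        outcome = yield
        report = outcome.get_result()
        if report.when != "call":      # only record the test body, not setup/teardown
            return
        marker = item.get_closest_marker("feature")
        feature_id = marker.args[0] if marker else "UNLINKED"
        with open(TRACE_LOG, "a", newline="") as fh:
            csv.writer(fh).writerow(
                [datetime.datetime.now().isoformat(), feature_id, item.nodeid, report.outcome]
            )

    # test_alerts.py: a test linked to a (hypothetical) feature key.
    import pytest

    @pytest.mark.feature("PROD-123")
    def test_abnormal_result_raises_alert():
        assert True  # your real assertion against the spec goes here

Run the suite as usual and that CSV becomes the raw material for the “each test case, each execution, passed or failed” report.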

Risk Assessment

Documented risk assessment (#3) is new to a lot of folks. The concept can take a bit of getting used to, because if you’re good at your job you’ve been assessing and addressing risks implicitly all along. Is there enough contrast to read this text in bright-light conditions? Will users understand what “accept” means in this situation? What happens if the user doesn’t scroll down to read the whole message? And so on. By the time you sit down for a “formal” risk analysis, you’ve probably taken care of most issues already.

And yet it’s a requirement to document a formal risk assessment for each feature. The best way I’ve found to manage this is to add a custom field for risks to the requirements management system, and ask folks to just make notes there along the way. Towards the end of the development phase, have the team sit down, clean them up, and spend a bit of time thinking about anything that was missed. That meeting tends to be pretty quick and actually serves as a nice double-check. Ultimately you’ll want to document four things for each risk identified:

  • What could happen.
  • The potential impact.
  • Some idea of how likely and how severe this would be (more on this below).
  • What you’ve done to mitigate or reduce the risk.

There are tons of rubrics for codifying “likelihood” and “severity” — red zone / green zone kind of stuff. I’m torn about these — there is definitely validity to the balance between risks that might cause actual physical harm but are so unlikely to occur that it would be silly to spend time on them, versus risks that have almost no real impact but could happen so frequently that it justifies extra work. But trying to get too precise is really quite hopeless — I’d just estimate low/medium/high on both dimensions and leave it at that.
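For what it’s worth, here’s a minimal sketch of what one of these cleaned-up risk entries can look like: the four fields above plus rough low/medium/high calls. The field names and scales are my own convention, not anything an agency prescribes, so keep whatever shape fits your tracker.

    # A risk entry: what could happen, impact, rough likelihood/severity, mitigations.
    # Field names, scales, and the example contents are illustrative only.
    from dataclasses import dataclass, field
    from enum import Enum
    from typing import List

    class Level(Enum):
        LOW = "low"
        MEDIUM = "medium"
        HIGH = "high"

    @dataclass
    class Risk:
        feature_id: str          # the feature/story this risk hangs off of
        what_could_happen: str
        potential_impact: str
        likelihood: Level
        severity: Level
        mitigations: List[str] = field(default_factory=list)

    glare_risk = Risk(
        feature_id="PROD-123",
        what_could_happen="User misreads the alert banner in bright light",
        potential_impact="An abnormal result goes unnoticed until the next review",
        likelihood=Level.MEDIUM,
        severity=Level.HIGH,
        mitigations=["High-contrast banner", "Follow-up email to the care team"],
    )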

Potential bugs are not “risks.” Of course bad code can cause all kinds of problems — but trying to capture that in a risk assessment is a useless shell game. That “risk” applies to every feature, and the only mitigation is to develop and test better, so documenting it adds nothing. Software risks generally come down to user interface confusion, algorithms that break down given extreme inputs, that kind of thing. Honest, it gets easier once you do it for a while.

Lastly, “documentation” or “user education” is a totally reasonable way to mitigate some risks. Sometimes something important is just complicated, and the user cannot be expected to understand how to use a feature without training and/or documentation. That’s OK! Just don’t use it as a crutch for bad design — your job after all is to build a helpful product, not an obtuse one. A trick that can increase the effectiveness of “mitigation by documentation” is to put the documentation directly into the user experience. For example, the first time the user clicks a particular button you might proactively pop up a dialog that can be dismissed once acknowledged.

The “Manufacturing” gotcha

Hopefully so far you’re feeling OK about all of this. A few tweaks to very standard practices and you’re pretty much capturing all the raw material you need. Woot! Ah, but wait.

Almost for sure you’re building modern software that runs as a service (in the cloud or otherwise), releasing new functionality on a regular basis — and that can make documentation a lot trickier. Maybe I should have mentioned this earlier, but I didn’t want to scare you off. Don’t worry, it’ll be OK.

Traditional medical devices are “things” — tongue depressors, MRI machines, cancer drugs, and so on. A great deal of up-front thought and effort and cost goes into figuring out what to make and how to make it. Prototypes are created. Factories and factory lines are set up. Raw materials are sourced. And then when you’re done, you flip a switch and stamp out thousands or millions of copies, exactly the same way, for years. Within that context, safety-based regulation makes a ton of sense. It expects to see “a” design record for each device. Auditors come in and ask to see “the” documents for a given product.

This worked OK when software shipped on a CD in a box. But when it runs as a service, updated and improved over many iterations in near real-time, things can get messy pretty quickly. Note this isn’t about “waterfall” vs. “agile”; it’s about frequent, incremental releases over time vs. one-and-done “manufacturing.”

My first stab at this didn’t work super-well. We basically just wrapped up each release, no matter how small, into its own set of documents — features, risks, test cases and results. When we did our first independent audit (internal, thankfully), the auditor asked for the documents for product X. I handed over dozens of these release packages and smiled confidently. We did a cool demo. They then said: OK, you showed me this feature that does Y; where are the test results that prove it works? Seems like a reasonable request.

Yikes. Like almost every feature, this one had evolved over time. There were probably twenty stories related to it, scattered across dozens of releases, each one incremental, like “add option Z to the menu.” Was all the information there? Sure, you could figure it out if you really understood the product and had a couple of days to sort through it all. But answering that seemingly simple question in real time in the auditing room? No chance. And while I did say that it was the responsibility of your regulatory team to “package” your raw documents into something palatable to an auditor, this kind of synthesis is way too much to ask.

I’m sure there are many ways to address this issue, but we settled on something we called a “component document.” This was a single, authoritative, narrative document that could be used as the starting point for anybody trying to understand what the product did. It explained the product’s purpose, the general approach to building it, and each major feature or feature area, assigning a unique identifier to each. The document was meant to be largely stable — that is, day-to-day features and bug fixes did not require changes to the text. An example might be a component-level feature that says “abnormal results will cause an alert to be sent to the medical team”; a corresponding release-focused requirement might describe specific alert conditions and channels for notification (like email). Adding SMS alerting would be a new requirement in a new release, but wouldn’t require updates to the component document.

By explicitly associating every requirement with a “component-level” feature, it became trivial to assemble coherent documentation packages. There were other benefits as well — for example, we found that if a requirement triggered a text change in the component document, it almost always warranted a full test pass rather than something more targeted. And the component document was a fantastic training vehicle for new engineers and even end users. It certainly isn’t always the case, but this time the regulatory framework really did directly help us improve our development process. Love that!
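To make the packaging idea concrete: once every requirement carries a component-level feature ID, an auditor-facing package really is just a filter and group. Here’s a minimal sketch, with all identifiers and fields invented for illustration; in practice the records would come straight out of your tracker’s export.

    # Group release-level requirements under the component-level feature they refine,
    # so the package for a feature is a single lookup. All IDs and text are made up.
    from collections import defaultdict

    component_features = {
        "COMP-ALERTS": "Abnormal results cause an alert to be sent to the medical team",
    }

    release_requirements = [
        {"id": "REL-1.4-07", "component": "COMP-ALERTS", "text": "Email alert on critical lab values"},
        {"id": "REL-2.1-03", "component": "COMP-ALERTS", "text": "Add SMS as an alert channel"},
    ]

    def audit_package(component_id, requirements):
        by_component = defaultdict(list)
        for req in requirements:
            by_component[req["component"]].append(req)
        return {
            "component": component_id,
            "description": component_features[component_id],
            "requirements": by_component[component_id],
        }

    print(audit_package("COMP-ALERTS", release_requirements))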

Almost there… honest!

At this point you should feel confident that you can build software in a way that satisfies the intent of safety-focused regulation. You understand and can explain what you’re building, you have tested it appropriately, you have assessed risks to health and safety — and you have the documents to prove it in an audit setting. This is really good, and frankly notably better than many self-claimed “compliant” software shops I’ve seen in real life. There is no orange jumpsuit in your future (at least for this reason).

That said, there are always more concepts in the regulatory framework you should be thinking about and evolving towards. None of these are all that challenging, and you should at least be prepared to explain to auditors how you think about them:

“V”erification vs “V”alidation

“V&V” is often used as a synonym for “testing” — which is pretty close, but obscures an important distinction between the two that you’ll need to address:

  • “Verification” ensures that features work as they are specified. It makes no judgment about whether the features do the “right” thing, just that they meet the spec.
  • “Validation” ensures that features do what users need them to do. Validation tests are really tests of the specifications themselves.

In an ideal world, verification tests are executed through automation and/or your engineering team, while validation tests are done by actual end users. In reality, most end users aren’t qualified to do a good job, and you risk wasting time on “test theater” that doesn’t really prove anything. You’ll have to find your own way here; a reasonable approach might be to (a) make sure end users are formally involved in the up-front process of creating specifications, and (b) label your test cases as “verification” or “validation” to show you’ve been thoughtful about both concepts.
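One way to do that labeling, continuing the pytest assumption from the earlier sketch (the marker names are my invention, so register them in your config to keep pytest from warning):

    # Verification: does it match the spec? Validation: is the spec the right thing?
    # Marker names are a convention invented here, not a regulatory term of art.
    import pytest

    @pytest.mark.verification
    @pytest.mark.feature("PROD-123")       # hypothetical feature key
    def test_alert_sent_for_abnormal_result():
        assert True  # check the behavior exactly as written in the spec

    @pytest.mark.validation
    @pytest.mark.feature("PROD-123")
    def test_care_team_scenario_signed_off_by_users():
        assert True  # check the end-to-end scenario users agreed to up front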

Design Documents

Significant architectural decisions should be recorded in “design documents.” These are just engineer-focused documents in any form that help describe “how” the product is built. Think about the kind of documentation you’d like to show to a new developer on the team before they jump into code. Associating design documents with component-level features is a great way to keep a handle on how it all fits together.

Third-party software and/or “SOUP”

If your product incorporates COTS (“Commercial Off-The-Shelf”) software, that also needs to be validated. Some vendors may be able to help you with this, and some may already have a base-level validation that you can start from. But in most cases, you’ll want to show that the acquired software does what you need it to do. This is typically a “one-and-done” exercise where you (a) document your requirements and (b) write and execute test cases to show the product satisfies those requirements.

This applies to third-party libraries you use as well, and even to your own internal software that may have been developed “way back when” without any documentation at all (sometimes called SOUP, for “Software of Unknown Provenance”). The same process applies — write some requirements, write some tests, run the tests, and have that documentation ready for auditors.
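The whole exercise can be as small as a handful of requirement-shaped tests. A sketch, using Python’s standard json module as a stand-in for whatever COTS or SOUP you actually depend on (the requirement IDs are invented):

    # SOUP-REQ-01: the serializer must round-trip our result payloads without loss.
    import json

    def test_soup_req_01_round_trip():
        payload = {"patient_id": "12345", "result": "abnormal", "value": 7.2}
        assert json.loads(json.dumps(payload)) == payload

    # SOUP-REQ-02: malformed input must be rejected, not silently "fixed".
    def test_soup_req_02_rejects_malformed_input():
        try:
            json.loads("{not valid json")
        except ValueError:
            pass
        else:
            raise AssertionError("malformed input was accepted")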

Surveillance

“Bugs found in the wild” is a fantastic measure of software quality (hint: fewer is better). Your regulatory team should be managing formal “complaints” (escalating to you as needed), but keeping track of which bugs were found post-release is a great practice that will serve you well. A quarterly meeting to discuss trends and identify problematic features shows that you’re taking it seriously, so keep meeting notes and be prepared to show a graph of incidents and their severity over time.
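The graph itself doesn’t need to be fancy; a tally by quarter and severity out of your bug tracker’s export is plenty. A sketch with made-up records:

    # Count post-release incidents by quarter and severity for the quarterly review.
    # The records and fields are invented; in practice they come from your tracker.
    from collections import Counter
    from datetime import date

    incidents = [
        {"found": date(2024, 1, 12), "severity": "low"},
        {"found": date(2024, 2, 3),  "severity": "high"},
        {"found": date(2024, 5, 20), "severity": "medium"},
    ]

    def quarter(d):
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"

    tally = Counter((quarter(i["found"]), i["severity"]) for i in incidents)
    for (q, severity), count in sorted(tally.items()):
        print(q, severity, count)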

Approvals and Signatures

This is an area that really bugs me. Regulatory folks can get super hung-up on ink-based signatures and extreme measures to ensure that documentation is “tamper-proof.” Full stop, I think this is a waste of time. Regulation is not intended to stop a sophisticated bad actor — it’s supposed to help folks trying to do the right thing. The burden of security theater can be stupid high. My take:

  • The software you use to manage requirements and tests should require login and keep track of who creates/updates items in the system.
  • Don’t delete stuff; instead use “inactive” or “obsolete” statuses to keep irrelevant or mistaken entries out of everyday view.
  • Make sure that the appropriate people (especially end users) mark their approval of requirements and tests in the system by clicking a button or writing a comment, and be able to show a record of that.
  • Don’t go overboard.

A final note about audits and auditors

You’re never going to be “done” tweaking and evolving this stuff. Auditors are paid to find issues, and no matter how great you are, they’re going to find some. Don’t sweat it and don’t be defensive. Listen, create a plan to address what they find, and then — this is the real key — follow through. When that auditor comes back they’re going to assess your response, and the worst thing you can do is just ignore them. If you disagree, start a dialogue and you’re sure to find a reasonable compromise.

Bottom line — bureaucracy is bureaucracy, and there is for sure burden associated with complying with regulation. Some of that burden is stupid, and some of it helps. Believe it or not, the actual regulators really do understand this, and are always working to make it simpler (even right now). Your biggest challenge will be the “industry” of high-priced consultants who are incented only to keep you worried and paying their hourly fees. Don’t freak out. Put in a little work to understand the real intent, honestly work to incorporate the key concepts — and you’ll be just fine.