Turtles all the way down

Every business is a process shop — a tangled mess of human and automated activities that work together to produce something folks are willing to pay for. And even as AI starts to handle specific jobs and tasks, that inherent complexity doesn’t go away.

Enterprise software is the glue that keeps the machine running, and if you’ve ever been on the hook for it working correctly, you’ve implemented some kind of monitoring solution. Log processors, web pingers, process monitors, “on call” scheduling — there’s an entire industry of software that just watches other software (and people, and AI) to make sure all is well.

And yet, we still get surprised by catastrophic failure — the backup that we thought was happening every night; the SSL certificate we swore was on auto-renewal; the battery-operated door lock that failed-open over the weekend; the ETL job configured to run under that retired guy’s account.

So what do we do? Monitor the monitors, of course. But what if they fail? One of my favorite old saws is turtles all the way down — it seems like there’s no bottom to this stack! That’s why, everywhere I’ve ever been, I’ve built something like backstop. Even in my retired life, it’s an essential tool.

Backstop

The idea behind backstop is to have one authoritative, affirmative check on your world, generally once per day (I run mine about 4am). Backstop is the heartbeat of your enterprise, showing up on schedule with a single, consolidated, explicit, proactive look at everything that matters.

One person needs to expect the backstop email every morning. If it doesn’t appear, silence is not golden: find out why. If it shows errors or warnings, find out why. If anything looks funny, track it down. (Honestly, this person should be your CTO or CIO — nothing else gives better intuition for “how it’s going,” and that awareness is gold.)

This doesn’t obviate the need for any of your other monitoring and alerts — they are more timely and more detailed. Backstop is an assurance that the machine is working and that nothing is falling through the cracks. A good backstop has four critical properties that need careful attention:

1. Bulletproof

The most important feature of a good backstop is that it finishes and reports. Every exception needs to be caught; every hung request needs to timeout. Each metric you’re measuring needs to be checked independently — failure of one check cannot stop evaluation of another (for example, here and here).

This is easy to get wrong, especially because you’re likely to be relying on a bunch of third party libraries to monitor proprietary services and apps. That’s why the human element is absolutely critical. If the backstop email doesn’t show up on schedule, a real person needs to notice and they need to fix it.  

2. Complete

Asking a human to check in on dozens (hundreds) of independent subsystems is untenable; a backstop fixes this by creating one single tip to the spear. For that to work, it needs to be a complete look at your environment.

Your best friend in this endeavor will be something like ProcessResource — a type of checker that can run an arbitrary sub-process. While I’m the first to advocate for limited dependency, reality is complex and so is your environment. You surely rely on some system that only has a node or python client library, and another that has its own native client, etc. etc.. In the backstop use case, completeness is more important than consistency, so hold your nose and script away.

It’s also important to evolve “completeness” over time. Of course adding support for new systems, but also catching up old ones. Unless you suck at your job, most outages reveal new failure modes — adding new checks to your backstop should be a routine part of your post-mortem process.

3. Clean

Nothing spikes my blood pressure quite like somebody saying “oh that happens all the time, we just ignore it.” It’s not just lazy, it’s corrosive — not only will your “real” alerts get lost in the noise, but the “ignorable” ones are almost always worse under the covers than you think.

You have to be able to see what’s wrong. If you’ve accidentally coded a bad metric, change or remove it. If something is time-bound — e.g., you’ve already set a plan to fix it on a specific date in the future — implement a pause that wakes up if that date is missed. But under no circumstances can you allow errors and warnings to persist over time. Please trust me on this.

4. Actionable

Last — every resource you track should include a link that gets you to the right place to investigate and learn more. By definition a backstop problem is an exception, which means a disruption to your carefully-curated calendar of stupid meetings. It’s imperative that you can dive in quickly and figure out what’s up.

This link can be a lot of things — a more detailed look at the resource itself; a pre-filled form to open a trouble ticket; a diagnosis cookbook on a wiki; whatever works. But especially if you have a junior or specialized engineer looking at the backstop error list, knowing where to start can make a huge difference.

Important Metrics

Age / Activity

This is probably the most important backstop metric, because it’s the one that is most often missed by traditional monitors. Some process (or a monitor!) just stops working, but we don’t notice until it’s too late, because silence seems golden.

These failures also tend to create the worst headaches, because they cause damage over time. Backups that don’t get done, key indicators missing critical inputs, that kind of thing.

Trigger Dates

A bunch of processes happen on what I call the “slow clock” — stuff you have to do every quarter, every year or even every few years. In my retired life these are things like renewing my driver’s license or cleaning the air filter in my furnace. In enterprises they’re more like audits, disaster recovery exercises, and domain renewals. Calendars help with these, but slow clock reminders can get lost amongst daily meetings and more immediate events.

Levels

Things rarely fall apart overnight — they slowly degrade, unnoticed, over time. Smoke alarm batteries are a great example, and the water level in our community storage tank.

When these alert at night or in the middle of the day, busy humans tend to ignore them (“I’ll get to that later”). But as a backstop metric, they become visible in the right context — at the right time, together with other outstanding issues.

Availability

This is the OG monitoring classic: is the web server responding? And if you’re fancy, can you perform basic tasks like login or search? These aren’t usually the most important backstops, but they can be useful checks, especially for lesser-used services that otherwise are ignored until the moment they become critical.

My Backstops, aka Code is All That Matters

I’ve written my own backstop harness because, well, I get to choose. I actually don’t know of a commercial or open source tools that really does this job, but there probably is one. Mine is written in Java; it’s free to use and modify on Github. If you’ve got a system with git, java and maven installed you can try it out like this:

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/toolbox
mvn clean package install
cd ../backstop
mvn clean package
java -cp target/backstop-1.0-SNAPSHOT.jar \
    com.shutdownhook.backstop.App \
    config-demo.json PRINT

You’ll see a bit of log output but then most importantly a couple of lines like this:

OK,Google,,2138 ms response
OK,Proof of Life,,I ran, therefore I am.

The “demo” configuration file contains two resources: one that simply reports back “OK” and one that checks availability of https://google.com. The “PRINT” argument tells the app to just output to console rather than sending an email.

What’s Going On Here

The code is pretty simple, and purposefully so — its job is to be rock-solid and always, always, send an email at the end. Plus, we want to collect as much useful information as we can, so failures in one resource can’t impact the others.

Configuration starts with a list of “resources”, each defined by a name, url, java class name and map of class-specific parameters. A resource class must implement the Checker interface, doing whatever it needs to and returning results as zero or more Status objects, where zero means all is well. Checkers also have access to a convenience object offering common services like web requests and JSON management.

In the normal case, the entrypoint in Backstop.java just: (1) uses an Executor pool to tell the checkers to do their things; (2) collects and sorts the Status responses into a single list with the worst offenders at the top; and (3) Uses Azure to send an HTML email with the results.

Again, you’ll notice a ton of defensive code throughout — Backstop is a special snowflake.

Not counting my favorite existential DescartesResource, so far I’ve implemented five resource checker types for my personal use:

TriggerResource

This resource type reads “slow clock” events out of a Google Spreadsheet and alerts when deadlines are approaching or past. The best way to get a sense of this is to look at a few items from my household triggers:

My dog Copper needs his flea and tick pill once a month and we always used to forget. I’m secretary of our community HOA on Whidbey and that means some paperwork every year. My beloved electric boat has old-school batteries that need topping off once in awhile, and my license is going to expire next year.

The trigger resource code simply loads up rows from a spreadsheet like this and checks to see if each due date is past (ERROR) or upcoming within an optional WARNING period.

While many of these are recurring, the sheet isn’t smart about that. Once a row “fires”, the only ways to turn it off are to edit the spreadsheet (using the link from the backstop email) and change the “Due Date” to the next occurrence OR add a “Snooze Until” date.

Snooze is useful for things like my license — I set up my appointment for next month, so until then there’s no reason to pollute my backstop list. As simple as this is, I find it pretty transformational. Adulting is chock full of stupid things you’re supposed to remember — maybe you’ll get a reminder or maybe not. A backstop trigger list is the perfect security blanket.

Sending Email

I’ve chosen to use Azure Communication Services to send the backstop email. SMTP used to be so easy — but that was before spam and phishing and all the other nasties that took advantage of its simplicity. These days, reliably sending email that doesn’t land right in the Junk folder is a big hassle. Azure makes this pretty easy, and it’s dirt cheap — less than a dollar a year for once-a-day emails!

I don’t love the dependency, but it seems like the right balance.

Deployment and Logistics

The “last mile” for backstop is deciding where it should run and how it should be triggered. It is not a resource-intensive operation, so the old school option isn’t a bad one: dedicate a single small server or VM to the job, triggered with cron once a day. Sorted!

But this simplicity does come with a big downside — patching that server and keeping it up to date. In an enterprise you may already have good infrastructure for this, and if so go for it. But in my world, servers left on their own tend to decay over time.

I’ve tried to avoid this by using a couple of Azure services to do the job for me. The first I like a lot — the script docker-build.sh creates a container that runs in Azure Container Instances without a dedicated server. The container does its thing and then shuts down, so it’s also dirt cheap, just pennies a month.

That leaves just the cron part — something has to trigger the container to run every morning. I’m pretty surprised this isn’t just part of ACI, but it’s not. The solution I landed on is a timer-based Azure Function. My function uses a cron-style schedule to run each morning, scripting a start to the proper container.

This was a bear to get right. I’m not going to let myself spiral into yet another rant about how poor the Azure developer experience can be — just know it is rubbish. You know who really helped out here? My good friend Claude; way better than any Azure help resource I could find. Whew.

There’s Always Another Resource

I have a pretty long list of resources I’m planning to add to my backstop:

  • FLO whole-home water shutoff
  • Various GE appliances, in particular for rinse-aid in the dishwasher (finally we’re getting down to the real problems)
  • Tesla Powerwall and Enphase panels/inverters                   
  • More shutdownhook demo apps
  • The Rivian!
  • Electric, water and gas usage
  • … and on and on …

Our lives and our enterprises are pretty complicated — and every new piece of smart technology that seems (and is) so great carries its own tax. Servers, services, accounts, batteries, it adds up. To keep things humming you really do need a backstop. A single tip of the spear from which all of the mess can be corralled and observed. I hope you’ll give it a try — with my code or your own. Until next time!

AI Models: 50 First Dates

Back in 1987, Dartmouth required each incoming freshman to have a Macintosh computer. This was unheard of at the time — the whole campus (including dorm rooms) had network taps, there was a huge bank of laser printers you could use for free, the school had its own email system, and live chat wasn’t just a curiosity. It was awesome.

When I met my partner of now 30+ years, she was working at the campus computer store, and one of her jobs was to help people buy and install additional memory for their machines. This was a laughably complex job including, amongst other things, knowing that:

  • You had to install chips in a specific order in specific unlabeled slots;
  • You usually couldn’t just add one chip, you had to add them in pairs;
  • Depending on the computer, you might have to cut (yes physically cut) resistor leads on the motherboard. Or if you were lucky, flip some tiny barely-labeled jumper switches;
  • All of this after opening the case with a set of custom tools straight out of 1930s dentistry.

I mean seriously, don’t miss this page-turner from Apple circa 1992. And that was just the user-level stuff — developers were presented with tedious and finicky concepts like “handles” that enabled the system to optimize its tiny memory space.

Jump to today and barely anybody thinks about RAM. Processors typically use 64 bits to store memory locations, which is basically infinite. Virtual memory swaps still happen, but they’re invisible and handle-type bugs are gone. I can’t even remember the last time I cracked open a laptop case.

Anyhoo, my point here is that there was a time when we knew the state-of-the-art wasn’t good enough, but we didn’t have a great answer to the problem. Creative solutions were ridiculous on their face — once again I refer you to this documentation — but people kept feeling their way around, trying to make progress. And eventually, they did. All the inelegant and inconvenient hacks were replaced by something simple and qualitatively, not just quantitatively, better.

Frozen in Time

Today, large AI (ok, LLM) models have a problem that’s eerily similar to our late twentieth-century RAM circus. And it also involves memory, albeit in a different way. Trained AI models are frozen in time — once formal training stops, they stop learning (basically) forever. Each session is like 50 First Dates, where Lucy starts the morning oblivious to what happened the day before.

The big issue is money. It’s expensive to simulate an analog brain in a digital environment! The 86 billion neurons in our brains form 100 trillion connections, a combination of pre-coded genetics and a lifetime of plasticity. Digital systems crudely mimic this with huge grids of numbers representing the strength of synapse connections. These strengths (or “weights”) are initialized at random, then iteratively adjusted during training until they assume useful values.

Training takes zillions of iterations — lots of time and lots of electricity and lots of money. But it turns out that, once a model is trained, asking questions is pretty darn efficient. You’re no longer adjusting the weights, you’re just providing inputs, doing a round of computation and spitting out results.

TLDR — the models that we use every day are the static result of extended training. They do not continue to learn anything new (except when their owners explicitly re-train). This is why early models might tell you that Biden is president — because he was, when the model was trained. Time (and learning) stops when training is complete.

Not Like Us

Now, I’ve been outspoken about this — I think LLMs are almost certainly sentient, at least to any degree and definition that matters. I get particularly annoyed when people say “but they don’t have a soul or feelings” or whatever, because nobody can tell me what those things actually are. We’re modeling human brains, and they act like human brains, so why are we so convinced we’re special?

But at least in one way, there is an answer to that question. Today’s AI models don’t continue to learn as they exist — they’re static. Even today at the ripe old age of 56, when I get enough positive or negative feedback, I learn — e.g., don’t keep trying to charge your Rivian when the battery is overheating.

This is a core property of every living creature with a brain. We’re constantly learning, from before we’re even born until the day we die. Memories are physically stamped into our biology; synapses grow and change and wither as we experience the real world. It’s just amazing and wonderful and insane. And it’s why we can survive in a changing world for almost 100 years before checking out.

But today’s models can’t do this. And so, we hack. Just like in those early RAM days, folks are inventing workarounds for the static model problem at an incredible pace, and many/most of these attempts are kind of silly when you step back. But for now, we are where we are — so let’s dig in a bit.

Back to School: Fine Tuning

Fine tuning just means “more training” — effective for teaching a model about some specific domain or set of concepts that weren’t part of its initial run. Maybe you have a proprietary customer support database, or you want to get really good at interpreting specific medical images.

The process can be as simple as picking up where the initial training stopped— more data, more feedback, off we go. But of course it’s expensive to do this, and there’s actually a risk of something called “catastrophic forgetting,” where previously-solid knowledge is lost due to new experience.

More commonly, fine-tuning involves tweaking around the edges. For example, you might alter the weights of only the uppermost layers of the network, which tend to be less foundational. For example, lower level image processing may detect edges and shapes, while upper levels translate those primitives into complex figures like tumors or lesions.

Folks have also been experimenting with crazy math-heavy solutions like low-rank adaptation that using smaller parameter sets to impact the overall model. Don’t ask me how this really works. Math is hard; let’s go shopping.

In any case, none of this changes the fundamental situation — after fine-tuning, the model is still static. But it does provide an avenue to integrate new knowledge and help models grow over time. So that’s cool.

Retrieval-Augmented Generation

Another way of providing new data or concepts to a model is Retrieval-Augmented Generation (“RAG” — these folks love their acronyms). In this approach, models are provided the ability to fetch external data when needed.

The typical way “normal” folks encounter RAG is when asking about current events or topics that require context, like this (see the full exchange here or here):

I use Anthropic Claude for most of my AI experiments these days and have allowed it access to web searches. In this conversation you see the model looking for current and historic information about wildfires near Ventura, then drawing conclusions based on what it finds.

Model Context Protocol, Take 1

These days most RAG tools are implemented using Model Context Protocol, an emerging standard for extending AI models. MCP is a lot more than RAG and we’ll talk about that later, but in its simplest form it just provides a consistent way for models to find external information.

What’s really interesting here is that the models themselves decide when they need to look for new data. This is seriously trippy, cool and more than a bit freaky. As a quick demonstration, I MCP-enabled the data behind the water tank that serves our little community on Whidbey Island.

I’ve implemented the protocol from scratch in Java using JsonRpc2 and Azure Functions. I could go on for a long time about how MCP is bat-sh*t insane and sloppy and incredibly poorly-conceived — but I will limit myself to comparing it to those early Macintosh RAM days. Eventually we’ll get to something more elegant. I hope.

Anyways, MCP tools of this variety (“remote servers”) are configured by providing the model with a URL that implements the protocol (in my case, this one). The model interrogates the tool for its capabilities, which are largely expressed with plain-English prose. The full Water Tank description is here; this is the key part:

Returns JSON data representing historical and current water levels in the Witter Beach (Langley, Washington, USA) community water tank. Measurements are recorded every 10 minutes unless there is a problem with network connectivity. The tank holds a maximum of 2,000 gallons and values are reported in centimeters of height of water in the tank. Each 3.4 inches of height represents about 100 gallons of water in the tank. Parameters can be used to customize results; if none are provided the tool will return the most recent 7 days of data with timestamps in the US Pacific time zone.

Other fields explain how to use query parameters. For example, “The number of days to return data for (default 7)” or “The timezone for results (default PST8PDT). This value is parsed by the Java statement ZoneId.of(zone).” Based on all this text, the model infers when it needs to use the tool to answer a question, like this:

Access the full exchange here or here.

*** IMPORANT ASIDE *** If you look closely, you’ll notice that the model seriously screwed up its calculation, claiming a current tank volume of 4,900 gallons, when its maximum capacity is actually 2,000. If you click the link to the full exchange, you’ll see me call it out, and it corrects itself. This kind of thing happens with some regularity across the AI landscape — it’s important to be vigilant and not be lulled into assumptions of infallibility!

This is an amazing sequence of events:

  1. The model realized that it did not have sufficient information to answer my question.
  2. It inferred (from a prose description) that the Witter MCP tool might have useful data.
  3. It fetched and analyzed that data automatically.
  4. It responded intelligently and usefully (even with the math error, the overall answer to my question was correct). Pretty cool.

Large Context: Windows

Folks are also trying to help models learn by providing extra input in real time, with each interaction. For example, when I ask Claude “How would you respond when a golfer always seems to hit their ball into sand traps?” I get a useful but clinical and mechanical set of tips (see here or here). But if I provide more context and a bunch of examples, I can teach the model to be more encouraging and understanding of the frustrations all new golfers experience:

Access the full exchange here or here.

Now, providing this kind of context (known as multi-shot prompting) every single time is obviously stupid. But, for now, it gets the job done.

Early models had small context windows — they just couldn’t handle enough simultaneous input to use a technique like this (ok my little contrived example would have been fine, but real-world usage was too much). But these days context windows are enormous (Claude is currently in the middle of the pack with a 200,000 token window, where each English word corresponds to roughly 1.5 tokens).

Large Context: History

Say we’re at the market and they have a sale on bananas. You ask me if I like them, and I say no, they are gross (because they are). When we move to the bakery, you’re not likely to ask if I want banana muffins, because you remember our earlier interaction.

As we know, AI models can’t do this — but they can simulate it, at least for sessions of limited duration (like a tech support chat). We simply provide the entire chat history every time, like this:

Models are fast enough, and have large enough context windows, that we can do this for quite a long chat before the cost really kills us.

But eventually it does — and so we keep hacking. One technique is to ask the model itself to summarize the chat so far, and then use that (presumably much shorter) summary as input to the next exchange. If the model does a good job of including important ideas (like my distaste for bananas) in the summary, the effect is almost as good as using the full text.

Even this has limits. When the session is over, the model snaps right back to it’s statically-trained self. At least Lucy had that VCR tape to help her catch up.

Model Context Protocol, Take 2

We’ve already seen how MCP helps connect models with external data. But the protocol is more than that, in at least two important ways:

First, MCP enables models to take action in the real world. Today these actions are pretty tame — setting up online meetings or updating a Github repository — but it’s only a matter of time before models are making serious decisions up to and including military action. That’s far beyond our topic for today, but don’t think for a moment it’s not part of our future.

Second and more relevant to this post, MCP is intended to augment the innate capabilities of the model itself — we’re already seeing MCP tools that increase memory capacity beyond internal context windows.

MCP is stateful and two-way. The model asks questions of the MCP server, which can turn around and ask questions of the model to clarify or otherwise improve its own response. We’ve never been so close to true collaboration between intelligent machines. It’s just, for now, an ugly bear of spaghetti mess to get working.

What an amazing, scary, privileged thing to being living through the birth of artificial sentience. But as always, it’s the details that make the difference, and we’re in the infancy of that work. Impressive as they are, our models are static and limited — so we hack and experiment and thrash, trying to figure out where the elegant solutions lie. We’ll get there; the seeds are somewhere in the chaos of fine tuning, context windows, RAG and MCP.

Until next time, I highly recommend you check out Lucy’s story — it’s fantastic.

SQL Hammer (everything is a nail)

Update: SQL Hammer has been updated somewhat extensively since this post was written. It’s still a great introduction and contains good detail, but if you just want to use and understand the latest version of the app, start here: https://github.com/seanno/shutdownhook/tree/main/dss#readme.


Everyone loves to talk about code, but it’s really databases that run the world. And not nerd-chic graph databases, or vector databases, or object stores, or whatever hyped-up new hoohah that Meta created this month. I’m talking about old school, relational, SQL databases. Every (every) (every!) company uses them. And for good reason — tables and columns and relationships do an excellent job of modeling things and processes in the real world.

To be fair, there are solid specialized use cases for the hoohas too. But SQL is and will remain ubiquitous because it does its job. I’m actually of the opinion that everyone should learn SQL, not just because it’s a work superpower (I’m looking at you, Tracy) but because normalization is a useful strategy for thinking about all kinds of problems. Actually, that’s another article I should write — but not right now.

The “Direct SQL” Problem

Today, I want to share a little tool that fills a technology gap I’ve found at every company I’ve ever been at — an access-controlled, auditable way to directly execute ad hoc SQL queries for data analysis, troubleshooting, workflow support and data repair.

See, in the “normal” course of business SQL is an infrastructure component — not something folks interact with directly. Enterprise applications provide higher-level end-user and system interfaces that use SQL “under the covers” as a way to store and model data. These applications add important controls on top of the raw data store: making sure that access restrictions are enforced, applying business rules to keep things consistent, generating logs to support auditing requirements, and so on. All good stuff.

But inevitably, something happens that requires people to venture under those covers. Maybe the application just has a bug. Or more commonly, something needs to happen in the real world that the application wasn’t designed to handle — updating the shipping address for an order after it was placed, reprocessing a job after a network interruption, that kind of thing. Perhaps somebody needs metrics for a presentation and the data warehouse doesn’t have the right data. There’s always something.

Easy enough for somebody who knows SQL. Open up a database client or your SQL-aware IDE and you’re good to go. But there’s a problem! Every one of the guardrails provided by application logic are gone. No access control, no auditing, no double-checking that your query doesn’t (oops) accidentally delete a bunch of records. Not only that, but direct database connections typically require access to a machine running directly in a production network — another opportunity for things to go wrong.

It’s easy to say that this “shouldn’t” happen, but that’s just naïve, wishful thinking. Best to just acknowledge reality and at least put some controls in place that minimize risk and maximize capability. That’s exactly what SQL Hammer does.

Try it live!

Before we install anything, let’s take a look at SQL Hammer running on my trusty server at https://shutdownapps.duckdns.org:7083/. I’m assuming you know the basics of SQL syntax. If not, this app probably ain’t much use to you anyways. Log in with any GitHub account and you’ll see two entries in the “Connection” dropdown: “EV Counts in Washington State” and “Scratch DB for Demos etc.” First choose the EV connection, which contains information about EV sales in Washington State (sourced from data.wa.gov) in a single table “evs.”

Running Queries

Your account is granted read-only access to this connection, so you can’t run or save arbitrary SQL statements — you can only run queries that others have created and marked as “shared.” This feature makes direct SQL data safely available to less technical folks in your organization. Click on “EV counts by model year,” then the Run button. Magic!

The “Open as URL” button creates a link that can be bookmarked or shared to provide direct (but still authenticated) access to this query. “Save as CSV” downloads the data for further processing in a spreadsheet app like Microsoft Excel.

Queries can also be more dynamic, requiring user-provided input at runtime. Select and run “EV counts by make in year;” you’ll see that the target model year is listed as a parameter in the results area and applied to the query results. The default is 2023, but you can change this and “Refresh” to look at another year. Parameters are super-powerful, especially for shared queries.

Creating Queries

Now choose the “Scratch DB” connection from the dropdown at the top. In this account you’ve been granted full SQL access, which is a little scary for me, because TTP is a real phenomenon. Please behave yourself! Also, the database resets itself overnight so don’t expect your changes to stick around. Anyhoo.

First click the “New” button, then enter and run a query like “select * from stuff.” Give the query a name and click “Save” so you can use it again later. The query will only be visible to you unless you check the “Shared” box, in which case it can be run (but not edited) by anyone with access to the connection.

Add a parameter to your query by inserting a ? in the query itself, and providing a name in the parameters box — for example, “select * from stuff where label like ?” and a parameter “search”. Run the query again and you’ll be prompted to enter a search string — try something like “yo%” to see wildcard matching live and in color. Woot!

The parameters list should be comma-separated, and each entry in that list must have a corresponding ? marker in the query itself. SQL Hammer doesn’t do a lot of verification on this, so it’s easy to create syntax errors if you’re not careful. A parameters entry can have a default value by adding a colon to the name followed by the value. For example, try “insert into stuff values (?,?)” with a parameter list like “label:mylabel,num:0”.

More than just SELECT

You may have noticed that the query above was actually an insert statement, not a select. This is fine! Assuming you’re configured for write access to a connection, you’re free to execute DML or even DDL statements. You can share these queries too, which can be a great way to capture data from users when combined with parameter lists.

Auditing

With most direct SQL solutions, figuring out who ran what query, when, with what parameters, can be nearly (or actually) impossible. Not only is this just a practical business problem, in many cases it can be a legal one — especially for folks working in regulated environments like healthcare or banking. SQL Hammer to the rescue! If a connection has its “log_queries” column set to 1, every query statement and parameters is logged to disk, together with the user email that executed it. Preserve this logfile and you’ll have no problem satisfying your auditors.

Of course, these logs may now include highly sensitive information — so you’ll have to put processes in place to protect them from prying eyes or tampering. A pain, but well-worth the benefits.

Installing SQL Hammer

SQL Hammer is 100% open source; the code is on GitHub and building it yourself is pretty straightforward. I’ll talk more about that later, but first let’s walk through a binary installation. I’ll be using Linux, but it’s all pure Java so no reason you can’t use Windows or a Mac if you prefer (do let me know if you run into any problems).

First make sure you’ve got a JRE v11 or above. You can check this by running “java -version” at the command line. Next, download sqlhammer-v1.zip from GitHub and unzip it to a directory on a machine that has network access to the database(s) you care about.

Execute “run.sh” in this directory, and point a browser on the same machine at https://localhost:3001. You’ll have to approve the self-signed certificate, but from there you should be prompted for a GitHub login and be on your way! Logs will be written to nohup.out in the same directory; that’s a good place to look for errors if something doesn’t work.

Very cool — but to run “for real” you’ll want to be sure your configuration is correct and secure, so read on. Don’t skip this!

SSL Certificates

The default config.json uses a self-signed localhost certificate for HTTPS. You’ll need to update the SSLCertificateX values to point at your own certificate and key files, using the same same format as Apache’s SSLCertificateFile and SSLCertificateKeyFile. If you really, really want to run without HTTPS you can just delete these values, but that’s probably a bad idea. Remember to set the “Port” value as desired as well.

OAuth2 Provider

As you’ve seen, the default is set up to authenticate users with GitHub accounts, using a stub OAuth2 application registered to my account (seanno). For obvious reasons, you’ll want to pick your own provider and application! If you want to use a social login like Google, Facebook, Amazon or GitHub you’ll find instructions in my recent blog post.

If you’d rather use your enterprise login (likely), you’ll have to figure out how to configure an OAuth2 / OpenID Connection application. Set the “Provider” in config.json to “other,” then provide values for ClientID, ClientSecret, AuthURL, TokenURL and possibly Scope. Most providers make this pretty easy; e.g., for Azure AD / Entra there are instructions here and a config.json fragment will look something like this:

"OAuth2": {
  "Provider": "other",
  "ClientId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "ClientSecret": "yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy",
  "AuthURL": "https://login.microsoftonline.com/zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz/oauth2/v2.0/authorize",
  "TokenURL": "https://login.microsoftonline.com/ zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz /oauth2/v2.0/token"
}

If you have trouble with this, first take a closer look at my Social Login post, and feel free to ping me if it’s still not working — happy to help if I can.

An important note on this: When SQL Hammer starts up the first time, the first person to log in is given full write access to the Metadata store. This will have happened when you first logged with GitHub above. If your chosen provider is associated with the same email address you use on GitHub, all good. If not, you’ll want to start with a fresh metadata store — just delete the dss.sql file (by default in /tmp) after configuring your provider, then log in to recreate these tables.

Cookie Encryption

SQL Hammer uses the config.json values under “CookieEncrypt” to protect it’s login cookie. Since the default value is public on GitHub, anybody with access to your server could theoretically whip up a fake cookie to get access to your stuff. Not good.

In your installation directory, run “java -cp dss-server-1.0-SNAPSHOT.jar com.shutdownhook.toolbox.Encrypt keygen”. This will output a randomly-generated key; put into the “CookieEncrypt.Key” field and you’re good to go.

Metadata Store Location

The default setup stores the DSS database in /tmp; edit the Sql.ConnectionString value if you want it somewhere else.

(Maybe) add JDBC drivers

By default, SQL Hammer includes JDBC drivers for SQLite, mySQL and PostgreSQL databases. If you use a database like Microsoft SQL Server, Azure SQL or Oracle, you’ll need to download those driver(s) to the same directory where you installed SQL Hammer. Then load the JAR(s) by adding them to the “cp” argument in run.sh, separated with colons. Here’s an example doing this for Azure SQL (mssql-jdbc-12.4.2.jre11.jar):

nohup java \
  -cp dss-server-1.0-SNAPSHOT.jar:mssql-jdbc-12.4.2.jre11.jar \
  com.shutdownhook.dss.server.App \
  config.json &

Finding the right JDBC driver is pretty easy; just search “XYZ jdbc driver download” on Google and you’ll get there in no time.

99% of JDBC drivers now self-register by including a META-INF/services/java.sql.Driver file in their JAR. If for some crazy reason yours does not, you’ll need to add its fully-qualified class name to config.json under Sql.PreloadDrivers, which should be an array of string values.

Managing Connections and Access

I’m particularly fond of this — admin tasks are performed not through a pre-built UX, but by using SQL itself. The first user to log into an installation is set up as the administrator, with access to the “DSS Metadata Store” connection. A few pre-built queries are added as well, but honestly the SQL is pretty simple; three tables drive it all. Just click the “Schema” button to see how they’re put together.

connections contains one row for each configured database connection. “name” is the primary key, just a short unique label. “description” is what shows up in the dropdown, and setting “log_queries” to 1 will cause query text and parameters to be logged for audit purposes. The star is “connection_string,” which contains everything necessary to connect to and authorize a specific database connection. SQL Hammer connection strings typically contain login information and are very sensitive — be careful to restrict access to the metadata store!

You can add connections using the “connections: add new” query or just with a simple insert statement. The harder part is figuring out a connection string that works; a great place to start is this article at Baeldung. A few examples can also go a long way; so here you go:

  • SQLite: jdbc:sqlite:/PATH/TO/FILE
  • mySQL: jdbc:mysql://SERVER:PORT/DATABASE?user=USER&password=PASSWORD (port is usually 3306)
  • PostgreSQL: jdbc:postgresql://SERVER:PORT/DATABASE?user=USER&password=PASSWORD (port is usually 5432)

access holds (not surprisingly) rules about who can access which connections. The “user” field can either be a specific user (i.e., logged-in email address) or a wildcard pattern matched with the like operator. Some useful practical applications of this are “%” for matching any user, and “%@xyz.com” to match all users with email addresses at xyz.com.

In order to execute or save arbitrary queries, a user must match an access row for the relevant connection in which the “can_create” column has the value 1. Without this flag, users can only run existing queries associated with the connection that are marked “shared” (and can’t see the underlying SQL statements).

The built-in queries “connections: grant access” and “connections: remove access” can be used to manipulate the table; they’re just parameterized versions of insert and delete.

You may find it useful to create multiple “connection” rows that reference the same database, associating different queries with each and granting access to different users. For example, you may have some users that are interested in queries related to financial data, while others are focused on user behavior. By segmenting these query sets by connection, you can reduce confusion and better conform to the principle of minimum access. Since folks only see the connections they have access to, navigation is no problem.

queries holds the actual statements for saved queries, including the “is_shared” flag that marks whether users other than the creator can execute them. Mostly this table is managed through the user interface, but there’s no reason you can’t manipulate it directly as well.

Building from source

The SQL Hammer frontend is an SPA built with React; the backend is a set of Java handlers that run in the context of my custom classes on top of the built-in HttpServer. All of this is packaged into a single uber-JAR that runs as a standalone process starting with com.shutdownhook.dss.server.App.

Building the package requires a git client, Java JDK v11+, maven, node and npm, most of which you probably having hanging around anyways:

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/dss
./fullbuild.sh

The fullbuild.sh script first builds the React bundle in the client directory (I had to increase node memory with set NODE_OPTIONS=--max-old-space-size=4096) and copies the resulting files into a zip resource under server/src/main/resources. It then builds and locally installs the utility and server classes under ../toolbox, then the server components, and then finally assembles the uber-JAR which lands in server/target/dss-server-1.0-SNAPSHOT.jar. Use this JAR in place of the one in the release directory to run your own build.

If you want to actively play with the user interface, it’s possible to run the React SPA separately from the server. This is particularly nice because it gives you hot-reload. The details are a bit grotty to include here; drop me a note and I’ll get you started. If there’s any kind of broad interest I’ll write it up; just too lazy to do it now!

Quirks and Futures

This is most definitely a “v1” release. It works and works pretty well, at least for my purposes. But the UX is definitely awkward in places — the interplay between “Run” in the editing pane and “Refresh” in the run pane trips me up regularly. I’d like to have more ready-access to the schema information while running queries. The log file isn’t as configurable as it ought to be. “Setup” is a joke. And on and on.

But you have to start somewhere — and I’m hoping that the app will find its way to a few like-minded folks that have shared my pain and acknowledge the need for something like SQL Hammer. Direct SQL is not just inevitable, it’s super-powerful. It can even be an asset to agility and compliance when managed responsibly. I’d love to hear what you think!

Predicting the Future (Tides)

The beach outside our Whidbey place is amazing. There’s about twenty yards of firm sand and rocks along the shore, then a broad, flat, soft expanse of sand/mud/clay for just under 100 yards, then maybe 30 yards of firm sandbars. Beyond the sandbars, the channel drops to a depth of about 500 feet or so (the first “steps” along this drop-off are the best places to drop a crab pot).

The tide sweeping in and out over this shallow area changes our back yard dramatically from hour to hour. At the highest high tide there’s no beach at all — in the Spring whales swim just a few yards away, sucking ghost shrimp out of the mud flats. During summer low-low tides, we head out to the sand bars where you can dig for horse clams and pick up crabs hiding in the eel grass (while Copper chases seagulls for miles).

I know it sounds a bit out there, but the rhythm of our days really does sync up with the water — and it’s a wonderful way to live. “What’s the tide doing today?” is the first question everybody seems to ask as they come down for coffee in the morning. And that, my friends, sounds like fodder for another fun project.

What’s the tide doing today?

NOAA publishes tide information that drives a ton of apps — I use Tides Near Me on my phone and the TideGuide skill on Alexa, and both are great. But what I really want is something that shows me exactly what the tide will look like in my back yard. For some reason I have a really hard time correlating tide numbers to actual conditions, so an image really helps. (As an aside, difficulty associating numbers with reality is a regular thing for me. I find it very curious.) For example, if you were to stand on the deck in the afternoon on September 30, what exactly would you see? Maybe this?

Those images are generated by (a) predicting what the tide and weather will be like at a point in time, and then (b) selecting a past image that best fits these parameters from a historical database generated using an exterior webcam, NOAA data and my Tempest weather station. So the pictures are real, but time-shifted into the future. Spooooky!

Actually, my ultimate goal is to create a driftwood display piece that includes a rotating version of these images together with a nice antique-style analog tide clock. But for today, let’s just focus on predictions and images.

How Tides Work

Ocean Tides are a rabbit hole you can go down a looong way — fascinating stuff. This National Geographic article is a nice intro, and this primer by UW professor Parker MacCready really gets into the weeds. To my understanding, there are at six primary factors that contribute to tide action:

  1. Variations in pull from the Moon’s gravity on the Earth. The side facing the Moon has increased gravity, and the side opposite the moon has slightly less. Both of these cause liquid water on the surface to “bulge” along this axis (more on the closer side, less on the far side).
  2. The same thing happens due to the Sun’s gravity, but less so. Tides are most extreme when the sun and moon “line up” and work together; least so when they are at right angles to each other.
  3. The Earth is spinning, which combines with orbital movement to change which parts of the Earth are being pulled/pushed the most at any given time.
  4. The Earth is tilted, which changes the angles and magnitude of the forces as the seasons change. One consequence of this is that we tend to have daytime lows in the Summer and nighttime lows in the Winter.
  5. Weather (short-term and seasonal) can change the amount of water in a specific location (storm surges being a dramatic example).
  6. Local geography changes the practical impact of tides in specific locations (e.g., levels present differently over a wide flat area like my beach vs. in a narrow fjord).   

All of this makes it really tough to accurately predict tide levels at a particular time in a particular place. Behavior at a given location can be described reasonably well by combining thirty-seven distinct sine waves, each defined by a unique “harmonic constituent.” NOAA reverse-engineers these constituents by dropping buoys in the ocean, measuring actual tide levels over a period of months and years, and doing the math. Our closest “harmonic” or “primary” station is across the water in Everett.

“Subordinate” stations (our closest is Sandy Point) have fewer historical measurements — just enough to compute differences from a primary station (Seattle in this case). But here’s the really interesting bit — most of these “stations” don’t actually have physical sensors at all! The Sandy Point buoy was only in place from February to April, 1977. In Everett, it was there for about five months in late 1995. To find an actual buoy you have to zoom all the way out to Port Townsend! This seems a bit like cheating, but I guess it works? Wild.

You can query NOAA for tide predications at any of these stations, but unless there’s a physical buoy all you really get is high and low tide estimates. If you want to predict water level for a time between the extremes, you need to interpolate. Let’s take a look at that.

The Rule of Twelfths

Image credit Wikipedia

It turns out that sailors have been doing this kind of estimation for a long, long time using the “Rule of Twelfths.” The RoT says that if you divide the span between extremes into six parts, 1/12 of the change happens in the first part; 2/12 in the next; then 3/12, 3/12 again, 2/12 and 1/12 to finish it out. Since the period between tides is about six hours, it’s a pretty easy mental calculation that would have been good to know when I was fifteen years old trying to gun my dad’s boat through the channel off of Ocean Point (spoiler alert: too shallow).

Anyways, I use this rule together with data from NOAA and simple interpolation to predict tide levels on my beach for any given timepoint. The code is in NOAA.java and basically works like this:

  1. The NOAA class exposes a single method “getPredictions” that queries NOAA for tide extremes from one day before to two days after a given timepoint.
  2. The extremes are added to a list, as well as five RoT timepoints between each of them.
  3. The resulting list is returned to the caller as a Predictions object.

The Predictions object exposes a few methods, but the most interesting one is estimateTide, which does a binary search to find the predictions before and after the requested timepoint, then uses linear interpolation to return a best-guess water level. The resulting estimations aren’t perfect, but they are really very accurate — more than good enough for our purposes. Woo hoo!

Stepping Back

OK, let’s back up a bit and look at the code more broadly. Tides is a web app that primarily exposes a single endpoint /predict. It’s running on my trusty Rackspace server, and as always the code is on github. To build and run it, you’ll need a JDK v11 or greater, git and mvn. The following will build up the dependencies and a fat jar with everything you need:

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/toolbox && mvn clean package install
cd ../weather && mvn clean package install
cd ../tides && mvn clean package

To run the app you’ll need a config file — which may be challenging because it expects configuration information for a Tempest weather station and a webcam for capturing images. But if you have that stuff, go to town! Honestly I think the code would still work pretty well without any of the weather information — if you are interested in running that way let me know and I’d be happy to fix things up so that runs without crashing.

The code breaks down like this:

  • Camera.java is a very simple wrapper that fetches live images from the webcam.
  • NOAA.java fetches tide predictions, augments them with the RoT, and does interpolation as discussed previously.
  • Weather.java manages interactions with the Tempest. It relies on code I wrote awhile ago and discuss here.
  • TideStore.java is a simple SQL and file system store.
  • Tides.java is a domain layer that pulls all the bits and pieces together.
  • Server.java implements the web interface, using the WebServer class I build long ago.

Capturing Images and Metadata

None of this works without a pretty significant collection of metadata-tagged historical images. And you can’t capture images without a camera — so that was step one here. I have a ton of Ring cameras and I love them, but they are nearly impossible to access programmatically. Sure there are some reverse-engineered libraries, and they “kind of” work, but reliably capturing an image “right now” is a complicated and ultimately only semi-successful mess. So instead I just picked up a simple camera that is civilized enough to just expose the damn image with a URL.

Running the app with the parameter “capture” tells it to call Tides.captureCurrentTide rather than running the web server. This method:

  1. Captures the current “day of year” (basically 1 – 365) and “minute of day” (0 – 1,439). It turns out that these two values are the most critical for finding a good match (after tide height of course) — being near the same time of day at the same time of year really defines the “look” of the ocean and sky, at least here in the Pacific Northwest.
  2. Loads current weather metrics from the Tempest.
  3. Estimates the current tide level.
  4. Captures an image from the webcam.
  5. And finally, writes it all to the TideStore.

The capture stuff runs twice hourly via cron job on a little mini pc I use for random household stuff; super-handy to have a few of these lying around! Once a day, another cron job pushes new images and a copy of the database to an Azure container — a nice backup story for all those images that also lands them in a cloud location perfect for serving beyond my home network. Stage one, complete.

Picking an Image

The code to pick an image for a set of timepoints is for sure the most interesting part of this project. My rather old-school approach starts in Tides.forecastTides, which takes a series of timepoints and returns predictions for each (as well as data about nearby extremes which I’ll talk about later). The timepoints must be presented in order, and typically are clustered pretty closely — e.g., for the /predict endpoint we generate predictions for +1, +3 and +6 hours from now, plus the next three days at noon.

First we load up NOAA predictions and, if any of the timepoints are within the bounds of the Tempest forecast, that data as well. The Tempest can forecast about ten days ahead, so in normal use that works fine (the code actually interpolates weather in the same way we do for tides). As we iterate through the timepoints, we load new NOAA predictions if needed.

Armed with this stuff, the real core of the work happens in Tides.forecastTide. The first pass is in TideStore.queryClosest, which uses a series of thresholds to find images within given ranges of tide height, day of year and hour of day. We start with a very tight threshold — tide within .25 feet, day of year within 10 days and hour of day within 20 minutes. If we don’t find any, we fall back to .5/20/20, and so on from there until our last try is pretty wide at 1/120/120. If we can’t find anything at that point we just give up — hard to even squint and see that as a match. The good news is, even after collecting data for just about a month, we already succeed most of the time.

By querying in stages like this, we end up with a candidate pool of images that, from a tide/time perspective, we consider “equivalently good.” Of course we may just find a single image and have to use it, but typically we’ll find a few. In the second pass, we sort the candidates by fit to the predicted weather metrics. Again we use some thresholding here — e.g., pressure values within 2mb of each other are considered equivalent.

At the end of the day, this is futzy, heuristic stuff and it’s hard to know if all the thresholds and choices are correct. I’ve made myself feel better about it for now by building a testing endpoint that takes a full day of actual images and displays them side-by-side with the images we would have predicted without that day’s history. I’ve pasted a few results for August 30 below, but try the link for yourself, it’s fun to scroll through!

Other Ways We Could Do This: Vectors

Our approach works pretty well, even with a small (but growing!) historical database. But it’s always useful to consider other ideas. One way would be to replace my hand-tuned approach with vector-based selection. Vector distance is a compelling way to rank items by similarity across an arbitrary number of dimensions; it appeals to me because it’s pretty easy to visualize. Say you want to determine how similar other things are to a banana, using the properties “yellowness” and “mushiness” (aside: bananas are gross). You might place them on a graph like the one here.

Computing the Euclidian distance between the items gives a measure of similarity, and it kind of works! Between a papaya, strawberry and pencil, the papaya is intuitively the most similar. So that’s cool, and while in this example we’re only using two dimensions, the same approach works for “N” — it’s just harder to visualize.

But things are never that simple — if you look a little more deeply, it’s hard to argue that the pencil is closer to a banana than the strawberry. So what’s going on? It turns out that a good vector metric needs to address three common pitfalls:

  1. Are you using the right dimensions? This is obvious — mushiness and yellowness probably aren’t the be-all-end-all attributes for banana similarity.
  2. Are your dimensions properly normalized? In my tide case, UV measurements range from 0 – 10, while humidity can range from 0 – 100. So a distance of “1” is a 10% shift in UV, but only a 1% shift in humidity. If these values aren’t normalized to a comparable scale, humidity will swamp UV — probably not what we want.
  3. How do you deal with outliers? This is our pencil-vs-strawberry issue. A pencil is “so yellow” that even though it doesn’t remotely match the other dimension, it sneaks in there.

These are all easily fixable, but require many of the same judgment calls I was making anyways. And it’s a bit challenging to do an efficient vector sort in a SQL database — a good excuse to play with vector databases, but didn’t seem like a big enough advantage to worry about for this scenario.

Other Ways We Could Do This: AI

My friend Zach suggested this option and it’s super-intriguing. Systems like DALL-E generate images from text descriptions — surprisingly effective even in their most generic form! The image here is a response to the prompt “a photographic image of the ocean at low tide east of Whidbey Island, Washington.” That’s pretty spooky — even includes an island that look a lot like Hat from our place.

With a baseline like this, it should be pretty easy to use the historical database to specialty-train a model that generates “future” tide images out of thin air. This is exciting enough that I’m putting on my list of things to try — but at the same time, there’s something just a bit distasteful about deep-faking it. More on this sometime soon!

A Few Loose Ends

The rest of the code is just delivery, mostly in Server.java, using the WebServer and Template classes that make up many of my projects.

One nice little twist — remember that I pushed the images and database to an Azure container for backup. There’s nothing in those files that needs to be secret, so I configured the container for public web access. Doing this lets me serve the images directly from Azure, rather than duplicating them on my Rackspace server.

I also forgot to mention the Extremes part of tide forecasting. It turns out that it’s not really enough to know where the water is at a point in time. You want to know whether it’s rising or falling, and when it will hit the next low or high. We just carry that along with us so we can display it properly on the web page. It’s always small things like this that make the difference between a really useful dashboard and one that falls short.

I’ll definitely tweak the UX a bit when I figure out how to put it into a fancy display piece. And maybe I’ll set it up so I can rotate predictions on my Roku in between checking the ferry cameras! But that is for another day and another post. I had a great time with this one; hope you’ve enjoyed reading about it as well. Now, off to walk the beach!

It’s Always a Normalization Problem

Heads up, this is another nerdy one! ShareToRoku is available on the Google Play store. All of the client and server code is up on my github under MIT license; I hope folks find it useful and/or interesting.

Algorithms are the cool kids of software engineering. We spend whole semesters learning to sort and find stuff. Spreadsheet “recalc” engines revolutionized numeric analysis. Alignment algorithms power advances in biotechnology.  Machine learning algorithms impress and terrify us with their ability to find patterns in oceans of data. They all deserve their rep!

But as great as they are, algorithms are hapless unless they receive inputs in a format they understand — their “model” of the world. And it turns out that these models are really quite strict — data that doesn’t fit exactly can really gum up the works. As engineers we often fail to appreciate just how “unnatural” this rigidity is. If I’m emptying the dishwasher, finding a spork amongst the silverware doesn’t cause my head to explode — even if there isn’t a “spork” section in the drawer (I probably just put it in with the spoons). Discovering a flip-top on my toothpaste rather than a twist cap really isn’t a problem. I can even adapt when the postman leaves packages on top of the package bin, rather than inside of it. Any one of these could easily stop a robot cold (so lame).

It’s easy to forget, because today’s models are increasingly vast and impressive, and better every day at dealing with the unexpected. Tesla’s Autopilot can easily be mistaken for magic — but as all of us who have trusted it to exit 405 North onto NE 8th know, the same weaknesses are still hiding in there under the covers. But that’s another story.

Anyhoo, the point is that our algorithms are only useful if we can feed them data that fits their models. And the code that does that is the workhorse of the world. Maybe not the sexiest stuff out there, but almost every problem we encounter in the real world boils down to data normalization. So you’d better get good at it.

Share to Roku (Release 6)

My little Android TV-watching app is a great (in miniature) example of this dynamic at work. If you read the original post, you’ll recall that it uses the  Android “share” feature to launch TV shows and movies on a Roku device. For example, you can share from the TV Time app to watch the latest episode of a show, or launch a movie directly from its review at the New York Times. Quite handy, but it turns out to be pretty hard to translate from what apps “share” to something specific enough to target the right show. Let’s take a look.

First, the “algorithm” at play here is the code that tells the Roku to play content. We use two methods of the Roku ECP API for this:

  • Deep Linking is ideal because it lets us launch a specific video on a specific channel. Unfortunately the identifiers used aren’t standard across channels, and they aren’t published — it’s a private language between Roku and their channel providers. Sometimes we can figure it out, though — more on this later.
  • Search is a feature-rich interface for jumping into the Roku search interface. It allows the caller to “hint” the search with channel identifiers and such, and in certain cases will auto-start the content it finds. But it’s hard to make it do the right thing. And even when it’s working great it won’t jump to specific episodes, just seasons.
public class RokuSearchInfo
{
public static class ChannelTarget
{
public String ChannelId;
public String ContentId;
public String MediaType;
}
public String Search;
public String Season;
public String Number;
public List<ChannelTarget> Channels;
}

Armed with this data, it’s pretty easy to slap together the optimal API request. You can see it happening in ShareToRokuActivity.resolveAndSendSearch — in short, if we can narrow down to a known channel we try to launch the show there, otherwise we let the Roku search do its best. Getting that data in the first place is where the magic really happens.

A Babel of Inputs

The Android Sharesheet is a pretty general-purpose app-to-app sharing mechanism, but in practice it’s mostly used to share web pages or social media content through text or email or whatever. So most data comes through as unstructured text, links and images. Our job is to make sense of this and turn it into the most specific show data we can. A few examples:

App / SourceShared DataIdeal Target
1. TV Time Episode PageShow Me the Love on TV Time https://tvtime.com/r/2AID4“Trying” Season 1 Episode 6 on AppleTV+
2. Chrome nytimes.com Movie Review (No text selection)https://www.nytimes.com/2022/11/22/movies/strange-world-review.html“Strange World” on Disney+
3. Chrome Wikipedia page (movie title selected)“Joe Versus the Volcano”  https://en.wikipedia.org/wiki/Joe_Versus_the_Volcano#:~:text=Search-,Joe%20Versus%20the%20Volcano,-Article“Joe Versus the Volcano” on multiple streaming services
4. YouTube Videohttps://youtu.be/zH14EyiSlas“When you say nothing at all” cover by Reina del Cid on YouTube
5. Amazon Prime MovieHey I’m watching Black Adam. Check it out now on Prime Video! https://watch.amazon.com/detail?gti=amzn1.dv.gti.1a7638b2-3f5e-464a-a271-07c2e2ec1f8c&ref_=atv_dp_share_mv&r=web“Black Adam” on Amazon Prime
6. Netflix Series PageSeen “Love” on Netflix yet?   https://www.netflix.com/us/title/80026506?s=a&trkid=13747225&t=more&vlang=en&clip=80244686“Love” Season 1 Episode 1 on Netflix
7. Search text entered directly into ShareToRokuProject Runway Season 5“Project Runway” Season 5 on multiple streaming services.

Pipelines and Plugins

All but the simplest normalization code typically breaks down into a collection of rules, each targeted at a particular type of input. The rules are strung together into a pipeline, each doing its little bit to clean things up along the way. This approach makes it easy to add new rules into the mix (and retire obsolete ones) in a modular, evolutionary way.

After experimenting a bit (a lot), I settled on a two-phase approach to my pipeline:

  1. Iterate over a list of “parsers” until one reports that it understands the basic format of the input data.
  2. Iterate over a list of “refiners” that try to enhance the initial model by cleaning up text, identifying target channels, etc.

Each of these is defined by a standard Java interface and strung together in SearchController.java. A fancier approach would be to instantiate and order the implementations through configuration, but that seemed like serious overkill for my little hobby app. If you’re working with a team of multiple developers, or expect to be adding and removing components regularly, that calculus probably looks a bit different.

This split between “parsers” and “refiners” wasn’t obvious at first. Whenever I face a messy normalization problem, I start by writing a ton of if/then spaghetti, usually in pseudocode. That may seem backwards, but it can be hard to create an elegant approach until I lay out all the variations on the table. Once that’s in front of me, it becomes much easier to identify commonalities and patterns that lead to an optimal pipeline.

Parsers

Parsers” in our use case recognize input from specific sources and extract key elements, such as the text most likely to represent a series name. As of today there are three in production:

TheTVDB Parser (Lookup.java)

TV Time and a few other apps are powered by TheTVDB, a community-driven database of TV and movie metadata. The folks there were nice enough to grant me access to the API, which I use to recognize and decode TV Time sharing URLs (example 1 in the table). This is a four step process:

  1. Translate the short URL into their canonical URL. E.g., the short URL in example 1 resolves to https://www.tvtime.com/show/375903/episode/7693526&pid=tvtime_android.
  2. Extract the series (375903) and/or episode (7693526) identifiers from the URL.
  3. Use the API to turn these identifiers into show metadata and translate it into a parsed result.
  4. Apply some final ad-hoc tweaks to the result before returning it.

All of this data is cached using a small SQLite database so that we don’t make too many calls directly to the API. I’m quite proud of the toolbox implementation I put together for this in CachingProxy.java, but that’s an article for another day.

UrlParser.java

UrlParser takes advantage of the fact that many apps send a URL that includes their own internal show identifiers, and often these internal identifiers are the same ones they use for “Deep Linking” with Roku. The parser is configured with entries that include a “marker” string — a unique URL fragment that identifies a particular — together with a Roku channel identifier and some extra sugar not worth worrying about. When the marker is found and an ID extracted, this parser can return enough information to jump directly into a channel. Woo hoo!

SyntaxParser.java

This last parser is kind of a last gasp that tries to clean up share text we haven’t already figured out. For example, it extracts just the search text from a Chrome share, and identifies the common suffix “SxEy” where x is a season and y is an episode number. I expect I’ll add more in here over time but it’s a reasonable start.

Refiners

Once we have the basics of the input — we’ve extracted a clean search string and maybe taken a first cut at identifying the season and channels —  a series of “refiners” are called in turn to improve the results. Unlike parsers which short-circuit after a match is found, all the refiners run every time.

WikiRefiner.java

A ton of the content we watch these days is created by the streaming providers themselves. It turns out that there are folks who keep lists of all these shows on Wikipedia (e.g., this one for Netflix). The first refiner simply loads up a bunch of these lists and then looks at incoming search text for exact matches. If one is found, the channel is added to the model.

As a side note, the channel is actually added to the model only if the user has that channel installed on their Roku (as passed up in the “channels” query parameter). The same show is often available on a number of channels, and it doesn’t make sense to send a Roku to a channel it doesn’t know about. If the show is available on multiple installed channels, the Android UX will ask the user to pick the one they prefer.

RokuSearchRefiner.java

Figuring out this refiner was a turning point for the app. It makes the results far more accurate, which of course makes sense since they are sourced from Roku itself. I’ve left the WikiRefiner in place for now, but suspect I can retire it with really no decrease in quality. The logs will show if that’s true or not after a few weeks.

In any case, this refiner passes the search text up to the same search interface used by roku.com. It is insanely annoying that this API doesn’t return deep link identifiers for any service other than the Roku Channel, but it’s still a huge improvement. By restricting results to “perfect” matches (confidence score = 1), I’m able to almost always jump directly into a channel when appropriate.

I’m not sure Roku would love me calling this — but I do cache results to keep the noise down, so hopefully they’ll just consider it a win for their platform (which it is).

FixupRefiner.java

At the very end of the pipeline, it’s always good to have a place for last-chance cleanup. For example, TVDB knows “The Great British Bake Off,” but Roku in the US knows it as “The Great British Baking Show.” This refiner matches the search string against a set of rules that, if found, allow the model to be altered in a manual way. These make the engineer in me feel a bit dirty, but it’s all part of the normalization game — the choice is whether to feel morally superior or return great results. Oh well, at least the rules are in their own configuration file.

Hard Fought Data == Real Value

This project is a microcosm of most of the normalization problems I’ve experienced over the years. It’s important to try to find some consistency and modularity in the work — that’s why pipelines and plugins and models are so important. But it’s just as important to admit that the real world is a messy place, and be ready to get your hands dirty and just implement some grotty code to clean things up.

When you get that balance right, it creates enormous differentiation for your solution. Folks can likely duplicate or improve upon your algorithms — but if they don’t have the right data in the first place, they’re still out of luck. Companies with useful, normalized, proprietary data sets are just always always always more valuable. So dig in and get ‘er done.

public RokuSearchInfo parse(String input, UserChannelSet channels) {
// 1. PARSE
String trimmed = input.trim();
RokuSearchInfo info = null;
try {
info = tvdbParser.parse(input, channels);
if (info == null) info = urlParser.parse(input, channels);
if (info == null) info = syntaxParser.parse(input, channels);
}
catch (Exception eParse) {
log.warning(Easy.exMsg(eParse, "parsers", true));
info = null;
}
if (info == null) {
info = new RokuSearchInfo();
info.Search = trimmed;
log.info("Default RokuSearchInfo: " + info.toString());
}
// 2. REFINE
tryRefine(info, channels, rokuRefiner, "rokuRefiner");
tryRefine(info, channels, wikiRefiner, "wikiRefiner");
tryRefine(info, channels, fixupRefiner, "fixupRefiner");
// 3. RETURN
log.info(String.format("FINAL [%s] -> %s", trimmed, info));
return(info);
}

Share to Roku!

TLDR: if you watch TV on a Roku and have an Android phone, please give my new Share To Roku app a try! It’s currently in open testing; install it with this link on the web or this link on your Android device. The app is free, has no ads, saves no data and only makes network calls to Rokus on your local network. It acts as a simple remote, but much more usefully lets you “Share” show names from the web or other apps directly to the Roku search interface. I use it with TV Time and it has been working quite well so far — but I need broader real-world testing and would really appreciate your feedback.

Oh user interface development, how I hate you so. But my lack of experience with true mobile development has become increasingly annoying, and I really wanted an app to drive my Roku. So let’s jump back into the world of input events and user interface layouts and see if we can get comfy. Yeesh.

Share To Roku in a nutshell

I’ve talked about this before (here and here). My goal is to transition as smoothly as possible from finding a show on my phone (an Android, currently the Samsung Galaxy S21) to watching it on my TV. I keep my personal watchlist on an app called TV Time and that’s key, but I also want to be able to jump from a recommendation in email or a review on the web. So feature #1 is to create a “share target” that can accept messages from any app.

Armed with this inbound search text, the app will help the user select their Roku, ensure the TV power is on (if that particular Roku supports it), and forward the search. The app then will land on a page hosting controls to help navigate the last mile to the show (including a nice swipe-enabled directional pad that IMNSHO is way better than the official Roku app). This remote control functionality will also be accessible simply by running the app on its own. And that’s about it. Easy peasy!

All of the code for Share To Roku is up on github. I’ll do a final clean-up on the code once the testing period is over, but everything I’ve written here is free to use and adopt under an MIT license, so please take anything you find useful.

Android Studio and “Kotlin”

If you’ve worked with me before, you know that my favorite “IDE” is Emacs; I build stuff from the command line; and I debug using logs and jdb. But for better or worse, the only realistic way to build for Android is to at least mostly use Android Studio, a customized version of IntelliJ IDEA (you can just use IntelliJ too but that’s basically the same thing). AStudio generates a bunch of boilerplate code for even the simplest of apps, and encourages you to edit it all in this weird overlapping sometimes-textual-sometimes-graphical mode that at least in my experience generally ensures a messy final product. I’m not going to spend this whole article complaining about it, but it is pretty stifling.

Love me a million docked windows, three-deep toolbars and controls on every edge of the screen!

Google would also really prefer that you drop Java and instead use their trendy sort-of-language “Kotlin” to build Android apps. I’ve played this Java pre-complier game before with Scala and Groovy, and all I can say is no thank you. I will never understand why people are so obsessed with turning code into a nest of side effects, just to avoid a few semicolons and brackets. At least for now they are grudgingly continuing to support Java development — so that’s where you’ll find me. On MY lawn, where I like it.

Android application basics

Components

The most important thing to understand about Android development is that you are not in charge of your process. There is no “main” and, while you get your own JVM in which to live, that process can come and go at pretty much any time. This makes sense — at least historically mobile devices have had to manage pretty limited memory and processing power, so the OS exerts a ton of control over the use of those resources. But it can be tricky when it comes to understanding state and threading in an app, and clearly a lot of bugs in the wild boil down to a lack of awareness here.

Instead of main, an Android app is effectively a big JAR that uses a manifest file to expose Component classes. The most common of these is an Activity, which is effectively represents one user interface screen within the app. Other components include various types of background process; I’m going to ignore them here. Share to Roku exposes two Activities, one for choosing a Roku and one for the search and remote interface. Each activity derives from an Android base class that defines a set of well-known entrypoints, each of which is called at different points in the process lifecycle.

Tasks and the Back Stack

But before we dig into those, two other important concepts: tasks and the back stack. This can get wildly complicated, but the super-basics are this:

  • A “task” is a thing you’re doing on the device. Most commonly tasks are born by opening an app from the home screen.
  • Each task maintains a “stack” of activities (screens). When you navigate to a new screen (e.g., open an email from a list of emails) a new activity is added to the top of the stack. When you hit the back button, the current (top) activity is closed and you return to the previous one.
  • Mostly each task corresponds to an app — but not always. For example, when you are in Chrome and you “share” a show to my app, a new Share To Roku activity is added to the Chrome task. Tasks are not the same as JVM processes!

Taken together, the general task/activity lifecycle starts to make sense:

  1. The user starts a new task by starting an app from the home screen.
  2. Android starts a JVM for that app and loads an instance of the class for the activity marked as MAIN/LAUNCHER in the manifest.
  3. The onCreate method of the activity is called.
  4. The user interacts with the ux. Maybe at some point they dip into another activity, in which case onPause/onResume and onStop/onStart are called as the new activity starts and finishes.
  5. When the activity is finished (the user hits the back button or closes the screen in some other way) the onDestroy method is called.
  6. When the system decides it’s a good time (e.g., to reduce memory usage), the JVM is shut down.

Of course, it’s not really that simple. For example, Android may just nuke your process at any time, without ever calling onDestroy — so you’ll need to put some thought into how and when to save persistent data. And depending on your configuration, existing activity instances may be “reused” (with a call to onNewIntent). But it’s a pretty good starting place.

Intents

Intents are the means by which users navigate between activities on an Android device. We’ve actually already seen an intent in action, in step #2 above — MAIN/LAUNCHER is a special intent that means “start this app from the beginning.” Intents are used for every activity-to-activity transition, whether that’s explicit (e.g., when an email app opens up a message details activity in response to a click in a message list) or implicit (e.g., when an app opens up a new, pre-populated text message without knowing which app the user has configured for SMS).

Share to Roku uses intents in both ways. Internally, after picking a Roku, ChooseRokuActivity.shareToRoku instantiates an intent to start the ShareToRokuActivity. Because that internal navigation is the only way to land on ShareToRokuActivity, its definition in the manifest sets the “exported” flag to false and doesn’t include any intent-filter elements.

Conversely, the entry for ChooseRokuActivity in the manifest sets “exported” to true and includes no less than three intent-filter elements. The first is our old friend MAIN/LAUNCHER, but the next two are more interesting. Both identify themselves as SEND/DEFAULT filters, which mark the activity as a target for the Android Sharesheet (which we all just know as “sharing” from one app to another). There are two of them because we are registering to handle both text and image content.

Wait, image content? This seems a little weird; surely we can’t send an image file to the Roku search API. That’s correct, but it turns out that when the TV Time app launches a SEND/DEFAULT intent, it registers the content type as an image. There is an image; a little thumbnail of the show, but there is also text included which we use for the search. There isn’t a lot of consistency in the way applications prepare their content for sharing; I foresee a lot of app-specific parsing in my future if Share To Roku gets any real traction with users.

ChooseRokuActivity

OK, let’s look a little more closely at the activities that make up the app. ChooseRokuActivity (code / layout) is the first screen a user sees; a simple list of Rokus found on the local network. Once the user makes a selection here, control is passed to ShareToRokuActivity which we’ll cover next.

The list is a ListView, which gives me another opportunity to complain about modern development. Literally every UX system in the world has a control for simple displays of text-based lists. Android’s ListView is just this — a nice, simple control to which you attach an Adapter that holds the data. But the Android Gods really would rather you don’t use it. Instead, you’re supposed to use RecyclerView, a fine but much more complicated view. It’s great for large, dynamic lists, but way too much for most simple text-based UX lists. This kind of judgy noise just bugs me — an SDK should make common things as easy as possible. Sorry not sorry, I’m using the ListView. Anyways, the list is wrapped in a SwipeRefreshLayout which provides the gesture and feedback to refresh the list by pulling down.

The activity populates the list of Rokus using static methods in Roku.java. Discovery is performed by UDP broadcast in Ssdp.java, a stripped down version of the discovery classes I wrote about extensively in Anyone out there? Service discovery with SSDP, WSD, other acronyms. The Roku class maintains a static (threadsafe) list of the Rokus it finds, and only searches again when asked to manually refresh. This is one of those places where it’s important to be aware of the process lifecycle; the list is cached as long as our JVM remains alive and will be used in any task we end up in.

Take a look at the code in initializeRokus and findRokus (called from onCreate). If we have a cache of Rokus, we populate it directly into the list for maximum responsiveness. If we don’t, we create an ActivityWorker instance that searches using a background thread. The trick here is that each JVM process has exactly one thread dedicated to managing all user interface interactions — only code running on that thread can touch the UX. So if another thread (e.g., our Roku search worker) needs to update user interface components (i.e., update the ListView), it needs help.

There are a TON of ways that people manage this; ActivityWorker is my solution. A caller implements an interface with two methods: doBackground is run on a background thread, and when that method completes, the code in doUx runs on the UI thread (thanks to Activity.runOnUiThread).  These two methods can share member variables (e.g., the “rokus” set) without worrying about concurrency issues — a nice clean wrapper for a common-but-typically-messy situation.

ShareToRokuActivity

The second activity (code / layout) has more UX going on, and I’ll admit that I appreciated the graphical layout tools in AStudio. Designing even a simple interface that squishes and stretches reasonably to fit on so many different device sizes can be a challenge. Hopefully I did an OK job, but testing with emulators only goes so far — we’ll see as I get a few more testers.

If the activity was started from a sharing operation, we pick up that inbound text as “extra” data that comes along with the Intent object (the data actually comes to us indirectly via ChooseRokuActivity, since that was the entry point). Dealing with this search text is definitely the most unpleasant part of the app, because it comes in totally random and often unhelpful forms. If Share To Roku is going to become a meaningfully useful tool I’m going to have to do some more investment here.

A rare rave from me — the Android Volley HTTP library (as used in Http.java) is just fantastic. It works asynchronously, but always makes its callback on the UX thread. That is, it does automatically what I had to do manually with ActivityWorker. Since most mobile apps are really just UX sitting atop some sort of HTTP API, this makes life really really easy. Love it!

The bulk of this activity is just buttons and lists that cause fire-and-forget calls to the Roku, except for the directional pad that takes up the center of the screen. CirclePad.java is a custom control (sorry, custom “View”) that lets the user click a center button and indicate direction with either clicks in the N-S-E-W regions or (way cooler) directional swipes. A long press on the control sends a backspace, which makes entering text on the TV a bit more pleasant. Building this control felt like a throwback to Windows 3.0 development. Set a clip region, draw some lines and circles and icons. The gesture recognition is simultaneously amazingly easy (love the “fling” handler) and oddly prehistoric (check out my manual identification of a “long” press).

Publishing to the Play store

Back in the mid 00’s I spent some time consulting for Microsoft on a project called Windows Marketplace (wow there is a Wikipedia article for everything). Marketplace was sponsored by the Windows marketing team as an attempt to highlight the (yes) shareware market, which had been basically decimated by cross-platform browser-based apps. It worked a lot like any other app store, with some nice features like secure backup of purchased license keys (still a thing with some software!!!). It served a useful role for a number of years — fun times with a neat set of people (looking at you Raj, Vikram, DeeDee, Paul, Matt and Susan) and excellent chaat in Emeryville.

Anyways, that experience gave me some insight into the challenges of running and monetizing a directory of apps developed by everyone from big companies to (hello) random individuals. Making sure the apps at least work some of the time and don’t contain viruses or some weirdo porn or whatever. It’s not easy — but Google and Apple really have really done a shockingly great job. Setting up account on the Play Console is pretty simple — I did have to upload an image of my official ID and pay a one-time $25 fee but that’s about it. The review process is painful because each cycle takes about three or four days and they often come back with pretty vague rejections. For example, “you have used a word you may not have the rights to use” … which word is, apparently, a secret? But I get it.

So anyways — my lovely little app is now available for testing. If you’ve got an Android device, please use the links below to give it a try. If you have an Apple device, I’m sorry for many reasons. I will definitely be doing some work to better manipulate inbound search strings to provide a better search result on the Roku. I’m a little torn as to whether I could just do that all in-app, or if I should publish an API that I can update more easily. Probably the latter, although that does create a dependency that I’m not super-crazy about. We’ll see.

Install the beta version of Share To Roku with this link on the web or this link on your Android device.

Anyone out there? Service discovery with SSDP, WSD, other acronyms.

Those few regular readers of this stuff may remember What should we watch tonight, in which I used the Roku API to build a little web app to manage my TV watchlist. Since then I’ve found TV Time, which is waaay better and even tells me how many days until the next season of Cobrai Kai gets here (51 as of this writing). But what it doesn’t do is launch shows automatically on my TV, and yes I’m lazy enough to be annoyed by that. So I’ve been planning a companion app that will let me “share” shows directly to my Roku using the same API I used a few months ago.

This time, I’d like the app to auto-discover the TV, rather than asking the user to configure its IP address manually. Seems pretty basic — the Roku ECP API describes how it uses “Simple Service Discovery Protocol” to enable just that. But man, putting together a reliable implementation turned out to be a bear, and sent me tumbling down a rabbit hole of “service discovery” that was both fascinating and frankly a bit appalling.

Come with me down that rabbit hole, and let’s learn how those fancy home devices actually try to find each other. It’s nerd-tastic!

Can I get your number?

99% of what happens on networks is conversations between two devices that already know each other, either directly by address (like a phone number), or by a name that they use to look up an address (like using a phone book). And 99% of the time this works great — between “google.com” and QR codes and the bookmark lists we’ve all built up, there’s rarely any need to even think about addresses. But once in awhile — usually when you’re trying to set up a printer or some other smarty-pants device on your home network — things get a bit more complicated.

Devices on your network are assigned a (basically) arbitrary address by your wifi router, and they don’t have a name registered anywhere, so how do other devices find them to start a conversation? It turns out that there are a pile of different ways — most of which involve either multicast or broadcast UDP messaging, two similar techniques that enable a device to initiate a conversation without knowing exactly who it’s talking to. Vox Clamantis in Deserto as it were.

Side note: for this post I’m going to limit examples to IPv4 addressing, because it makes my job a little easier. The same concepts generally apply to IPv6, except that there is no true “broadcast” with v6 because they figured out that multicast could do all the same things more efficiently, but close enough.

An IP broadcast message is received by all devices on the local network that are listening on a given port. Typically these use the special “limited broadcast” address 255.255.255.255 (there’s also a “directed” broadcast address for each subnet which could theoretically be routed to other networks, but that’s more detail than matters for us). An IP multicast message is similar, but is received only by devices that have subscribed to (or joined) the multicast’s special “group” address. Multicast addresses all have 1110 as their most significant bits, which translates to addresses from 224.0.0.0 to 239.255.255.255.

Generally, these messages are restricted to your local network — that is, routers don’t send them out onto the wider Internet (there are exceptions for complex corporate-style networks, but whatever). This is a good thing, because the cacophony of the whole world getting all of these messages would most definitely take down the Internet. It’s also safer, as we’ll see a bit later.

Roku and SSDP

OK, back to the main thread here. Per the ECP documentation, Roku devices use Simple Service Discovery Protocol for discovery. SSDP defines a multicast address (239.255.255.250) and port (1900), a set of messages using what old folks like me still call RFC 822 format, and two interaction patterns for discovery:

  1. A client looking for devices sends a multicast M-SEARCH message with the type ssdp:discover, setting the ST header to either ssdp:all (everybody respond!) or a specific service type string (the primary Roku type is roku:ecp). AFAIK there is no authoritative list of ST values, you just kind of have to know what they are.  Devices listening for these requests respond directly to the client with a unicast HTTP OK response that includes (thank you) addressing information.
  2. Clients can also listen on the same multicast address for NOTIFY messages of type ssdp:alive or ssdp:byebye; devices send these out when they are turned on and off. It’s a good way to keep a list of devices accurate, but implementations are spotty so it really needs to be used in combination with #1.

An SSDP client in Java

Seems simple, right? I mean, OK, the basics really are simple. But a robust implementation runs into a ton of nit-picky little gotchas that, all together, took me days to sort out. The end result is on github and can be built/tested on any machine with java, git and maven installed (be sure to fix up slashes on Windows):

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/toolbox
mvn clean package
java -cp target/toolbox-1.0-SNAPSHOT.jar \
    com.shutdownhook.toolbox.discovery.ServiceDiscovery \
    ssdp

This command line entrypoint just sends out an ssdp:discover message, displays information on all the devices that respond, and loops listening for additional notifications. Somebody on your network is almost sure to respond; in particular for Roku you can look for a line something like this:

+++ ALIVE: uuid:roku:ecp:2N006D062746 | roku:ecp | http://192.168.86.47:8060/ | (/192.168.86.47:1900)

Super cool! If you open up that URL in a browser you’ll see a bunch more detail about your Roku and the interfaces it supports.

Discovery Protocol Abuse

If you don’t see any responses at all, it’s likely that your firewall is blocking UDP messages either to or from port 1900 — my Linux Mint distribution does both by default. Mint uses UncomplicatedFirewall which means you can see blocking activity (as root) in /var/log/ufw.log and open up UDP port 1900 with commands like:

sudo ufw allow from any port 1900 to any proto udp
sudo ufw allow from any to any port 1900 proto udp

Before you do this, you should be aware that there is some potential for bad guys to do bad stuff — pretty unlikely, but still. Any protocol that can “amplify” one message into many (because multiple devices can respond) carries some risk of a denial-of-service attack. That can be very simple: a bad guy on your network just fires off a ton of M-SEARCH requests, prompting a flood of responses that overwhelm the network as a whole. Or it can be nastier: combined with “ip spoofing,” a bad guy can redirect amplified responses to an unsuspecting victim.

Really though, it’s pretty theoretical for a home network — routers don’t generally route these messages, so it’d have to be an inside job anyways. And once you’ve got a bad actor inside your network, they can probably do a lot more damage than just slowing it down. YMMV, but I’m not personally super-worried about this particular attack. Just please don’t confuse my blasé assessment here with the risks of the related Universal Plug-and-Play (UPnP) protocol, which are quite real.

Under the Covers

There is quite a bit to talk about in the code here. Most of the hard work is in UdpServiceDiscovery.java (I’ll explain this abstraction later), which uses two sockets and two worker threads:

Socket/Thread #1 (DISCOVERY) sends M-SEARCH requests and receives back HTTP OK responses. A request is sent when the thread first starts up and can be repeated either on demand or on a timer (by default every twenty minutes).

It’s key to understand that while the request here is a multicast message, the responses are sent back directly as a unicast. I didn’t implement the server side specifically, but you can see how this works in the automated tests — the responding device extracts the source address and port from the multicast and just replies with a standard UDP unicast message. This is important for us because only one process on a computer can actively listen for unicast messages on a port. And on many systems, somebody is probably already doing that on port 1900 (for example, the Windows service “SSDP Discovery”). So if we want to reliably hear HTTP OK responses, we need to be using an unused, automatically-assigned port for this socket.

Socket/Thread #2 (NOTIFICATION) is for receiving unsolicited NOTIFY multicasts. This socket uses joinGroup to register interest and must be opened on port 1900 to work correctly.

forEachUsefulInterface is an interesting little bit of code. It’s used both for sending requests and joining multicast groups, ensuring that the code works in a system that is connected to multiple network interfaces (typically not the case at home, but better safe than sorry). Remember that multicasts are restricted to a local network — so if you’re attached to multiple networks, you’ll need to send out one message on each of them. The realities of coordinating interfaces with addresses can get pretty complicated, but I think this gets it right. Let me know if you think I’ve missed something!

The class also tries to identify and ignore duplicate UDP messages. Dealing with dups just comes with the territory when working with UDP — and while the nature of the SSDP protocol means it generally doesn’t hurt anything to re-process them, it’s just icky. UdpServiceDiscovery tries to filter them out using message hashing and a FIFO queue of recently-received messages. You can tune this behavior (or turn it off) through config; default is a two-second lookback.  

Wait, is that Everyone? Enter WS-Discovery.

If you look closely you’ll see that UdpServiceDiscovery really isn’t specific to SSDP at all — all of the protocol-specific stuff is in Ssdp.java and transits through yet another class ServiceDiscovery.java. What the heck is going on here? The short story is that SSDP doesn’t return most printers, and Microsoft always needs to be special. The longer story requires a quick aside into the insanity that was “WS*”.

Back in 1999 and 2000, folks realized that HTTP would be great for APIs as well as web pages — and two very different approaches emerged. First by a few months was SOAP (and it’s fast-follower WSDL), which tried to be transport-independent (although 99.9% of traffic was over HTTP) and was all about defined, strongly typed interfaces. The foil to SOAP was REST — a much lighter and Internettish way to think about machine-to-machine interaction.

SOAP was big company, REST was scrappy startup. And nobody was more SOAPy than Microsoft. They had a whole group (I’m looking at you Bill, and your buddy John too) that did nothing but make up abstract, overly-complicated, insane SOAP-based “standards” informally known as “WS*” that nobody understood or needed. Seriously, just check out this poster (really, click that link, zoom in and scroll for awhile, it’s shocking). Spoiler alert: REST crushed it.

Anyways — one of these beasts was WS-Discovery, a protocol for finding devices on a network that does exactly the same thing as SSDP. Not “generally the same thing,” but exactly the same thing. The code that works for SSDP works for WSD too, just swap out the HTTP-style metadata for XML. Talk about reinventing the wheel, yeesh. But at least this explains the weird object hierarchy in my discovery classes:

Since these all use callback interfaces and sometimes you just want an answer, I added OneShotServiceDiscovery that wraps up Ssdp and Wsd like this (where “4” below is the number of seconds to wait for UDP responses to come in):

Set<ServiceInfo> infosSSD = OneShotServiceDiscovery.ssdp(4);
Set<ServiceInfo> infosWSD = OneShotServiceDiscovery.wsd(4);

There’s an entrypoint for this too, so to get a WSD device list you can use (the example is my Epson ET-3760):

java -cp target/toolbox-1.0-SNAPSHOT.jar \
    com.shutdownhook.toolbox.discovery.OneShotServiceDiscovery \
    wsd
...
urn:uuid:cfe92100-67c4-11d4-a45f-e0bb9e278967 | wsdp:Device wscn:ScanDeviceType wprt:PrintDeviceType | http://192.168.5.228:80/WSD/DEVICE | (/192.168.5.228:3702)

Actually, a full WSD implementation is more complicated than this. The protocol defines a “discovery proxy” — a device on the network that can cache device information and reduce network traffic. A proxy advertises itself by sending out HELLO messages with type d:DiscoveryProxy; clients are supposed to switch over to use this service when it’s present. So so much complexity for so so little benefit. No thanks.

Don’t forget the broadcast bunch

And we’re still not done. SSDP and WSD cover a bunch of devices and services, but they still miss a lot. Most of these use some sort of custom broadcast approach. If you poke around in UdpServiceDiscovery you’ll find a few special case bits to handle the broadcast case — we disable the NOTIFICATION thread altogether, and just use the DISCOVERY thread/socket to send out pings and listen for responses. The Misc class provides an entrypoint for this; you can find my Roomba using broadcast port 5678 like this:

java -cp target/toolbox-1.0-SNAPSHOT.jar \
    com.shutdownhook.toolbox.discovery.Misc \
    255.255.255.255 5678 irobotmcs
...
============ /192.168.4.48:5678
{"ver":"3","hostname":"Roomba-3193C60472324700","robotname":"Bellvoomba","ip":"192.168.4.48","mac":"80:91:33:9D:E2:16","sw":"v2.4.16-126","sku":"R960020","nc":0,"proto":"mqtt","cap":{"pose":1,"ota":2,"multiPass":2,"pp":1,"binFullDetect":1,"langOta":1,"maps":1,"edge":1,"eco":1,"svcConf":1}}

Sometimes you don’t even need a ping. For example, devices that use thes Tuya platform just sit there and constantly broadcast their presence on port 6666 or 6667:

java -cp target/toolbox-1.0-SNAPSHOT.jar \
    com.shutdownhook.toolbox.discovery.Misc \
    255.255.255.255 6667 tuya
...
============ / 192.168.5.253:60913
????1r???8?W??⌂???▲????H??? _???r?9???3?o*?jwz#?$?Z?H?¶?Q??9??r~  ?U

OK, that’s not super useful — it’s my smart ceiling fan, but apparently not all of their devices broadcast on 6666, and the messages on port 6667 are encrypted (using a global key, duh — this code shows how to decrypt them). This kind of thing annoys me because it doesn’t really secure anything and just makes life harder for everyone. I’m going to register my protest by not writing that code myself; that’ll show them.

In any case, you get the point — there are a lot of ways that devices try to make themselves discoverable. I’ve even seen code that just fully scans the network — mini wardialers that check every possible address for specific open ports (an approach that won’t survive the eventual v4-v6 address transition!). It’d be nice if this was more standardized, but I’m happy to live with a little chaos in return for the innovations that pop up every day. It’ll settle out eventually. Maybe.

Now back to that Roku app, which will use something like 5% of the code I wrote for this post. Just one of the reasons I’m a fan of retirement — I can burn cycles on any rabbit hole I damn well please. Getting the code right can be tricky, so perhaps it’ll prove useful to some other nerd out there. And as always, please let me know if you find a bug!

RuBy – Blocking Russia and Belarus

The Internet is a funny place. At the exact same moment that Russian troops are committing war crimes in the real world, Russian users online are just bopping around as if everything is cool. ShutdownHook is anything but a large-scale website, but it does get enough traffic to provide interesting insights in the form of global usage maps. And pretty much every day, browsers from Russia (and very occasionally Belarus) are stopping by to visit.

Well, at least they were until this afternoon. My love for free speech does not extend to aiding and abetting my enemies — and until the people of Russia and Belarus abandon their attacks on Ukraine, I’m afraid that is the best term for what they are. And before you spin up the de rigueur argument about not punishing people for the acts of their government, please just save it. I get the point, but there is nobody on earth that can fix these countries other than their citizens. They do bear responsibility — just as I and my fellow Americans did when we granted a cowardly, bullying toddler the United States’ nuclear codes for four years. Regardless of our individual votes.

Anyways, while I’m certainly not changing the world with my amateur postings here on ShutdownHook, I am trying in a very small way to share ideas and experience that will make folks better engineers and more creative and eclectic individuals. And I just don’t want to share that stuff with people who are, you know, helping to kill families and steal or destroy their homes. Weird, I know.

Enter RuBy — a tiny little web service that detects browsers from these two countries and replaces site content with a static Ukrainian Flag. You can add it to your web site too, and I hope you will. All it takes is one line anywhere on your site:

 <script src="https://shutdownapps.duckdns.org:7076/ruby.js" type="text/javascript" defer></script>

It’s not perfect — the same VPN functionality that folks use to stream The Great Pottery Throw Down before it’s available in the States will foil my script. But that’s fine — the point is to send a general message that these users are not welcome to participate in civilized company, and I think it does the trick.

If you’d rather not use the script from my server, the code is freely-available on github — go nuts. I’ll cover all the details in this post, so keep on reading.

Geolocation Basics

Image credit Wikipedia

Geolocation is a general term for a bunch of different ways to figure out where a particular device exists in the real world. The most precise of these is embedded GPS. Pretty much all of our phones can receive signals from the GPS satellite network and use that information to understand where they are — it’s how Google Maps shows your position as you sit in traffic during your daily commute. It’s amazing technology, and the speed with which we’ve become dependent on it is stunning.

Most other approaches to positioning are similar; they rely on databases that map some type of identifiable signal to known locations. For your phone that might be cell towers, each of which broadcasts a unique identifier. Combining this data (e.g., from opencellid.org) with real-time signal strength can give some pretty accurate results. You can do the same thing with a location-aware database of wifi networks like the one at wigle.net (the nostalgia behind “wardriving” is strong for this nerd). Even the old WWII-era LORAN system basically worked this way.

But the grand-daddy of location techniques on the Internet is IP-based geolocation, and it remains the most common for locating far-away clients without access to signal-based data. Each device on the Internet has an “IP Address” used to route messages — you can see yours at https://whatsmyip.com/ (ok technically that’s probably your router’s address, but close enough). This address is visible to both sides of a TCP/IP exchange (like a browser making a request to a web server), so if the server has access to a location-aware database of IP addresses, it can estimate the browser’s real-world location. The good folks at ip2location.com have been maintaining exactly this database for years, and insanely they still make a version available for free at https://lite.ip2location.com/.

The good news for IP-based geolocation is that it’s hard to technically spoof an IP address. The bad news is that it’s easy to insert devices between your browser and a server, so spoofing isn’t really even required to hide yourself. The most common approach is to use a virtual private network (“VPN”). With a VPN your browser doesn’t directly connect to the web server at all — instead, it connects to a VPN server and asks it to talk to the real server on your behalf. As far as the server is concerned, you live wherever your VPN server lives.

There are whole companies like NordVPN that deliver VPN services. They maintain thousands of VPN servers — one click makes your browser appear to be anywhere in the world. Great for getting around regional streaming restrictions! And to be fair, a really good way to increase your privacy profile on the Internet. But still, just a teeny bit shady.

Geo-Blocking

There are a few ways to use IP-based location data to restrict who is allowed to visit a website. Most commercial or high-traffic sites sit behind some kind of a firewall, gateway or proxy, and most of these can automatically block traffic using location-based rules. This is actually pretty common, in particular to protect against countries (you know who you are) that tend to be havens for bad actors. Cloud providers like Azure and AWS are making this kind of protection more and more accessible, which is a great thing.

Another approach is to implement blocking at the application level, which is what I’ve done with RuBy. In theory this is super-simple, but there are some interesting quirks of the IP addressing landscape that make it worth some explanation.

But first a quick side note — there are no new ideas, and it turns out that I’m not the only person to have come up with this one. The folks over at redirectrussia.org have a script as well — it’s a little more complicated than mine, and a bit smarter — e.g., they limit web service calls by doing a first check on the browser’s timezone setting. They also allow the site owner to redirect blocked clients to a site of their choosing, whereas I just slap a flag over the page and call game over. Whichever you pick, you’re doing a solid for the good guys.

RuBy as a Web Service

Using the web service is about as simple as it gets; just add that one-line script fragment anywhere on your page and you’re done. Under the covers, what happens is this:

  • The browser fetches some javascript from the URL at https://shutdownapps.duckdns.org:7076/ruby.js. Note the “defer” attribute on the tag; this instructs the browser to load the script asynchronously and delay execution until the rest of the page is loaded. This avoids any performance impact for pages using the script.
  • The web service examines the incoming IP address and compares it to a list of known address ranges coming from Russia and Belarus. If the IP is not in one of those ranges, an empty script is returned and the page renders / behaves normally.
  • If the IP is in one of those ranges, the returned script replaces the HTML of the page with a full-window rendering of the Ukrainian flag (complete with official colors #005BBB and #FFD500). I considered redirecting to another site, but preferred the vibe of fully dead-ending the page.

Most systems can pretty easily add script tags to template pages. For ShutdownHook it was a little harder because I was using a subscription plan at WordPress.com that doesn’t allow it. This isn’t a problem if you’re on the “business” plan (I chose to upgrade) or are hosting the WordPress software yourself or anywhere that allows plugins. After upgrading, I used the very nice “Insert Headers and Footers” plugin to insert the script tag into the HEAD section of my pages.

And really, that’s it. Done and done.

RuBy Under the Covers

The lookup code itself lives in RuBy.java. It depends on access to the IP2Location Lite “DB1” database; in particular the IPV6 / CSV version. Now, there are tons of ready-to-go libraries for working with this database, including for Java. I chose to implement my own because RuBy has very specific, simple requirements that lend themselves to a more space- and time-efficient implementation than a general-purpose library. A classic engineering tradeoff — are those benefits worth the costs of implementation and code ownership? In my case I think so, because I’m running the service for free and want to keep hardware costs to a minimum, but there are definitely arguments on both sides.

In a nutshell, RuBy is configured with a database file and a list of countries to block (specified as ISO-3166 alpha-2 codes). It makes a number of assumptions about the format of the data file (listed at the top of the source file), so be careful if you use another data source. Only matching ranges are loaded into an array sorted by the start of the range, and queries are handled by binary-searching into the array to find a potentially matching range and then checking its bounds. For Russia and Belarus, this ends up holding only about 18,000 records in memory, so resource use is pretty trivial.

IP addressing does get a little complicated though; converting text-based addresses to the integer values in the lookup array can be tricky.

Once upon a time we all used “v4” addresses, which you’ve surely seen and look like this: 127.0.0.1. Each of the four numbers are byte values from 0-255, so there are 8 * 4 = 32 bits available for a total of about 4.3 billion unique addresses. Converting these to a number is a simple matter that will look familiar to anyone who ever had to implement “atoi” in an interview setting:

a.b.c.d = (16777216 * a) + (65536 * b) + (256 * c) + d

Except, oops, it turns out that the Internet uses way more than 4.3 billion addresses. Back a few years ago this was the source of much hand-wringing and in fact the last IPv4 addresses were allocated to regional registries more than a decade ago. The long-term solution to the problem was to create “v6” addressing which uses 128 bits and can assign a unique address to a solid fraction of all the atoms that make up planet Earth. They’re pretty ugly (e.g., 2001:0db8:85a3:0000:0000:8a2e:0370:7334), but they do the trick.

Sadly though, change is hard, and IPv4 has stubbornly refused to die — only something like 20-40% of the traffic on the Internet is currently using IPv6. Mostly this is because somebody invented NAT (Network Address Translation) — a simple protocol that allows all of the dozens of network devices in your house or workplace to share a single public IP address. So at least for the foreseeable future, we’ll be living in a world where both versions are out in the wild.

To get the most coverage, we use the IP2Location database that includes both v4 and v6 addresses. All of the range values in this database are specified as v6 values, which we can manage because a v4 address can be converted to v6 just by adding “::FFFF:” to the front. This amounts to adding an offset of 281,470,681,743,360 to its natural value — you can see this and the other gyrations we do in the addressToBigInteger method (and for kicks its reverse in bigIntegerToAddress).

Spread the Word!

Technically, that’s about it — pretty simple at the end of the day. But getting everything lined up cleanly can be a bit of a hassle; I hope that between the service and the code I’ve made it a little easier.

Most importantly, I hope people actually use the code on their own websites. We really are at a critical moment in modern history — are we going to evolve into a global community able to face the big challenges, or will we slide back to 1850 and play pathetic imperialist games until we just extinguish ourselves? My generation hasn’t particularly distinguished itself yet in the face of this stuff, but I’m hopeful that this disaster is blatant enough that we’ll get it right. My call to action:

  • If you run a website, consider blocking pariah nations. You can do this with your firewall or gateway, with the RuBy or Redirect Russia scripts, or just roll your own. The only sites I hope we’ll leave open are the ones that might help citizens in these countries learn the truth about what is really happening.
  • Share this article with colleagues and friends on social media so they can do the same.
  • And even more key, (1) give to causes like MSF that provide humanitarian aid, and (2) make sure our representatives continue supporting Ukraine with lethal aid and punishing Russia/Belarus with increasing sanctions.

If I can help with any of this, just drop me a line and let me know.

Attribution: This site or product includes IP2Location LITE data available from https://lite.ip2location.com.

Ground-Up with the Bot Framework

It seems I can’t write about code these days without a warmup rant. So feel free to jump directly to the next section if you like. But where’s the fun in that?

My mixed (ok negative) feelings about “quickstarts” go back all the way to the invention of “Wizards” at Microsoft in the early 1990s. They serve a worthy goal, guiding users through a complex process to deliver value quickly. But even in those earliest days, it was clear that the reality was little more than a cheap dopamine hit, mostly good for demos and maybe helping show what’s possible. The problem comes down to two (IMNSHO) fatal flaws:

First, quickstarts abandon users deep in the jungle with a great SUV but no map or driver’s license. Their whole reason to exist is to avoid annoying details and optionality, but that means that the user has no understanding of the context in which the solution was created. How do you change it? What dependencies does it require? How does it fit into your environment? Does it log somewhere? Is it secured appropriately for production? How much will it cost to run? The end result is that people constantly put hacked-up versions of “Hello World” into production and pay for it later when they have no idea what is really going on.

Second, they make developers even lazier than they naturally are anyways. Rather than start with the basics, quickstarts skip most of the hard stuff and lock in decisions that any serious user will have to make for themselves. If this was the start of the documentation, that’d be fine — but it’s usually the end. Instead of more context, the user just gets dropped unceremoniously into auto-generated references that don’t provide any useful narrative. Even worse, existence of the quickstart becomes an excuse for a sloppy underlying interface design (whether that’s an API or menus and dialogs) — e.g., why worry about the steering wheel if people take the test-drive using autopilot?

Anyways, this is really just a long-winded way to say that the Bot Framework quickstart is pretty useless, especially if you’re using Java. Let’s do better, shall we?

What is the Bot Framework?

There are a bunch of SDKs and builders out there for creating chatbots. The Microsoft Bot Framework has been around for a while (launched out of Microsoft Research in 2016) and seems to have pretty significant mindshare. Actually the real momentum really seems to be with no-code or low-code options, which makes sense given how many bots are shallow marketing plays — but I’m jumping right into the SDK here because that’s way more fun, and it’s my blog.

The framework is basically a big normalizer. Your bot presents a standardized HTTPS interface, using the Bot Framework SDK to help manage the various structures and state. The Azure Bot Service acts as a hub, translating messages in and out of various channels (Teams, Slack, SMS, etc.) and presenting them to your interface. Honestly, that’s basically the whole thing. There are additional services to support language understanding and speech-to-text and stuff like that, but it’s all additive to the basic framework.

WumpusBot and RadioBot

I introduced WumpusBot in my last post … basically a chatbot that lets you play a version the classic 1970s game Hunt the Wumpus. The game logic is adapted from a simplified version online and lives in Wumpus.java, but I won’t spend much time on that. I’ve hooked WumpusBot up to Twillio SMS, so you can give it a try by texting “play” to 706-943-3865.

The project also contains RadioBot, a second chatbot that knows how to interact with the Shutdown Radio service I’ve talked about before. This one is hooked up to Microsoft Teams and includes some slightly fancier interactions — I’ll talk about that after we get a handle on the basics.

Build Housekeeping

All this is hosted in an Azure Function App — so let’s start there. The code is on github. You’ll need git, mvn and a JDK. Build like this:

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/toolbox
mvn clean package install
cd ../radio/azure
mvn clean package

To run you’ll need two Cosmos Containers (details in Shutdown Radio on Azure, pay attention to the Managed Identity stuff) and a local.settings.json file with the keys COSMOS_ENDPOINT, COSMOS_ DATABASE, COSMOS_CONTAINER and COSMOS_CONTAINER_WUMPUS. You should then be able to run locally using “mvn azure-functions:run.”

Getting a little ahead of myself, but to deploy to Azure you’ll need to update the “functionAppName” setting in pom.xml; “mvn azure-functions:deploy” should work from there assuming you’re logged into the Azure CLI.

The Endpoint

Your bot needs to expose an HTTPS endpoint that receives JSON messages via POST. The Java SDK would really like you to use Spring Boot for this, but it 100% isn’t required. I’ve used a standard Azure Function for mine; that code lives in Functions.java. It really is this simple:

  1. Deserialize the JSON in the request body into an Activity object (line 68).
  2. Pull out the “authorization” header (careful of case-sensitivity) sent by the Bot Framework (line 71).
  3. Get an instance of your “bot” (line 52). This is the message handler and derives from ActivityHandler in WumpusBot.java.
  4. Get an instance of your “adapter.” This is basically the framework engine; we inherit ours from BotFrameworkHttpAdapter in Adapter.java.
  5. Pass all the stuff from steps 1, 2 and 3 to the processIncomingActivity method of your Adapter (line 74).
  6. Use the returned InvokeResponse object to send an HTTPS status and JSON body back down the wire.

All of which is to say, “receive some JSON, do a thing, send back some JSON.” Wrapped up in a million annoying Futures.

The Adapter

The BotAdapter acts as ringmaster for the “do a thing” part of the request, providing helpers and context for your Bot implementation.

BotFrameworkHttpAdapter is almost sufficient to use as-is; the only reason I needed to extend it was to provide a custom Configuration object. By default, the object looks for configuration information in a properties file. This isn’t a bad assumption for Java apps, but in Azure Functions it’s way easier to keep configuration in the environment (via local.settings.json during development and the “Configuration” blade in the portal for production). EnvConfiguration in Adapter.java handles this, and then is wired up to our Adapter at line 34.

The adapter uses its configuration object to fetch the information used in service-to-service authentication. When we register our Bot with the Bot Service, we get an application id and secret. The incoming authentication header (#2 above) is compared to the “MicrosoftAppId” and “MicrosoftAppSecret” values in the configuration to ensure the connection is legitimate.

Actually, EnvConfiguration is more complicated than would normally be required, because I wanted to host two distinct bots within the same Function App (WumpusBot and RadioBot). This requires a way to keep multiple AppId and AppSecret values around, but we only have one System.env() to work with. The “configSuffix” noise in my class takes care of that segmentation.

There are a few other “providers” you can attach to your adapter if needed. The most common of these is the “AuthenticationProvider” that helps manage user-level OAuth, for example if you want your bot to access a user’s personal calendar or send email on their behalf. I didn’t have any need for this, so left the defaults alone.

Once you get all this wired up, you can pretty much ignore it.

The Bot

Here’s where the fun stuff starts. The Adapter sets up a TurnContext object and passes it to the onTurn method of your Bot implementation. The default onTurn handler is really just a big switch on the ActivityType (MESSAGE, TYPING, CONVERSATION_UPDATE, etc.) that farms out calls to type-specific handlers. Your bot can override any of these to receive notifications on various events.

The onMessageActivity method is called whenever your bot receives a (duh) message. For simple text interactions, simply call turnContext.getActivity().getText() to read the incoming text, and turnContext.sendActivity(MessageFactory.text(responseString)) to send back a response.

The Bot Framework has tried to standardize on markdown formatting for text messages, but support is spotty. For example Teams and WebChat work well, but Skype and SMS just display messages as raw text. Get used to running into this a lot — normalization across channels is pretty hit or miss, so for anything complex you can expect to be writing channel-specific code. This goes for conversation semantics as well. For example from my experience so far, the onMembersAdded activity:

  • Is called in Teams right away when the bot enters a channel or a new member joins;
  • Is called in WebChat only after the bot receives an initial message from the user; and
  • Is never called for Twilio SMS conversations at all.

Managing State

Quirks aside, for a stateless bot, that’s really about all there is to it. But not all bots are stateless — some of the most useful functionality emerges from a conversation that develops over time (even ELIZA needed a little bit of memory!) To accomplish that you’ll use the significantly over-engineered “BotState” mechanism you see in use at WumpusBot.java line 57. There are three types of state:

All of these are the same except for the implementation of getStorageKey, which grovels around in the turnContext to construct an appropriate key to identify the desired scope.

The state object delegates actual storage to an implementation of a CRUD interface. The framework implements two versions, one in-memory and one using Cosmos DB. The memory one is another example of why quickstarts are awful — it’s easy, but is basically never appropriate for the real world. It’s just a shortcut to make the framework look simpler than it really is.

The Cosmos DB implementation is fine except that it authenticates using a key. I wanted to use the same Managed Identity I used elsewhere in this app already, so I implemented my own in Storage.java. I cheated a little by ignoring “ETag” support to manage versioning conflicts, but I just couldn’t make myself believe that this was going to be a problem. (Fun fact: Cosmos lets you create items with illegal id values, but then you can’t ever read or delete them without some serious hackage. That’s why safeKey exists.)

Last and very important if you’re implementing your own Storage — notice the call to enableDefaultTyping on the Jackson ObjectMapper. Without this setting, the ObjectMapper serializes to JSON without type information. This is often OK because you’re either providing the type directly or the OM can infer reasonably. But the framework’s state map is polymorphic (it holds Objects), so these mechanisms can’t do the job. Default typing stores type info in the JSON so you get back what you started with.

Once you have picked your scope and set up Storage, you can relatively easily fetch and store state objects (in my situation a WumpusState) with this pattern:

  1. Allocate a BotState object in your Bot singleton (line 39).
  2. Call getProperty in your activity handler to set up a named property (line 57).  
  3. Fetch the state using the returned StatePropertyAccessor and (ugh) wait on the Future (lines 58-60). Notice the constructor here which is used to initialize the object on first access.  
  4. Use the object normally.
  5. Push changes back to storage before exiting your handler (line 68). Change tracking is implicit, so be sure to update state in the specific object instance you got in step #3. This is why Wumpus.newGame() never reallocates a WumpusState once it’s attached.

Testing your Bot Locally

Once you have your Function App running and responding to incoming messages, you can test it out locally using the Bot Framework Emulator. The Emulator is a GUI that can run under Windows, Mac or Linux (in X). You provide your bot’s endpoint URL (e.g., http://localhost:7071/wumpus for the WumpusBot running locally with mvn azure-functions:run) and the app establishes a conversation that includes a bunch of nifty debugging information.

Connecting to the Bot Service

The emulator is nice because you can manage things completely locally. Testing with the real Bot Service gets a little more complicated, because it needs to access an Internet-accessible endpoint.

All of the docs and tutorials have you do this by running yet another random tool. ngrok is admittedly kind of cute — it basically just forwards a port from your local machine to a random url like https://92832de0.ngrok.io. The fact that it can serve up HTTPS is a nice bonus. So if you’re down for that, by all means go for it. But I was able to do most of my testing with the emulator, so by the time I wanted to see it live, I really just wanted to see it live. Deploying the function to Azure is easy and relatively quick, so I just did that and ended up with my real bot URL: https://shutdownradio.azurewebsites.net/wumpus.

The first step is to create the Bot in Azure. Search the portal for “Azure Bot” (it shows up in the Marketplace section). Give your bot a unique handle (I used “wumpus”) and pick your desired subscription and resource group (fair warning — most of all this can be covered under your free subscription plan, but you might want to poke around to be sure you know what you’re getting into). Java bots can only be “Multi Tenant” so choose that option and let the system create a new App ID.

Once creation is complete, paste your bot URL into the “Messaging Endpoint” box. Next copy  down the “Microsoft App Id” value and click “Manage” and then “Certificates & secrets.” Allocate a new client secret since you can’t see the value of the one they created for you (doh). Back in the “Configuration” section of your Function app, add these values (remember my comment about “configSuffix” at the beginning of all this):

  • MicrosoftAppId_wumpus (your app id)
  • MicrosoftAppSecret_wumpus (your app secret)
  • MicrosoftAppType_wumpus (“MultiTenant” with no space)

If you want to run RadioBot as well, repeat all of this for a new bot using the endpoint /bot and without the “_wumpus” suffixes in the configuration values.

Congratulations, you now have a bot! In the Azure portal, you can choose “Test in Web Chat” to give it a spin. It’s pretty easy to embed this chat experience into your web site as well (instructions here).

You can use the “Channels” tab to wire up your bot to additional services. I hooked Wumpus up to Twilio SMS using the instructions here. In brief:

  • Sign up for Twilio and get an SMS number.
  • Create a “TwiML” application on their portal and link it to the Bot Framework using the endpoint https://sms.botframework.com/api/sms.
  • Choose the Twilio channel in the Azure portal and paste in your TwiML application credentials.

That’s it! Just text “play” to 706-943-3865 and you’re off to the races.

Bots in Microsoft Teams

Connecting to Teams is conceptually similar to SMS, just a lot more fiddly.

First, enable the Microsoft Teams channel in your Bot Service configuration. This is pretty much just a checkbox and confirmation that this is a Commercial, not Government, bot.

Next, bop over to the Teams admin site at https://admin.teams.microsoft.com/ (if you’re not an admin you may need a hand here). Under “Teams Apps” / “Setup Policies” / “Global”, make sure that the “Upload custom apps” slider is enabled. Note if you want to be more surgical about this, you can instead add a new policy with this option just for developers and assign it to them under “Manage Users.”

Finally, head over to https://dev.teams.microsoft.com/apps and create a new custom app. There are a lot of options here, but only a few are required:

  • Under “Basic Information”, add values for the website, privacy policy and terms of use. Any URL is fine for now, but they can’t be empty, or you’ll get mysterious errors later.
  • Under “App Features”, add a “Bot.” Paste your bot’s “Microsoft App Id” (the same one you used during the function app configuration) into the “Enter a Bot ID” box. Also check whichever of the “scope” checkboxes are interesting to you (I just checked them all).

Save all this and you’re ready to give it a try. If you want a super-quick dopamine hit, just click the “Preview in Teams” button. If you want to be more official about it, choose “Publish” / “Publish to org” and then ask your Teams Admin to approve the application for use. If you’re feeling really brave, you can go all-in and publish your bot to the Teams Store for anyone to use, but that’s beyond my pay grade here. Whichever way you choose to publish, once the app is in place you can start a new chat with your bot by name, or add them to a channel by typing @ and selecting “Get Bots” in the resulting popup. Pretty cool!

A caveat about using bots in channels: your bot will only receive messages in which they are @mentioned, which can be slightly annoying but net net probably makes sense. Unfortunately though, it is probably going to mess up your message parsing, because the mention is included in the message text (e.g., “<at>botname</at> real message.”). I’ve coded RadioBot to handle this by stripping out anything between “at” markers at line 454. Just another way in which you really do need to know what channel you’re dealing with.

Teams in particular has a whole bunch of other capabilities and restrictions beyond what you’ll find in the vanilla Bot Framework. It’s worth reading through their documentation and in particular being aware of the Teams-specific stuff you’ll find in TeamsChannelData.

We made it!

Well that was a lot; kind of an anti-quickstart. But if you’ve gotten this far, you have a solid understanding of how the Bot Framework works and how the pieces fit together, start to finish. There is a bunch more we could dig into (for instance check out the Adaptive Card interfaces in RadioBot here and here) — but we don’t need to, because you’ll be able to figure it out for yourself. Teach a person to fish or whatever, I guess.

Anyhoo, if you do anything cool with this stuff, I’d sure love to hear about it, and happy to answer questions if you get stuck as well. Beyond that, I hope you’ll enjoy some good conversations with our future robot overlords, and I’ll look forward to checking in with another post soon!

Shutdown Radio on Azure

Back about a year ago when I was playing with ShutdownRadio, I ranted a bit about my failed attempt to implement it using Azure Functions and Cosmos. Just to recap, dependency conflicts in the official Microsoft Java libraries made it impossible to use these two core Azure technologies together — so I punted. I planned to revisit an Azure version once Microsoft got their sh*t together, but life moved on and that never happened.

Separately, a couple of weeks ago I decided I should learn more about chatbots in general and the Microsoft Bot Framework in particular. “Conversational” interfaces are popping up more and more, and while they’re often just annoyingly obtuse, I can imagine a ton of really useful applications. And if we’re ever going to eliminate unsatisfying jobs from the world, bots that can figure out what our crazily imprecise language patterns mean are going to have to play a role.

No joke, this is what my Bellevue workbench looks like right now, today.

But heads up, this post isn’t about bots at all. You know that thing where you want to do a project, but you can’t do the project until the workbench is clean, but you can’t clean up the workbench until you finish the painting job sitting on the bench, but you can’t finish that job until you go to the store for more paint, but you can’t go to the store until you get gas for the car? Yeah, that’s me.

My plan was to write a bot for Microsoft Teams that could interact with ShutdownRadio and make it more natural/engaging for folks that use Teams all day for work anyways. But it seemed really silly to do all of that work in Azure and then call out to a dumb little web app running on my ancient Rackspace VM. So that’s how I got back to implementing ShutdownRadio using Azure Functions. And while it was generally not so bad this time around, there were enough gotchas that I thought I’m immortalize them for Google here before diving into the shiny new fun bot stuff. All of which is to say — this post is probably only interesting to you if you are in fact using Google right now to figure out why your Azure code isn’t working. You have been warned.

A quick recap of the app

The idea of ShutdownRadio is for people to be able to curate and listen to (or watch I suppose) YouTube playlists “in sync” from different physical locations. There is no login and anyone can add videos to any channel — but there is also no list of channels, so somebody has to know the channel name to be a jack*ss. It’s a simple, bare-bones UX — the only magic is in the synchronization that ensures everybody is (for all practical purposes) listening to the same song at the same time. I talked more about all of this in the original article, so won’t belabor it here.

For your listening pleasure, I did migrate over the “songs by bands connected in some way to Seattle” playlist that my colleagues at Adaptive put together in 2020. Use the channel name “seattle” to take it for a spin; there’s some great stuff in there!

Moving to Azure Functions

The concept of Azure Functions (or AWS Lambda) is pretty sweet — rather than deploying code to servers or VMs directly, you just upload “functions” (code packages) to the cloud, configure the endpoints or “triggers” that allow users to execute them (usually HTTP URLs), and let your provider figure out where and how to run everything. This is just one flavor of the “serverless computing” future that is slowly but surely becoming the standard for everything (and of course there are servers, they’re just not your problem). ShutdownRadio exposes four of these functions:

  • /home simply returns the static HTML page that embeds the video player and drives the UX. Easy peasy.
  • /channel returns information about the current state of a channel, including the currently-playing video.
  • /playlist returns all of the videos in the channel.
  • /addVideo adds a new video to the channel.

Each of these routes was originally defined in Handlers.java as HttpHandlers, the construct used by the JDK internal HttpServer. After creating the Functions project using the “quickstart” maven archetype, lifting these over to Azure Functions in Functions.java was pretty straightforward. The class names are different, but the story is pretty much the same.

Routes and Proxies

My goal was to make minimal changes to the original code — obviously these handlers needed to change, as well as the backend store (which we’ll discuss later), but beyond that I wanted to leave things alone as much as possible. By default Azure Functions prepend “/api/” to HTTP routes, but I was able to match the originals by turfing that in the host.json configuration file:

"extensions": {
       "http": {
             "routePrefix": ""
       }
}

A trickier routing issue was getting the “root” page to work (i.e., “/” instead of “/home“). Functions are required to have a non-empty name, so you can’t just use “” (or “/” yes I tried). It took a bunch of digging but eventually Google delivered the goods in two parts:

  1. Function apps support “proxy” rules via proxies.json that can be abused to route requests from the root to a named function (note the non-obvious use of “localhost” in the backendUri value to proxy routes to the same application).
  2. The maven-resources-plugin can be used in pom.xml to put proxies.json in the right place at packaging time so that it makes it up to the cloud.

Finally, the Azure portal “TLS/SSL settings” panel can be used to force all requests to use HTTPS. Not necessary for this app but a nice touch.

All of this seems pretty obscure, but for once I’m inclined to give Microsoft a break. Functions really aren’t meant to implement websites — they have Azure Web Apps and Static Web Apps for that. In this case, I just preferred the Functions model — so the weird configuration is on me.

Moving to Cosmos

I’m a little less sanguine about the challenges I had changing the storage model from a simple directory of files to Cosmos DB. I mean, the final product is really quite simple and works well, so that’s cool. But once again I ran into lazy client library issues and random inconsistencies all along the way.

There are a bunch of ways to use Cosmos, but at heart it’s just a super-scalable NoSQL document store. Honestly I don’t really understand the pedigree of this thing — back in the day “Cosmos” was the in-house data warehouse used to do analytics for Bing Search, but that grew up super-organically with a weird, custom batch interface. I can’t imagine that the public service really shares code with that dinosaur, but as far as I can tell it’s not a fork of any of the big open source NoSQL projects either. So where did it even come from — ground up? Yeesh, only at Microsoft.

Anyhoo, after creating a Cosmos “account” in the Azure portal, it’s relatively easy to create databases (really just namespaces) and containers within them (more like what I could consider databases, or maybe big flexible partitioned tables). Containers hold items which natively are just JSON documents, although they can be made to look like table rows or graph elements with the different APIs.

Access using a Managed Identity

One of the big selling points (at least for me) of using Azure for distributed systems is its support for managed identities. Basically each service (e.g., my Function App) can have its own Active Directory identity, and this identity can be given rights to access other services (e.g., my Cosmos DB container). These relationships completely eliminate the need to store and manage service credentials — everything just happens transparently without any of the noise or risk that comes with traditional service-to-service authentication. It’s beautiful stuff.

Of course, it can be a bit tricky to make this work on dev machines — e.g., the Azure Function App emulator doesn’t know squat about managed identities (it has all kinds of other problems too but let’s focus here). The best (and I think recommended?) approach I’ve found is to use the DefaultAzureCredentialBuilder to get an auth token. The pattern works like this:

  1. In the cloud, configure your service to use a Managed Identity and grant access using that.
  2. For local development, grant your personal Azure login access to test resources — then use “az login” at the command-line to establish credentials on your development machine.
  3. In code, let the DefaultAzureCredential figure out what kind of token is appropriate and then use that token for service auth.

The DefaultAzureCredential iterates over all the various and obtuse authentication types until it finds one that works — with production-class approaches like ManagedIdentityCredential taking higher priority than development-class ones like AzureCliCredential. Net-net it just works in both situations, which is really nice.

Unfortunately, admin support for managed identities (or really any role-based access) with Cosmos is just stupid. There is no way to set it up using the portal — you can only do it via the command line with the Azure CLI or Powershell. I’ve said it before, but this kind of thing drives me absolutely nuts — it seems like every implementation is just random. Maybe it’s here, maybe it’s there, who knows … it’s just exhausting and inexcusable for a company that claims to love developers. But whatever, here’s a snippet that grants an AD object read/write access to a Cosmos container:

az cosmosdb sql role assignment create \
       --account-name 'COSMOS_ACCOUNT' \
       --resource-group 'COSMOS_RESOURCE_GROUP' \
       --scope '/dbs/COSMOS_DATABASE/colls/COSMOS_CONTAINER' \
       --principal-id 'MANAGED_IDENTITY_OR_OTHER_AD_OBJECCT' \
       --role-definition-id '00000000-0000-0000-0000-000000000002'

The role-definition id there is a built-in CosmosDB “contributor” role that grants read and write access. The “scope” can be omitted to grant access to all databases and containers in the account, or just truncated to /dbs/COSMOS_DATABASE for all containers in the database. The same command can be used with your Azure AD account as the principal-id.

Client Library Gotchas

Each Cosmos Container can hold arbitrary JSON documents — they don’t need to all use the same schema. This is nice because it meant I could keep the “channel” and “playlist” objects in the same container, so long as they all had unique identifier values. I created this identifier by adding an internal “id” field on each of the objects in Model.java — the analog of the unique filename suffix I used in the original version.

The base Cosmos Java API lets you read and write POJOs directly using generics and the serialization capabilities of the Jackson JSON library. This is admittedly cool — I use the same pattern often with Google’s Gson library. But here’s the rub — the library can’t serialize common types like the ones in the java.time namespace. In and of itself this is fine, because Jackson provides a way to add serialization modules to do the job for unknown types. But the recommended way of doing this requires setting values on the ObjectMapper used for serialization, and that ObjectMapper isn’t exposed by the client library for public use. Well technically it is, so that’s what I did — but it’s a hack using stuff inside the “implementation” namespace:

log.info("Adding JavaTimeModule to Cosmos Utils ObjectMapper");
com.azure.cosmos.implementation.Utils.getSimpleObjectMapper().registerModule(new JavaTimeModule());

Side node: long after I got this working, I stumbled onto another approach that uses Jackson annotations and doesn’t require directly referencing private implementation. That’s better, but it’s still a crappy, leaky abstraction that requires knowledge and exploitation of undocumented implementation details. Do better, Microsoft!

Pop the Stack

Minor tribulations aside, ShutdownRadio is now happily running in Azure — so mission accomplished for this post. And when I look at the actual code delta between this version and the original one, it’s really quite minimal. Radio.java, YouTube.java and player.html didn’t have to change at all. Model.java took just a couple of tweaks, and I could have even avoided those if I were being really strict with myself. Not too shabby!

Now it’s time to pop this task off of the stack and get back to the business of learning about bots. Next stop, ShutdownRadio in Teams …and maybe Skype if I’m feeling extra bold. Onward!