Shutdown Radio on Azure

Back about a year ago when I was playing with ShutdownRadio, I ranted a bit about my failed attempt to implement it using Azure Functions and Cosmos. Just to recap, dependency conflicts in the official Microsoft Java libraries made it impossible to use these two core Azure technologies together — so I punted. I planned to revisit an Azure version once Microsoft got their sh*t together, but life moved on and that never happened.

Separately, a couple of weeks ago I decided I should learn more about chatbots in general and the Microsoft Bot Framework in particular. “Conversational” interfaces are popping up more and more, and while they’re often just annoyingly obtuse, I can imagine a ton of really useful applications. And if we’re ever going to eliminate unsatisfying jobs from the world, bots that can figure out what our crazily imprecise language patterns mean are going to have to play a role.

No joke, this is what my Bellevue workbench looks like right now, today.

But heads up, this post isn’t about bots at all. You know that thing where you want to do a project, but you can’t do the project until the workbench is clean, but you can’t clean up the workbench until you finish the painting job sitting on the bench, but you can’t finish that job until you go to the store for more paint, but you can’t go to the store until you get gas for the car? Yeah, that’s me.

My plan was to write a bot for Microsoft Teams that could interact with ShutdownRadio and make it more natural/engaging for folks that use Teams all day for work anyways. But it seemed really silly to do all of that work in Azure and then call out to a dumb little web app running on my ancient Rackspace VM. So that’s how I got back to implementing ShutdownRadio using Azure Functions. And while it was generally not so bad this time around, there were enough gotchas that I thought I’d immortalize them for Google here before diving into the shiny new fun bot stuff. All of which is to say: this post is probably only interesting to you if you are in fact using Google right now to figure out why your Azure code isn’t working. You have been warned.

A quick recap of the app

The idea of ShutdownRadio is for people to be able to curate and listen to (or watch, I suppose) YouTube playlists “in sync” from different physical locations. There is no login and anyone can add videos to any channel, but there is also no list of channels, so somebody has to know the channel name to be a jack*ss. It’s a simple, bare-bones UX; the only magic is in the synchronization that ensures everybody is (for all practical purposes) listening to the same song at the same time. I talked more about all of this in the original article, so I won’t belabor it here.
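To make the synchronization idea a bit more concrete, here is a minimal sketch of one way to compute what should be playing right now. It assumes each channel remembers the instant its playlist started, that every video’s duration is known, and that the playlist loops; the Track and Position records and the nowPlaying method are hypothetical and not necessarily how Radio.java does it. Because the answer depends only on shared inputs and the clock, every listener who asks gets the same video and offset.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

final class SyncSketch {

    // Hypothetical: a video id plus its running time.
    record Track(String videoId, Duration length) {}

    // Hypothetical: which video is playing and how far into it.
    record Position(String videoId, Duration offset) {}

    // Same inputs + same clock => same answer for every listener.
    static Position nowPlaying(List<Track> playlist, Instant channelStart, Instant now) {
        long totalMillis = playlist.stream().mapToLong(t -> t.length().toMillis()).sum();
        if (totalMillis <= 0) {
            throw new IllegalArgumentException("playlist has no playable duration");
        }
        long elapsed = Duration.between(channelStart, now).toMillis();
        // Wrap around so the playlist loops forever.
        long intoLoop = Math.floorMod(elapsed, totalMillis);
        for (Track track : playlist) {
            long len = track.length().toMillis();
            if (intoLoop < len) {
                return new Position(track.videoId(), Duration.ofMillis(intoLoop));
            }
            intoLoop -= len;
        }
        throw new IllegalStateException("unreachable: offset is always within one loop");
    }

    public static void main(String[] args) {
        List<Track> playlist = List.of(
            new Track("a", Duration.ofMinutes(3)),
            new Track("b", Duration.ofMinutes(4)));
        Instant start = Instant.parse("2023-01-01T00:00:00Z");
        // 8 minutes in: one full 7-minute loop, then 1 minute into "a".
        Position p = nowPlaying(playlist, start, start.plus(Duration.ofMinutes(8)));
        System.out.println(p.videoId() + " @ " + p.offset()); // a @ PT1M
    }
}
```

Clients only need the channel’s start instant and the playlist; no per-listener state is required on the server.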

For your listening pleasure, I did migrate over the “songs by bands connected in some way to Seattle” playlist that my colleagues at Adaptive put together in 2020. Use the channel name “seattle” to take it for a spin; there’s some great stuff in there!

Moving to Azure Functions

The concept of Azure Functions (or AWS Lambda) is pretty sweet — rather than deploying code to servers or VMs directly, you just upload “functions” (code packages) to the cloud, configure the endpoints or “triggers” that allow users to execute them (usually HTTP URLs), and let your provider figure out where and how to run everything. This is just one flavor of the “serverless computing” future that is slowly but surely becoming the standard for everything (and of course there are servers, they’re just not your problem). ShutdownRadio exposes four of these functions:

  • /home simply returns the static HTML page that embeds the video player and drives the UX. Easy peasy.
  • /channel returns information about the current state of a channel, including the currently-playing video.
  • /playlist returns all of the videos in the channel.
  • /addVideo adds a new video to the channel.

Each of these routes was originally defined in Handlers.java as HttpHandlers, the construct used by the JDK’s built-in HttpServer. After creating the Functions project using the “quickstart” Maven archetype, lifting these over to Azure Functions in Functions.java was pretty straightforward. The class names are different, but the story is pretty much the same.
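The port was easy mostly because the route logic doesn’t need to know which HTTP stack is calling it. Here is a minimal sketch of that shape using the JDK’s com.sun.net.httpserver package: a framework-free channelInfo method (a hypothetical stand-in for the real logic in Handlers.java) wrapped by a thin HttpHandler. An Azure Functions version would keep the same core method and replace only the wrapper with a method annotated with @FunctionName and @HttpTrigger.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

final class HandlerSketch {

    // Framework-free core (hypothetical stand-in for the real channel lookup).
    static String channelInfo(String channelName) {
        return "channel=" + channelName;
    }

    // Thin JDK HttpServer wrapper: read the query, call the core, write the response.
    static void handleChannel(HttpExchange exchange) throws IOException {
        String query = exchange.getRequestURI().getQuery(); // e.g. "name=seattle"
        String name = (query != null && query.startsWith("name="))
            ? query.substring("name=".length())
            : "";
        byte[] body = channelInfo(name).getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
        exchange.sendResponseHeaders(200, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }

    // Port 0 asks the OS for any free port.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/channel", HandlerSketch::handleChannel);
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080);
        System.out.println("Try http://localhost:" + server.getAddress().getPort() + "/channel?name=seattle");
    }
}
```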

Routes and Proxies

My goal was to make minimal changes to the original code — obviously these handlers needed to change, as well as the backend store (which we’ll discuss later), but beyond that I wanted to leave things alone as much as possible. By default Azure Functions prepend “/api/” to HTTP routes, but I was able to match the originals by turfing that in the host.json configuration file:

{
    "version": "2.0",
    "extensions": {
        "http": {
            "routePrefix": ""
        }
    }
}
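To make the effect of that setting concrete, here is a tiny sketch of how a function’s public path is assembled: the route prefix (default “api”) followed by the function’s route. The publicPath helper is hypothetical and only models the URL shape; it is not part of any Azure library.

```java
final class RoutePrefixSketch {

    // Models "/<routePrefix>/<route>", dropping the prefix segment when it is empty.
    static String publicPath(String routePrefix, String route) {
        StringBuilder path = new StringBuilder();
        for (String segment : new String[] {routePrefix, route}) {
            String trimmed = segment.replaceAll("^/+|/+$", ""); // strip leading/trailing slashes
            if (!trimmed.isEmpty()) {
                path.append('/').append(trimmed);
            }
        }
        return path.length() == 0 ? "/" : path.toString();
    }

    public static void main(String[] args) {
        System.out.println(publicPath("api", "channel")); // /api/channel (default prefix)
        System.out.println(publicPath("", "channel"));    // /channel (routePrefix set to "")
    }
}
```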

A trickier routing issue was getting the “root” page to work (i.e., “/” instead of “/home”). Functions are required to have a non-empty name, so you can’t just use “” (or “/”; yes, I tried). It took a bunch of digging, but eventually Google delivered the goods in two parts:

  1. Function apps support “proxy” rules via proxies.json that can be abused to route requests from the root to a named function (note the non-obvious use of “localhost” in the backendUri value to proxy routes to the same application).
  2. The maven-resources-plugin can be used in pom.xml to put proxies.json in the right place at packaging time so that it makes it up to the cloud.
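Stripped of configuration syntax, the proxy rule is just a path rewrite: a request whose path matches the rule’s matchCondition route is forwarded to the rule’s backendUri, and using “localhost” there means “this same function app.” Here is a minimal Java model of that idea; the ProxySketch class and its one-entry rule table are hypothetical stand-ins for what the Functions host does with proxies.json, not its actual implementation.

```java
import java.net.URI;
import java.util.Map;

final class ProxySketch {

    // Hypothetical rule table: incoming route -> backend URI, mirroring a
    // proxies.json entry whose matchCondition route is "/" and whose
    // backendUri targets the "home" function on localhost.
    static final Map<String, String> RULES = Map.of("/", "https://localhost/home");

    // Returns the path that should actually handle the request.
    static String resolve(String requestPath) {
        String backend = RULES.get(requestPath);
        if (backend == null) {
            return requestPath; // no rule matched; route normally
        }
        // "localhost" means "this same function app", so only the path matters.
        return URI.create(backend).getPath();
    }

    public static void main(String[] args) {
        System.out.println(resolve("/"));        // /home
        System.out.println(resolve("/channel")); // /channel
    }
}
```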

Finally, the Azure portal “TLS/SSL settings” panel can be used to force all requests to use HTTPS. Not necessary for this app but a nice touch.

All of this seems pretty obscure, but for once I’m inclined to give Microsoft a break. Functions really aren’t meant to implement websites — they have Azure Web Apps and Static Web Apps for that. In this case, I just preferred the Functions model — so the weird configuration is on me.

Moving to Cosmos

I’m a little less sanguine about the challenges I had changing the storage model from a simple directory of files to Cosmos DB. I mean, the final product is really quite simple and works well, so that’s cool. But once again I ran into lazy client library issues and random inconsistencies all along the way.

There are a bunch of ways to use Cosmos, but at heart it’s just a super-scalable NoSQL document store. Honestly I don’t really understand the pedigree of this thing — back in the day “Cosmos” was the in-house data warehouse used to do analytics for Bing Search, but that grew up super-organically with a weird, custom batch interface. I can’t imagine that the public service really shares code with that dinosaur, but as far as I can tell it’s not a fork of any of the big open source NoSQL projects either. So where did it even come from — ground up? Yeesh, only at Microsoft.

Anyhoo, after creating a Cosmos “account” in the Azure portal, it’s relatively easy to create databases (really just namespaces) and containers within them (more like what I would consider databases, or maybe big flexible partitioned tables). Containers hold items, which natively are just JSON documents, although they can be made to look like table rows or graph elements with the different APIs.

Access using a Managed Identity

One of the big selling points (at least for me) of using Azure for distributed systems is its support for managed identities. Basically each service (e.g., my Function App) can have its own Active Directory identity, and this identity can be given rights to access other services (e.g., my Cosmos DB container). These relationships completely eliminate the need to store and manage service credentials — everything just happens transparently without any of the noise or risk that comes with traditional service-to-service authentication. It’s beautiful stuff.

Of course, it can be a bit tricky to make this work on dev machines — e.g., the Azure Function App emulator doesn’t know squat about managed identities (it has all kinds of other problems too but let’s focus here). The best (and I think recommended?) approach I’ve found is to use the DefaultAzureCredentialBuilder to get an auth token. The pattern works like this:

  1. In the cloud, configure your service to use a Managed Identity and grant access using that.
  2. For local development, grant your personal Azure login access to test resources — then use “az login” at the command-line to establish credentials on your development machine.
  3. In code, let the DefaultAzureCredential figure out what kind of token is appropriate and then use that token for service auth.

The DefaultAzureCredential iterates over all the various and obtuse authentication types until it finds one that works — with production-class approaches like ManagedIdentityCredential taking higher priority than development-class ones like AzureCliCredential. Net-net it just works in both situations, which is really nice.
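The try-in-order behavior is easy to picture as a chain of token sources where the first one that succeeds wins. The sketch below models only that pattern in plain Java; TokenSource and firstToken are hypothetical and are not the Azure Identity SDK. In real code you would just call new DefaultAzureCredentialBuilder().build() and pass the result to the client builder’s credential method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

final class CredentialChainSketch {

    // Hypothetical: a named way of getting a token that may not work in this environment.
    record TokenSource(String name, Supplier<Optional<String>> fetch) {}

    // Tries each source in priority order and returns the first token produced.
    static String firstToken(List<TokenSource> sources) {
        List<String> tried = new ArrayList<>();
        for (TokenSource source : sources) {
            Optional<String> token = source.fetch().get();
            if (token.isPresent()) {
                return token.get();
            }
            tried.add(source.name());
        }
        throw new IllegalStateException("No credential source produced a token; tried " + tried);
    }

    public static void main(String[] args) {
        // Production-class sources first, developer-class sources after.
        // On a dev box there is no managed identity, but "az login" has been run.
        List<TokenSource> devMachine = List.of(
            new TokenSource("managed-identity", Optional::empty),
            new TokenSource("azure-cli", () -> Optional.of("cli-token")));
        System.out.println(firstToken(devMachine)); // cli-token
    }
}
```

The ordering is the whole trick: in the cloud the managed identity answers first, so the CLI source is never consulted.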

Unfortunately, admin support for managed identities (or really any role-based access) with Cosmos is just stupid. There is no way to set it up using the portal — you can only do it via the command line with the Azure CLI or Powershell. I’ve said it before, but this kind of thing drives me absolutely nuts — it seems like every implementation is just random. Maybe it’s here, maybe it’s there, who knows … it’s just exhausting and inexcusable for a company that claims to love developers. But whatever, here’s a snippet that grants an AD object read/write access to a Cosmos container:

az cosmosdb sql role assignment create \
    --account-name 'COSMOS_ACCOUNT' \
    --resource-group 'COSMOS_RESOURCE_GROUP' \
    --scope '/dbs/COSMOS_DATABASE/colls/COSMOS_CONTAINER' \
    --principal-id 'MANAGED_IDENTITY_OR_OTHER_AD_OBJECT' \
    --role-definition-id '00000000-0000-0000-0000-000000000002'

The role-definition id there identifies “Cosmos DB Built-in Data Contributor,” a built-in role that grants read and write access. The “scope” can be omitted to grant access to all databases and containers in the account, or just truncated to /dbs/COSMOS_DATABASE for all containers in the database. The same command can be used with your own Azure AD account’s object id as the principal-id.
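Since a scope is just a resource path, the options nest like prefixes: account-wide covers every database, /dbs/COSMOS_DATABASE covers every container in that database, and the full /colls/ path covers one container. The hypothetical covers helper below spells out that containment rule; it is an illustration, not an Azure API, and it represents the account-wide case as the path “/”. The database and container names in the example are made up.

```java
final class ScopeSketch {

    // Resource path for a container, e.g. /dbs/radio/colls/channels.
    static String containerPath(String database, String container) {
        return "/dbs/" + database + "/colls/" + container;
    }

    // True if a grant at 'scope' applies to 'resourcePath'; "/" stands for the whole account.
    static boolean covers(String scope, String resourcePath) {
        if (scope.equals("/")) {
            return true;
        }
        // Compare whole path segments so "/dbs/rad" does not cover "/dbs/radio".
        return resourcePath.equals(scope) || resourcePath.startsWith(scope + "/");
    }

    public static void main(String[] args) {
        String target = containerPath("radio", "channels");
        System.out.println(covers("/", target));                      // true
        System.out.println(covers("/dbs/radio", target));             // true
        System.out.println(covers("/dbs/radio/colls/other", target)); // false
    }
}
```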

Client Library Gotchas

Each Cosmos Container can hold arbitrary JSON documents — they don’t need to all use the same schema. This is nice because it meant I could keep the “channel” and “playlist” objects in the same container, so long as they all had unique identifier values. I created this identifier by adding an internal “id” field on each of the objects in Model.java — the analog of the unique filename suffix I used in the original version.
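Because both kinds of object live in one container, their ids must not collide across types, not just within a type. One way to get that, sketched below under the assumption that each object is keyed by its channel name, is to append a per-type suffix, much like the filename suffix in the original version. The Channel and Playlist records and the suffix strings are hypothetical; the real fields are in Model.java.

```java
import java.util.List;

final class IdSketch {

    // Hypothetical channel item; every Cosmos item carries an "id" property.
    record Channel(String id, String name, long startEpochMillis) {
        static Channel create(String name, long startEpochMillis) {
            return new Channel(name + "-channel", name, startEpochMillis);
        }
    }

    // Hypothetical playlist item stored alongside channels in the same container.
    record Playlist(String id, String channelName, List<String> videoIds) {
        static Playlist create(String channelName, List<String> videoIds) {
            return new Playlist(channelName + "-playlist", channelName, List.copyOf(videoIds));
        }
    }

    public static void main(String[] args) {
        Channel c = Channel.create("seattle", 0L);
        Playlist p = Playlist.create("seattle", List.of("abc123"));
        // Same channel name, different ids, so both fit in one container.
        System.out.println(c.id() + " / " + p.id()); // seattle-channel / seattle-playlist
    }
}
```

Since every channel id ends in “-channel” and every playlist id ends in “-playlist”, the two types can never produce the same id.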

The base Cosmos Java API lets you read and write POJOs directly using generics and the serialization capabilities of the Jackson JSON library. This is admittedly cool; I use the same pattern often with Google’s Gson library. But here’s the rub: out of the box, the library can’t serialize common types like the ones in the java.time package. In and of itself this is fine, because Jackson provides a way to add serialization modules to do the job for unknown types. But the recommended way of doing this requires setting values on the ObjectMapper used for serialization, and that ObjectMapper isn’t exposed by the client library for public use. Well, technically it is, so that’s what I did, but it’s a hack using stuff inside the “implementation” package:

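import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;
// Reaches into the SDK's internal (non-public) ObjectMapper; may break on any SDK upgrade.
log.info("Adding JavaTimeModule to Cosmos Utils ObjectMapper");
com.azure.cosmos.implementation.Utils.getSimpleObjectMapper().registerModule(new JavaTimeModule());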

Side note: long after I got this working, I stumbled onto another approach that uses Jackson annotations and doesn’t require directly referencing private implementation classes. That’s better, but it’s still a crappy, leaky abstraction that requires knowledge and exploitation of undocumented implementation details. Do better, Microsoft!
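A third option, different from both approaches above and offered only as a sketch, is to keep java.time types out of the persisted fields entirely and convert at the edges. It assumes the SDK’s mapper uses Jackson’s default visibility rules, which pick up public fields and get/is-style getters, so a method named startInstant() is left out of the stored document. ChannelDoc and its fields are hypothetical.

```java
import java.time.Instant;

// Hypothetical persisted shape: only JSON-native types in the fields Jackson sees.
final class ChannelDoc {

    // Public fields are serialized under Jackson's default visibility rules.
    public String id;
    public long startEpochMillis;

    // Not a bean-style getter ("get..."/"is..."), so default Jackson detection skips it.
    public Instant startInstant() {
        return Instant.ofEpochMilli(startEpochMillis);
    }

    // Not a bean-style setter ("set..."), so it is also ignored during deserialization.
    public void startAt(Instant start) {
        this.startEpochMillis = start.toEpochMilli();
    }

    public static void main(String[] args) {
        ChannelDoc doc = new ChannelDoc();
        doc.id = "seattle-channel";
        doc.startAt(Instant.parse("2023-01-01T00:00:00Z"));
        System.out.println(doc.startEpochMillis); // 1672531200000
        System.out.println(doc.startInstant());   // 2023-01-01T00:00:00Z
    }
}
```

The trade-off is a slightly less readable stored document (epoch millis instead of ISO strings) in exchange for never touching the SDK’s serializer.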

Pop the Stack

Minor tribulations aside, ShutdownRadio is now happily running in Azure — so mission accomplished for this post. And when I look at the actual code delta between this version and the original one, it’s really quite minimal. Radio.java, YouTube.java and player.html didn’t have to change at all. Model.java took just a couple of tweaks, and I could have even avoided those if I were being really strict with myself. Not too shabby!

Now it’s time to pop this task off of the stack and get back to the business of learning about bots. Next stop, ShutdownRadio in Teams …and maybe Skype if I’m feeling extra bold. Onward!