It’s Always a Normalization Problem

Heads up, this is another nerdy one! ShareToRoku is available on the Google Play store. All of the client and server code is up on my github under MIT license; I hope folks find it useful and/or interesting.

Algorithms are the cool kids of software engineering. We spend whole semesters learning to sort and find stuff. Spreadsheet “recalc” engines revolutionized numeric analysis. Alignment algorithms power advances in biotechnology.  Machine learning algorithms impress and terrify us with their ability to find patterns in oceans of data. They all deserve their rep!

But as great as they are, algorithms are hapless unless they receive inputs in a format they understand — their “model” of the world. And it turns out that these models are really quite strict — data that doesn’t fit exactly can really gum up the works. As engineers we often fail to appreciate just how “unnatural” this rigidity is. If I’m emptying the dishwasher, finding a spork amongst the silverware doesn’t cause my head to explode — even if there isn’t a “spork” section in the drawer (I probably just put it in with the spoons). Discovering a flip-top on my toothpaste rather than a twist cap really isn’t a problem. I can even adapt when the postman leaves packages on top of the package bin, rather than inside of it. Any one of these could easily stop a robot cold (so lame).

It’s easy to forget, because today’s models are increasingly vast and impressive, and better every day at dealing with the unexpected. Tesla’s Autopilot can easily be mistaken for magic — but as all of us who have trusted it to exit 405 North onto NE 8th know, the same weaknesses are still hiding in there under the covers. But that’s another story.

Anyhoo, the point is that our algorithms are only useful if we can feed them data that fits their models. And the code that does that is the workhorse of the world. Maybe not the sexiest stuff out there, but almost every problem we encounter in the real world boils down to data normalization. So you’d better get good at it.

Share to Roku (Release 6)

My little Android TV-watching app is a great (in miniature) example of this dynamic at work. If you read the original post, you’ll recall that it uses the  Android “share” feature to launch TV shows and movies on a Roku device. For example, you can share from the TV Time app to watch the latest episode of a show, or launch a movie directly from its review at the New York Times. Quite handy, but it turns out to be pretty hard to translate from what apps “share” to something specific enough to target the right show. Let’s take a look.

First, the “algorithm” at play here is the code that tells the Roku to play content. We use two methods of the Roku ECP API for this:

  • Deep Linking is ideal because it lets us launch a specific video on a specific channel. Unfortunately the identifiers used aren’t standard across channels, and they aren’t published — it’s a private language between Roku and their channel providers. Sometimes we can figure it out, though — more on this later.
  • Search is a feature-rich interface for jumping into the Roku search interface. It allows the caller to “hint” the search with channel identifiers and such, and in certain cases will auto-start the content it finds. But it’s hard to make it do the right thing. And even when it’s working great it won’t jump to specific episodes, just seasons.
public class RokuSearchInfo
{
public static class ChannelTarget
{
public String ChannelId;
public String ContentId;
public String MediaType;
}
public String Search;
public String Season;
public String Number;
public List<ChannelTarget> Channels;
}

Armed with this data, it’s pretty easy to slap together the optimal API request. You can see it happening in ShareToRokuActivity.resolveAndSendSearch — in short, if we can narrow down to a known channel we try to launch the show there, otherwise we let the Roku search do its best. Getting that data in the first place is where the magic really happens.

A Babel of Inputs

The Android Sharesheet is a pretty general-purpose app-to-app sharing mechanism, but in practice it’s mostly used to share web pages or social media content through text or email or whatever. So most data comes through as unstructured text, links and images. Our job is to make sense of this and turn it into the most specific show data we can. A few examples:

App / SourceShared DataIdeal Target
1. TV Time Episode PageShow Me the Love on TV Time https://tvtime.com/r/2AID4“Trying” Season 1 Episode 6 on AppleTV+
2. Chrome nytimes.com Movie Review (No text selection)https://www.nytimes.com/2022/11/22/movies/strange-world-review.html“Strange World” on Disney+
3. Chrome Wikipedia page (movie title selected)“Joe Versus the Volcano”  https://en.wikipedia.org/wiki/Joe_Versus_the_Volcano#:~:text=Search-,Joe%20Versus%20the%20Volcano,-Article“Joe Versus the Volcano” on multiple streaming services
4. YouTube Videohttps://youtu.be/zH14EyiSlas“When you say nothing at all” cover by Reina del Cid on YouTube
5. Amazon Prime MovieHey I’m watching Black Adam. Check it out now on Prime Video! https://watch.amazon.com/detail?gti=amzn1.dv.gti.1a7638b2-3f5e-464a-a271-07c2e2ec1f8c&ref_=atv_dp_share_mv&r=web“Black Adam” on Amazon Prime
6. Netflix Series PageSeen “Love” on Netflix yet?   https://www.netflix.com/us/title/80026506?s=a&trkid=13747225&t=more&vlang=en&clip=80244686“Love” Season 1 Episode 1 on Netflix
7. Search text entered directly into ShareToRokuProject Runway Season 5“Project Runway” Season 5 on multiple streaming services.

Pipelines and Plugins

All but the simplest normalization code typically breaks down into a collection of rules, each targeted at a particular type of input. The rules are strung together into a pipeline, each doing its little bit to clean things up along the way. This approach makes it easy to add new rules into the mix (and retire obsolete ones) in a modular, evolutionary way.

After experimenting a bit (a lot), I settled on a two-phase approach to my pipeline:

  1. Iterate over a list of “parsers” until one reports that it understands the basic format of the input data.
  2. Iterate over a list of “refiners” that try to enhance the initial model by cleaning up text, identifying target channels, etc.

Each of these is defined by a standard Java interface and strung together in SearchController.java. A fancier approach would be to instantiate and order the implementations through configuration, but that seemed like serious overkill for my little hobby app. If you’re working with a team of multiple developers, or expect to be adding and removing components regularly, that calculus probably looks a bit different.

This split between “parsers” and “refiners” wasn’t obvious at first. Whenever I face a messy normalization problem, I start by writing a ton of if/then spaghetti, usually in pseudocode. That may seem backwards, but it can be hard to create an elegant approach until I lay out all the variations on the table. Once that’s in front of me, it becomes much easier to identify commonalities and patterns that lead to an optimal pipeline.

Parsers

Parsers” in our use case recognize input from specific sources and extract key elements, such as the text most likely to represent a series name. As of today there are three in production:

TheTVDB Parser (Lookup.java)

TV Time and a few other apps are powered by TheTVDB, a community-driven database of TV and movie metadata. The folks there were nice enough to grant me access to the API, which I use to recognize and decode TV Time sharing URLs (example 1 in the table). This is a four step process:

  1. Translate the short URL into their canonical URL. E.g., the short URL in example 1 resolves to https://www.tvtime.com/show/375903/episode/7693526&pid=tvtime_android.
  2. Extract the series (375903) and/or episode (7693526) identifiers from the URL.
  3. Use the API to turn these identifiers into show metadata and translate it into a parsed result.
  4. Apply some final ad-hoc tweaks to the result before returning it.

All of this data is cached using a small SQLite database so that we don’t make too many calls directly to the API. I’m quite proud of the toolbox implementation I put together for this in CachingProxy.java, but that’s an article for another day.

UrlParser.java

UrlParser takes advantage of the fact that many apps send a URL that includes their own internal show identifiers, and often these internal identifiers are the same ones they use for “Deep Linking” with Roku. The parser is configured with entries that include a “marker” string — a unique URL fragment that identifies a particular — together with a Roku channel identifier and some extra sugar not worth worrying about. When the marker is found and an ID extracted, this parser can return enough information to jump directly into a channel. Woo hoo!

SyntaxParser.java

This last parser is kind of a last gasp that tries to clean up share text we haven’t already figured out. For example, it extracts just the search text from a Chrome share, and identifies the common suffix “SxEy” where x is a season and y is an episode number. I expect I’ll add more in here over time but it’s a reasonable start.

Refiners

Once we have the basics of the input — we’ve extracted a clean search string and maybe taken a first cut at identifying the season and channels —  a series of “refiners” are called in turn to improve the results. Unlike parsers which short-circuit after a match is found, all the refiners run every time.

WikiRefiner.java

A ton of the content we watch these days is created by the streaming providers themselves. It turns out that there are folks who keep lists of all these shows on Wikipedia (e.g., this one for Netflix). The first refiner simply loads up a bunch of these lists and then looks at incoming search text for exact matches. If one is found, the channel is added to the model.

As a side note, the channel is actually added to the model only if the user has that channel installed on their Roku (as passed up in the “channels” query parameter). The same show is often available on a number of channels, and it doesn’t make sense to send a Roku to a channel it doesn’t know about. If the show is available on multiple installed channels, the Android UX will ask the user to pick the one they prefer.

RokuSearchRefiner.java

Figuring out this refiner was a turning point for the app. It makes the results far more accurate, which of course makes sense since they are sourced from Roku itself. I’ve left the WikiRefiner in place for now, but suspect I can retire it with really no decrease in quality. The logs will show if that’s true or not after a few weeks.

In any case, this refiner passes the search text up to the same search interface used by roku.com. It is insanely annoying that this API doesn’t return deep link identifiers for any service other than the Roku Channel, but it’s still a huge improvement. By restricting results to “perfect” matches (confidence score = 1), I’m able to almost always jump directly into a channel when appropriate.

I’m not sure Roku would love me calling this — but I do cache results to keep the noise down, so hopefully they’ll just consider it a win for their platform (which it is).

FixupRefiner.java

At the very end of the pipeline, it’s always good to have a place for last-chance cleanup. For example, TVDB knows “The Great British Bake Off,” but Roku in the US knows it as “The Great British Baking Show.” This refiner matches the search string against a set of rules that, if found, allow the model to be altered in a manual way. These make the engineer in me feel a bit dirty, but it’s all part of the normalization game — the choice is whether to feel morally superior or return great results. Oh well, at least the rules are in their own configuration file.

Hard Fought Data == Real Value

This project is a microcosm of most of the normalization problems I’ve experienced over the years. It’s important to try to find some consistency and modularity in the work — that’s why pipelines and plugins and models are so important. But it’s just as important to admit that the real world is a messy place, and be ready to get your hands dirty and just implement some grotty code to clean things up.

When you get that balance right, it creates enormous differentiation for your solution. Folks can likely duplicate or improve upon your algorithms — but if they don’t have the right data in the first place, they’re still out of luck. Companies with useful, normalized, proprietary data sets are just always always always more valuable. So dig in and get ‘er done.

public RokuSearchInfo parse(String input, UserChannelSet channels) {
// 1. PARSE
String trimmed = input.trim();
RokuSearchInfo info = null;
try {
info = tvdbParser.parse(input, channels);
if (info == null) info = urlParser.parse(input, channels);
if (info == null) info = syntaxParser.parse(input, channels);
}
catch (Exception eParse) {
log.warning(Easy.exMsg(eParse, "parsers", true));
info = null;
}
if (info == null) {
info = new RokuSearchInfo();
info.Search = trimmed;
log.info("Default RokuSearchInfo: " + info.toString());
}
// 2. REFINE
tryRefine(info, channels, rokuRefiner, "rokuRefiner");
tryRefine(info, channels, wikiRefiner, "wikiRefiner");
tryRefine(info, channels, fixupRefiner, "fixupRefiner");
// 3. RETURN
log.info(String.format("FINAL [%s] -> %s", trimmed, info));
return(info);
}

Share to Roku!

TLDR: if you watch TV on a Roku and have an Android phone, please give my new Share To Roku app a try! It’s currently in open testing; install it with this link on the web or this link on your Android device. The app is free, has no ads, saves no data and only makes network calls to Rokus on your local network. It acts as a simple remote, but much more usefully lets you “Share” show names from the web or other apps directly to the Roku search interface. I use it with TV Time and it has been working quite well so far — but I need broader real-world testing and would really appreciate your feedback.

Oh user interface development, how I hate you so. But my lack of experience with true mobile development has become increasingly annoying, and I really wanted an app to drive my Roku. So let’s jump back into the world of input events and user interface layouts and see if we can get comfy. Yeesh.

Share To Roku in a nutshell

I’ve talked about this before (here and here). My goal is to transition as smoothly as possible from finding a show on my phone (an Android, currently the Samsung Galaxy S21) to watching it on my TV. I keep my personal watchlist on an app called TV Time and that’s key, but I also want to be able to jump from a recommendation in email or a review on the web. So feature #1 is to create a “share target” that can accept messages from any app.

Armed with this inbound search text, the app will help the user select their Roku, ensure the TV power is on (if that particular Roku supports it), and forward the search. The app then will land on a page hosting controls to help navigate the last mile to the show (including a nice swipe-enabled directional pad that IMNSHO is way better than the official Roku app). This remote control functionality will also be accessible simply by running the app on its own. And that’s about it. Easy peasy!

All of the code for Share To Roku is up on github. I’ll do a final clean-up on the code once the testing period is over, but everything I’ve written here is free to use and adopt under an MIT license, so please take anything you find useful.

Android Studio and “Kotlin”

If you’ve worked with me before, you know that my favorite “IDE” is Emacs; I build stuff from the command line; and I debug using logs and jdb. But for better or worse, the only realistic way to build for Android is to at least mostly use Android Studio, a customized version of IntelliJ IDEA (you can just use IntelliJ too but that’s basically the same thing). AStudio generates a bunch of boilerplate code for even the simplest of apps, and encourages you to edit it all in this weird overlapping sometimes-textual-sometimes-graphical mode that at least in my experience generally ensures a messy final product. I’m not going to spend this whole article complaining about it, but it is pretty stifling.

Love me a million docked windows, three-deep toolbars and controls on every edge of the screen!

Google would also really prefer that you drop Java and instead use their trendy sort-of-language “Kotlin” to build Android apps. I’ve played this Java pre-complier game before with Scala and Groovy, and all I can say is no thank you. I will never understand why people are so obsessed with turning code into a nest of side effects, just to avoid a few semicolons and brackets. At least for now they are grudgingly continuing to support Java development — so that’s where you’ll find me. On MY lawn, where I like it.

Android application basics

Components

The most important thing to understand about Android development is that you are not in charge of your process. There is no “main” and, while you get your own JVM in which to live, that process can come and go at pretty much any time. This makes sense — at least historically mobile devices have had to manage pretty limited memory and processing power, so the OS exerts a ton of control over the use of those resources. But it can be tricky when it comes to understanding state and threading in an app, and clearly a lot of bugs in the wild boil down to a lack of awareness here.

Instead of main, an Android app is effectively a big JAR that uses a manifest file to expose Component classes. The most common of these is an Activity, which is effectively represents one user interface screen within the app. Other components include various types of background process; I’m going to ignore them here. Share to Roku exposes two Activities, one for choosing a Roku and one for the search and remote interface. Each activity derives from an Android base class that defines a set of well-known entrypoints, each of which is called at different points in the process lifecycle.

Tasks and the Back Stack

But before we dig into those, two other important concepts: tasks and the back stack. This can get wildly complicated, but the super-basics are this:

  • A “task” is a thing you’re doing on the device. Most commonly tasks are born by opening an app from the home screen.
  • Each task maintains a “stack” of activities (screens). When you navigate to a new screen (e.g., open an email from a list of emails) a new activity is added to the top of the stack. When you hit the back button, the current (top) activity is closed and you return to the previous one.
  • Mostly each task corresponds to an app — but not always. For example, when you are in Chrome and you “share” a show to my app, a new Share To Roku activity is added to the Chrome task. Tasks are not the same as JVM processes!

Taken together, the general task/activity lifecycle starts to make sense:

  1. The user starts a new task by starting an app from the home screen.
  2. Android starts a JVM for that app and loads an instance of the class for the activity marked as MAIN/LAUNCHER in the manifest.
  3. The onCreate method of the activity is called.
  4. The user interacts with the ux. Maybe at some point they dip into another activity, in which case onPause/onResume and onStop/onStart are called as the new activity starts and finishes.
  5. When the activity is finished (the user hits the back button or closes the screen in some other way) the onDestroy method is called.
  6. When the system decides it’s a good time (e.g., to reduce memory usage), the JVM is shut down.

Of course, it’s not really that simple. For example, Android may just nuke your process at any time, without ever calling onDestroy — so you’ll need to put some thought into how and when to save persistent data. And depending on your configuration, existing activity instances may be “reused” (with a call to onNewIntent). But it’s a pretty good starting place.

Intents

Intents are the means by which users navigate between activities on an Android device. We’ve actually already seen an intent in action, in step #2 above — MAIN/LAUNCHER is a special intent that means “start this app from the beginning.” Intents are used for every activity-to-activity transition, whether that’s explicit (e.g., when an email app opens up a message details activity in response to a click in a message list) or implicit (e.g., when an app opens up a new, pre-populated text message without knowing which app the user has configured for SMS).

Share to Roku uses intents in both ways. Internally, after picking a Roku, ChooseRokuActivity.shareToRoku instantiates an intent to start the ShareToRokuActivity. Because that internal navigation is the only way to land on ShareToRokuActivity, its definition in the manifest sets the “exported” flag to false and doesn’t include any intent-filter elements.

Conversely, the entry for ChooseRokuActivity in the manifest sets “exported” to true and includes no less than three intent-filter elements. The first is our old friend MAIN/LAUNCHER, but the next two are more interesting. Both identify themselves as SEND/DEFAULT filters, which mark the activity as a target for the Android Sharesheet (which we all just know as “sharing” from one app to another). There are two of them because we are registering to handle both text and image content.

Wait, image content? This seems a little weird; surely we can’t send an image file to the Roku search API. That’s correct, but it turns out that when the TV Time app launches a SEND/DEFAULT intent, it registers the content type as an image. There is an image; a little thumbnail of the show, but there is also text included which we use for the search. There isn’t a lot of consistency in the way applications prepare their content for sharing; I foresee a lot of app-specific parsing in my future if Share To Roku gets any real traction with users.

ChooseRokuActivity

OK, let’s look a little more closely at the activities that make up the app. ChooseRokuActivity (code / layout) is the first screen a user sees; a simple list of Rokus found on the local network. Once the user makes a selection here, control is passed to ShareToRokuActivity which we’ll cover next.

The list is a ListView, which gives me another opportunity to complain about modern development. Literally every UX system in the world has a control for simple displays of text-based lists. Android’s ListView is just this — a nice, simple control to which you attach an Adapter that holds the data. But the Android Gods really would rather you don’t use it. Instead, you’re supposed to use RecyclerView, a fine but much more complicated view. It’s great for large, dynamic lists, but way too much for most simple text-based UX lists. This kind of judgy noise just bugs me — an SDK should make common things as easy as possible. Sorry not sorry, I’m using the ListView. Anyways, the list is wrapped in a SwipeRefreshLayout which provides the gesture and feedback to refresh the list by pulling down.

The activity populates the list of Rokus using static methods in Roku.java. Discovery is performed by UDP broadcast in Ssdp.java, a stripped down version of the discovery classes I wrote about extensively in Anyone out there? Service discovery with SSDP, WSD, other acronyms. The Roku class maintains a static (threadsafe) list of the Rokus it finds, and only searches again when asked to manually refresh. This is one of those places where it’s important to be aware of the process lifecycle; the list is cached as long as our JVM remains alive and will be used in any task we end up in.

Take a look at the code in initializeRokus and findRokus (called from onCreate). If we have a cache of Rokus, we populate it directly into the list for maximum responsiveness. If we don’t, we create an ActivityWorker instance that searches using a background thread. The trick here is that each JVM process has exactly one thread dedicated to managing all user interface interactions — only code running on that thread can touch the UX. So if another thread (e.g., our Roku search worker) needs to update user interface components (i.e., update the ListView), it needs help.

There are a TON of ways that people manage this; ActivityWorker is my solution. A caller implements an interface with two methods: doBackground is run on a background thread, and when that method completes, the code in doUx runs on the UI thread (thanks to Activity.runOnUiThread).  These two methods can share member variables (e.g., the “rokus” set) without worrying about concurrency issues — a nice clean wrapper for a common-but-typically-messy situation.

ShareToRokuActivity

The second activity (code / layout) has more UX going on, and I’ll admit that I appreciated the graphical layout tools in AStudio. Designing even a simple interface that squishes and stretches reasonably to fit on so many different device sizes can be a challenge. Hopefully I did an OK job, but testing with emulators only goes so far — we’ll see as I get a few more testers.

If the activity was started from a sharing operation, we pick up that inbound text as “extra” data that comes along with the Intent object (the data actually comes to us indirectly via ChooseRokuActivity, since that was the entry point). Dealing with this search text is definitely the most unpleasant part of the app, because it comes in totally random and often unhelpful forms. If Share To Roku is going to become a meaningfully useful tool I’m going to have to do some more investment here.

A rare rave from me — the Android Volley HTTP library (as used in Http.java) is just fantastic. It works asynchronously, but always makes its callback on the UX thread. That is, it does automatically what I had to do manually with ActivityWorker. Since most mobile apps are really just UX sitting atop some sort of HTTP API, this makes life really really easy. Love it!

The bulk of this activity is just buttons and lists that cause fire-and-forget calls to the Roku, except for the directional pad that takes up the center of the screen. CirclePad.java is a custom control (sorry, custom “View”) that lets the user click a center button and indicate direction with either clicks in the N-S-E-W regions or (way cooler) directional swipes. A long press on the control sends a backspace, which makes entering text on the TV a bit more pleasant. Building this control felt like a throwback to Windows 3.0 development. Set a clip region, draw some lines and circles and icons. The gesture recognition is simultaneously amazingly easy (love the “fling” handler) and oddly prehistoric (check out my manual identification of a “long” press).

Publishing to the Play store

Back in the mid 00’s I spent some time consulting for Microsoft on a project called Windows Marketplace (wow there is a Wikipedia article for everything). Marketplace was sponsored by the Windows marketing team as an attempt to highlight the (yes) shareware market, which had been basically decimated by cross-platform browser-based apps. It worked a lot like any other app store, with some nice features like secure backup of purchased license keys (still a thing with some software!!!). It served a useful role for a number of years — fun times with a neat set of people (looking at you Raj, Vikram, DeeDee, Paul, Matt and Susan) and excellent chaat in Emeryville.

Anyways, that experience gave me some insight into the challenges of running and monetizing a directory of apps developed by everyone from big companies to (hello) random individuals. Making sure the apps at least work some of the time and don’t contain viruses or some weirdo porn or whatever. It’s not easy — but Google and Apple really have really done a shockingly great job. Setting up account on the Play Console is pretty simple — I did have to upload an image of my official ID and pay a one-time $25 fee but that’s about it. The review process is painful because each cycle takes about three or four days and they often come back with pretty vague rejections. For example, “you have used a word you may not have the rights to use” … which word is, apparently, a secret? But I get it.

So anyways — my lovely little app is now available for testing. If you’ve got an Android device, please use the links below to give it a try. If you have an Apple device, I’m sorry for many reasons. I will definitely be doing some work to better manipulate inbound search strings to provide a better search result on the Roku. I’m a little torn as to whether I could just do that all in-app, or if I should publish an API that I can update more easily. Probably the latter, although that does create a dependency that I’m not super-crazy about. We’ll see.

Install the beta version of Share To Roku with this link on the web or this link on your Android device.

Anyone out there? Service discovery with SSDP, WSD, other acronyms.

Those few regular readers of this stuff may remember What should we watch tonight, in which I used the Roku API to build a little web app to manage my TV watchlist. Since then I’ve found TV Time, which is waaay better and even tells me how many days until the next season of Cobrai Kai gets here (51 as of this writing). But what it doesn’t do is launch shows automatically on my TV, and yes I’m lazy enough to be annoyed by that. So I’ve been planning a companion app that will let me “share” shows directly to my Roku using the same API I used a few months ago.

This time, I’d like the app to auto-discover the TV, rather than asking the user to configure its IP address manually. Seems pretty basic — the Roku ECP API describes how it uses “Simple Service Discovery Protocol” to enable just that. But man, putting together a reliable implementation turned out to be a bear, and sent me tumbling down a rabbit hole of “service discovery” that was both fascinating and frankly a bit appalling.

Come with me down that rabbit hole, and let’s learn how those fancy home devices actually try to find each other. It’s nerd-tastic!

Can I get your number?

99% of what happens on networks is conversations between two devices that already know each other, either directly by address (like a phone number), or by a name that they use to look up an address (like using a phone book). And 99% of the time this works great — between “google.com” and QR codes and the bookmark lists we’ve all built up, there’s rarely any need to even think about addresses. But once in awhile — usually when you’re trying to set up a printer or some other smarty-pants device on your home network — things get a bit more complicated.

Devices on your network are assigned a (basically) arbitrary address by your wifi router, and they don’t have a name registered anywhere, so how do other devices find them to start a conversation? It turns out that there are a pile of different ways — most of which involve either multicast or broadcast UDP messaging, two similar techniques that enable a device to initiate a conversation without knowing exactly who it’s talking to. Vox Clamantis in Deserto as it were.

Side note: for this post I’m going to limit examples to IPv4 addressing, because it makes my job a little easier. The same concepts generally apply to IPv6, except that there is no true “broadcast” with v6 because they figured out that multicast could do all the same things more efficiently, but close enough.

An IP broadcast message is received by all devices on the local network that are listening on a given port. Typically these use the special “limited broadcast” address 255.255.255.255 (there’s also a “directed” broadcast address for each subnet which could theoretically be routed to other networks, but that’s more detail than matters for us). An IP multicast message is similar, but is received only by devices that have subscribed to (or joined) the multicast’s special “group” address. Multicast addresses all have 1110 as their most significant bits, which translates to addresses from 224.0.0.0 to 239.255.255.255.

Generally, these messages are restricted to your local network — that is, routers don’t send them out onto the wider Internet (there are exceptions for complex corporate-style networks, but whatever). This is a good thing, because the cacophony of the whole world getting all of these messages would most definitely take down the Internet. It’s also safer, as we’ll see a bit later.

Roku and SSDP

OK, back to the main thread here. Per the ECP documentation, Roku devices use Simple Service Discovery Protocol for discovery. SSDP defines a multicast address (239.255.255.250) and port (1900), a set of messages using what old folks like me still call RFC 822 format, and two interaction patterns for discovery:

  1. A client looking for devices sends a multicast M-SEARCH message with the type ssdp:discover, setting the ST header to either ssdp:all (everybody respond!) or a specific service type string (the primary Roku type is roku:ecp). AFAIK there is no authoritative list of ST values, you just kind of have to know what they are.  Devices listening for these requests respond directly to the client with a unicast HTTP OK response that includes (thank you) addressing information.
  2. Clients can also listen on the same multicast address for NOTIFY messages of type ssdp:alive or ssdp:byebye; devices send these out when they are turned on and off. It’s a good way to keep a list of devices accurate, but implementations are spotty so it really needs to be used in combination with #1.

An SSDP client in Java

Seems simple, right? I mean, OK, the basics really are simple. But a robust implementation runs into a ton of nit-picky little gotchas that, all together, took me days to sort out. The end result is on github and can be built/tested on any machine with java, git and maven installed (be sure to fix up slashes on Windows):

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/toolbox
mvn clean package
java -cp target/toolbox-1.0-SNAPSHOT.jar \
    com.shutdownhook.toolbox.discovery.ServiceDiscovery \
    ssdp

This command line entrypoint just sends out an ssdp:discover message, displays information on all the devices that respond, and loops listening for additional notifications. Somebody on your network is almost sure to respond; in particular for Roku you can look for a line something like this:

+++ ALIVE: uuid:roku:ecp:2N006D062746 | roku:ecp | http://192.168.86.47:8060/ | (/192.168.86.47:1900)

Super cool! If you open up that URL in a browser you’ll see a bunch more detail about your Roku and the interfaces it supports.

Discovery Protocol Abuse

If you don’t see any responses at all, it’s likely that your firewall is blocking UDP messages either to or from port 1900 — my Linux Mint distribution does both by default. Mint uses UncomplicatedFirewall which means you can see blocking activity (as root) in /var/log/ufw.log and open up UDP port 1900 with commands like:

sudo ufw allow from any port 1900 to any proto udp
sudo ufw allow from any to any port 1900 proto udp

Before you do this, you should be aware that there is some potential for bad guys to do bad stuff — pretty unlikely, but still. Any protocol that can “amplify” one message into many (because multiple devices can respond) carries some risk of a denial-of-service attack. That can be very simple: a bad guy on your network just fires off a ton of M-SEARCH requests, prompting a flood of responses that overwhelm the network as a whole. Or it can be nastier: combined with “ip spoofing,” a bad guy can redirect amplified responses to an unsuspecting victim.

Really though, it’s pretty theoretical for a home network — routers don’t generally route these messages, so it’d have to be an inside job anyways. And once you’ve got a bad actor inside your network, they can probably do a lot more damage than just slowing it down. YMMV, but I’m not personally super-worried about this particular attack. Just please don’t confuse my blasé assessment here with the risks of the related Universal Plug-and-Play (UPnP) protocol, which are quite real.

Under the Covers

There is quite a bit to talk about in the code here. Most of the hard work is in UdpServiceDiscovery.java (I’ll explain this abstraction later), which uses two sockets and two worker threads:

Socket/Thread #1 (DISCOVERY) sends M-SEARCH requests and receives back HTTP OK responses. A request is sent when the thread first starts up and can be repeated either on demand or on a timer (by default every twenty minutes).

It’s key to understand that while the request here is a multicast message, the responses are sent back directly as a unicast. I didn’t implement the server side specifically, but you can see how this works in the automated tests — the responding device extracts the source address and port from the multicast and just replies with a standard UDP unicast message. This is important for us because only one process on a computer can actively listen for unicast messages on a port. And on many systems, somebody is probably already doing that on port 1900 (for example, the Windows service “SSDP Discovery”). So if we want to reliably hear HTTP OK responses, we need to be using an unused, automatically-assigned port for this socket.

Socket/Thread #2 (NOTIFICATION) is for receiving unsolicited NOTIFY multicasts. This socket uses joinGroup to register interest and must be opened on port 1900 to work correctly.

forEachUsefulInterface is an interesting little bit of code. It’s used both for sending requests and joining multicast groups, ensuring that the code works in a system that is connected to multiple network interfaces (typically not the case at home, but better safe than sorry). Remember that multicasts are restricted to a local network — so if you’re attached to multiple networks, you’ll need to send out one message on each of them. The realities of coordinating interfaces with addresses can get pretty complicated, but I think this gets it right. Let me know if you think I’ve missed something!

The class also tries to identify and ignore duplicate UDP messages. Dealing with dups just comes with the territory when working with UDP — and while the nature of the SSDP protocol means it generally doesn’t hurt anything to re-process them, it’s just icky. UdpServiceDiscovery tries to filter them out using message hashing and a FIFO queue of recently-received messages. You can tune this behavior (or turn it off) through config; default is a two-second lookback.  

Wait, is that Everyone? Enter WS-Discovery.

If you look closely you’ll see that UdpServiceDiscovery really isn’t specific to SSDP at all — all of the protocol-specific stuff is in Ssdp.java and transits through yet another class ServiceDiscovery.java. What the heck is going on here? The short story is that SSDP doesn’t return most printers, and Microsoft always needs to be special. The longer story requires a quick aside into the insanity that was “WS*”.

Back in 1999 and 2000, folks realized that HTTP would be great for APIs as well as web pages — and two very different approaches emerged. First by a few months was SOAP (and it’s fast-follower WSDL), which tried to be transport-independent (although 99.9% of traffic was over HTTP) and was all about defined, strongly typed interfaces. The foil to SOAP was REST — a much lighter and Internettish way to think about machine-to-machine interaction.

SOAP was big company, REST was scrappy startup. And nobody was more SOAPy than Microsoft. They had a whole group (I’m looking at you Bill, and your buddy John too) that did nothing but make up abstract, overly-complicated, insane SOAP-based “standards” informally known as “WS*” that nobody understood or needed. Seriously, just check out this poster (really, click that link, zoom in and scroll for awhile, it’s shocking). Spoiler alert: REST crushed it.

Anyways — one of these beasts was WS-Discovery, a protocol for finding devices on a network that does exactly the same thing as SSDP. Not “generally the same thing,” but exactly the same thing. The code that works for SSDP works for WSD too, just swap out the HTTP-style metadata for XML. Talk about reinventing the wheel, yeesh. But at least this explains the weird object hierarchy in my discovery classes:

Since these all use callback interfaces and sometimes you just want an answer, I added OneShotServiceDiscovery that wraps up Ssdp and Wsd like this (where “4” below is the number of seconds to wait for UDP responses to come in):

Set<ServiceInfo> infosSSD = OneShotServiceDiscovery.ssdp(4);
Set<ServiceInfo> infosWSD = OneShotServiceDiscovery.wsd(4);

There’s an entrypoint for this too, so to get a WSD device list you can use (the example is my Epson ET-3760):

java -cp target/toolbox-1.0-SNAPSHOT.jar \
    com.shutdownhook.toolbox.discovery.OneShotServiceDiscovery \
    wsd
...
urn:uuid:cfe92100-67c4-11d4-a45f-e0bb9e278967 | wsdp:Device wscn:ScanDeviceType wprt:PrintDeviceType | http://192.168.5.228:80/WSD/DEVICE | (/192.168.5.228:3702)

Actually, a full WSD implementation is more complicated than this. The protocol defines a “discovery proxy” — a device on the network that can cache device information and reduce network traffic. A proxy advertises itself by sending out HELLO messages with type d:DiscoveryProxy; clients are supposed to switch over to use this service when it’s present. So so much complexity for so so little benefit. No thanks.

Don’t forget the broadcast bunch

And we’re still not done. SSDP and WSD cover a bunch of devices and services, but they still miss a lot. Most of these use some sort of custom broadcast approach. If you poke around in UdpServiceDiscovery you’ll find a few special case bits to handle the broadcast case — we disable the NOTIFICATION thread altogether, and just use the DISCOVERY thread/socket to send out pings and listen for responses. The Misc class provides an entrypoint for this; you can find my Roomba using broadcast port 5678 like this:

java -cp target/toolbox-1.0-SNAPSHOT.jar \
    com.shutdownhook.toolbox.discovery.Misc \
    255.255.255.255 5678 irobotmcs
...
============ /192.168.4.48:5678
{"ver":"3","hostname":"Roomba-3193C60472324700","robotname":"Bellvoomba","ip":"192.168.4.48","mac":"80:91:33:9D:E2:16","sw":"v2.4.16-126","sku":"R960020","nc":0,"proto":"mqtt","cap":{"pose":1,"ota":2,"multiPass":2,"pp":1,"binFullDetect":1,"langOta":1,"maps":1,"edge":1,"eco":1,"svcConf":1}}

Sometimes you don’t even need a ping. For example, devices that use thes Tuya platform just sit there and constantly broadcast their presence on port 6666 or 6667:

java -cp target/toolbox-1.0-SNAPSHOT.jar \
    com.shutdownhook.toolbox.discovery.Misc \
    255.255.255.255 6667 tuya
...
============ / 192.168.5.253:60913
????1r???8?W??⌂???▲????H??? _???r?9???3?o*?jwz#?$?Z?H?¶?Q??9??r~  ?U

OK, that’s not super useful — it’s my smart ceiling fan, but apparently not all of their devices broadcast on 6666, and the messages on port 6667 are encrypted (using a global key, duh — this code shows how to decrypt them). This kind of thing annoys me because it doesn’t really secure anything and just makes life harder for everyone. I’m going to register my protest by not writing that code myself; that’ll show them.

In any case, you get the point — there are a lot of ways that devices try to make themselves discoverable. I’ve even seen code that just fully scans the network — mini wardialers that check every possible address for specific open ports (an approach that won’t survive the eventual v4-v6 address transition!). It’d be nice if this was more standardized, but I’m happy to live with a little chaos in return for the innovations that pop up every day. It’ll settle out eventually. Maybe.

Now back to that Roku app, which will use something like 5% of the code I wrote for this post. Just one of the reasons I’m a fan of retirement — I can burn cycles on any rabbit hole I damn well please. Getting the code right can be tricky, so perhaps it’ll prove useful to some other nerd out there. And as always, please let me know if you find a bug!

What should we watch tonight?

Just before Thanksgiving, my wife had a full knee replacement. If you haven’t seen one of these, the engineering is just amazing. They’re flex-tested over 10 million cycles. L’s implant even has online installation instructions. And her doc is a machine, doing four to five surgeries in a typical day. It’s crazy.

But for all that awesome, the process still sucks. There is a ton of pain, and it’s a long, tough recovery. I am incredibly impressed with the way that L handled it all. My job was to try and make it bearable, part of which meant being Keeper of the Big Whiteboard of Lists.

I’ve blurred out all the medical junk in that image, but left what clearly became the most important part — the TV watchlist. Despite having effectively the entire media catalog of the world at our fingertips, somehow we are always the couple on the couch swapping “I dunno, what do you want to watch?” And when we do finally make a decision, who the heck can remember which of our two dozen streaming channels that particular show lives on?

A few months on from the surgery, the whiteboard is back in storage, but the TV list is more relevant than ever. We watch shows with our far-away adult kids too, which has the same issues times ten. All in all, a great excuse to dive into some new technology. Our household is pretty much all-in on Roku, so that’s where I chose to start.

If you just want the punchline, click over to Roku Sidecar. If you’re interested in the twists and turns and neat stuff I learned along the way, read on.

TV apps are still mostly dumb

I last poked at built-for-TV apps back around 2003, when my friend Chad S. was trying to start up a “new media” company. I originally met Chad at drugstore.com, where he was telling us all that blueberry extract was going to change the world (he’s an idea guy — kind of a smarter version of Michael Keaton in the movie Night Shift). At that point cable set-top boxes were the primary way folks watched TV, and they were at least theoretically codable. Chad’s concept was to build “overlay” apps that wrapped extra content around videos — kind of like WatchParty does with its chat interface today. It was a great idea and super-fun to explore, but back then there just wasn’t enough CPU, memory and (most importantly) bandwidth to make it happen. The development environment was also super-proprietary, expensive, and awful. Ah well.

Fast forward to today. My first idea was to build a channel (app) for the Roku to display a shared watchlist. Because it’s still (and will probably always be) a pain to work with text and even moderately complex user interfaces on the TV, I was imagining a two-part application: a web app for managing the list, and a TV app for browsing, simple filtering and launching shows. After a little while playing with the web side, I realized it was going to be way more efficient to just use a Google Sheet instead of a custom app — it already has the right ux and sharing capabilities way better than anything I was going to write.

That left the TV app. Roku has a robust developer story, but it’s clear that their first love is for content providers, not app developers. The reason so many Roku apps look the same is that they are created with Direct Publisher. Rather than write any code, you just provide a JSON-formatted metadata feed for your videos, host them somewhere, and you get a fully-functional channel with the familiar overview and detail pages, streaming interface, and so on. It’s very nice and now I understand why so many tiny outfits are able to create channels.

On the other hand, for apps that need custom layout and behaviors, the SDK story is a lot more clunky — like writing an old Visual Basic Forms application. User interface elements are defined with an XML dialect called SceneGraph; logic is added using a language they invented called BrightScript. There is a ton of documentation and a super-rich set of controls for doing video-related stuff, but if you stray away from that you start to find the rough edges. For example, you cannot launch from one Roku channel into another one — which put a quick end to my on-TV app concept.

Of course this makes perfect business sense; video is their bread and butter and it’s why I own a bunch of their devices. I’m not knocking them for it, it’s just a little too bad for folks trying to push the envelope. In any case, time well spent — it’s always entertaining to dig under the covers of technologies we use every day. If you want to poke around, this page has everything you need. Honestly, my favorite part was the key combination to enable developer settings — takes me back to Mortal Kombat:

Roku ECP saves the day

But wait! It turns out that Roku has another interface — the “External Control Protocol” at http://ROKU_ADDR:8060. It seems a little sketch that any device on the local network can just lob off unauthenticated commands at my Roku, but I guess the theory is that worst case I get a virus that makes me watch Sex and the City reruns? Security aside, it’s super-handy and I’m glad I ran into it before giving up on my app.

In a nutshell, ECP lets an app be a Roku remote control. You can press buttons, launch channels, do searches, send keystrokes, that kind of thing. There are also commands to query the state of the Roku — for example, get a list of installed channels or device details. It’s primarily meant for mobile apps, but it works for web pages too, with a few caveats:

  1. Because requests must originate from the local network, you can only make calls from the browser (unless you’re running a local web server I guess).
  2. But, the Roku methods don’t set CORS headers to allow cross-origin resource sharing. This is frankly insane — it means that the local web page has to jump through hoops just to make calls, and (much) worse can’t actually read the content returned. This renders methods like /query/apps inaccessible, which is super-annoying.
  3. It’s also kind of dumb that they only support HTTP, because it means that the controlling web page has to be insecure too, but whatever.

Probably I should have just written a mobile app. But I really wanted something simple to deploy and with enough real estate to comfortably display a reasonably-sized list, so I stuck to the web. The end result seems pretty useful — the real test will be if the family is still using it a few months from now. We’ll see.

Roku Sidecar

OK, let’s see how the app turned out. Bop on over to Roku Sidecar and enter your Roku IP Address (found under Settings / Network / About on the TV). This and all other settings for the app are purely local — you’re loading the web page from my server, but that’s it — no commands or anything else are sent or saved there.  

The first thing you’ll notice is the blue bar at the top of the page. The controls here just drive your Roku like your remote. Most immediately useful is the search box which jumps into Roku’s global search. This landing page is where the rest of the buttons come in handy — I certainly wouldn’t use this page instead of my remote, but when I’m searching for something and just need a few clicks to get to my show, it works great.

You can also just launch channels using the second dropdown list. Unfortunately, the issue about CORS headers above means I can’t just get a list of the channels installed on your Roku — instead I just picked the ones I think most people use. This is awful and the second-worst thing about the app (number one will come up later). That said, my list covers about 90% of my use of the Roku; hopefully it’ll be similar for you.

To use the actual watchlist features, you’ll need to hook the page up to a Google Sheet. You can actually use any openly-available TSV (tab-separated values) file, but Google makes it super-easy to publish and collaboratively edit, so it’s probably the best choice. The sheet should have four columns, in this order:

  • Show should be the full, official name so Roku finds it easily.
  • Channel is optional but helps us jump directly to shows.
  • Tag is any optional string that groups shows; used for filtering the list.
  • Notes is just to help you remember what’s what.

Publish your sheet as a TSV file by choosing File / Share / Publish to Web, change the file type dropdown from “Web page” to “Tab-separated values (.tsv)”, and copy the resulting URL (note this technically makes the data available to anybody with that URL). Paste the URL into the “Watchlist URL” box back on Sidecar and click Update. If all goes well, you should see your shows displayed in a grid. Use the buttons and checkboxes on the right to show/hide shows by their tag, and click the name of the show to start watching!

If you’ve listed one of the known channels with the show, most of the time it will just start playing, or at least drop you on the show page within the channel. Whether this happens or not is up to Roku — the ECP command we send says “Search for a show with this name on this channel, and if you find an exact match start playing it.” If it doesn’t find an exact match, you’ll just land on the Roku search page, hopefully just a click or two away from what you want.

This “maybe it’ll work” behavior is the most annoying thing about the app, at least to me. The dumb thing is, there’s actually an API specifically made to jump to a specific title in a channel. But Roku provides no way for an app to reliably find the contentId parameter that is used to specify the show! There is already a public Roku search interface; how hard would it be to return these results as JSON with content and channel ids? I messed around with scraping the results but just couldn’t make it work well. Bummer. Separately, it seems like YouTube TV isn’t playing nice with the Roku search, as queries for that channel seem to fail pretty much all the time.

All in all, it’s a pretty nice and tidy little package. I particularly like how everybody in the family can make edits, and use tags to keep things organized without stepping on each other’s entries.

A quick look at the code

roku.html is the single-page-app that drives the ux. To be honest, the CSS and javascript here are pretty crazy spaghetti. It’s the final result of a ton of experimentation, and I haven’t taken the time to refactor it well. My excuse is that I have some other writing I need to get to, but really I just freaking hate writing user interfaces and that’s what most of this app is. Please feel free to take the code and make it better if you’d like!

This is the first time I’ve used localStorage rather than cookies in an app. I’m not sure why — it’s way more convenient for use in Javascript. Of course, these values don’t get sent to the server automatically, so it doesn’t work for everything, but I’m glad to have added it to my toolkit.

callroku.js is ok and would be easy to drop into your own applications pretty much as-is. It’s written from scratch, but as the header of the file indicates I did take the hidden post approach from the great work done by A. Cassidy Napoli over at http://remoku.tv. Because the Roku doesn’t set CORS headers, normal ajax calls fail. We avoid this by adding a hidden form to the DOM, which targets a hidden iframe. To call the ECP interface we set the “action” parameter to the desired ECP url, and submit the form. Browser security rules say we can have a cross-origin iframe on the page, we just can’t see its data. That’s ok for our limited use case, so off we go.

And that’s about it! A useful little app that solves a real problem. Twists and turns aside, I do love it when a plan comes together. Until next time!