Alexa, ask Bellevue House…

This is part two of my smart light adventure; read part one here.

At the end of the last post, my orphaned Z-Wave lights (thanks again Wink) were working again under the control of a little web app I’d written for my phone. Individual devices were grouped into “virtual lights” that could then be switched on/off or dimmed, either individually or through coordinated “Settings” designed for a particular purpose like watching TV or taking a nap.

Handy, but still missing a key bit of functionality: I’ve gotten used to asking Alexa to turn on the lights in the morning and turn them off at night. Yes I’m that lazy. So my next challenge was to figure out how to wire up my bespoke solution to an Alexa skill. It turns out that Alexa has a whole API dedicated to smart home stuff, which seemed the obvious place to start, but I was quickly overwhelmed with the complexity and ran away screaming. Don’t get me wrong, it’s super-powerful and I appreciate how it enforces a consistent interface to a ton of diverse devices and systems. It’s just that my scenario is super-simple; I just want to trigger pre-configured settings with statements like “Alexa, ask Bellevue House to set family room to nap.” For a problem this constrained, the most basic of skills did the trick.

Creating the Skill

Anyone can build Alexa skills and deploy them to their own devices for free. Getting them into the Alexa Skills Directory is more complicated, but not required for our scenario. Just be sure to sign up for the Alexa Developer Program using the same Amazon account that your devices are registered to. There is no cost to develop a skill or use it this way.

The first step is just to create the skill; below are the steps I used. You’ll see throughout that I was extremely lazy; this default template includes a bunch of “hello world” interactions that I haven’t removed — perhaps I’ll come back someday and clean all that up, but probably not.

  1. Log into the Alexa developer console (remember to use the same Amazon account used to register your Alexa devices).
  2. Click “Create Skill”.
  3. Give the skill a name, then choose the “Custom” model and the “Alexa-hosted (Node.js)” hosting option. Then click “Create skill” at the top right.
  4. Choose “Start from Scratch” and click “Continue with template.” This will take a minute or two to complete.

The Interaction Model

Next up is the “interaction model.” This is the grammar and request/response framework for the conversation with Alexa. Mine is very simple: “Alexa, ask Bellevue House to set family room to nap.” Even for this, there are a bunch of components at work:

  • “Alexa, ask Bellevue House to…” is the standard way Alexa interprets which skill you want to invoke. “Bellevue House” is the skill invocation name.
  • Each skill can include one or more intents. My skill only supports one intent, invoking a Setting on a Screen. I creatively named this “ScreenSettingIntent”.
  • “set family room to nap” is an utterance associated with the intent. Because there can be many ways to express the same intent, you can configure multiple utterances. In my case, I also added “turn kitchen off”.
  • The variable phrases (“family room / kitchen” and “nap / off”) are specially tagged as slots within the utterance. These are the variables that tell the skill what to do. Each slot is assigned a slot type which describes what kind of content is likely to appear in the slot. There are tons of built-in slot types, or you can create your own. An important feature of slot types is that they are not closed vocabularies; sample values help train the model, but Alexa will attach whatever she hears even if it deviates from that training set.

At the end of the day, a successful invocation of the skill identifies the intent, attaches values to each of its slots, and triggers code to actually do whatever should be done. Before we look at that code, a quick cheat sheet for setting all of the values in our interaction model:

  1. Set the skill invocation name under Build / Invocations / Skill Invocation Name. (I used “Bellevue House” but that probably doesn’t work for you!)
  2. Under Slot Types choose Add Slot Type, set the name to SCREEN_NAME, click Next and add some sample values.
  3. Repeat this process for a new Slot Type called SETTING_NAME with appropriate samples.
  4. Under Intents choose Add Intent, set the name to “ScreenSettingIntent” and click Create Custom Intent.
  5. Scroll down to Intent Slots, and add two slots:
    1. “screenName” with the type “SCREEN_NAME”
    2. “settingName” with the type “SETTING_NAME”
  6. Scroll back up and add utterances with slot placeholders (the curly-braces mark the slots):
    1. “set {screenName} to {settingName}”
    2. “turn {screenName} {settingName}”
  7. Use the buttons at the top to “Save Model” and “Build Model”.
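For reference, the console’s JSON Editor shows this same configuration as a single document. Here is a rough sketch of what it should look like, written as a JavaScript object. The sample slot values are just the ones mentioned above, the built-in intents that the template adds are left out, and `undeclaredSlots` is a hypothetical helper of mine, not part of any Alexa tooling.

```javascript
// Sketch of the interaction model in the shape the JSON Editor uses.
// Sample values are illustrative; the template's built-in intents are omitted.
const model = {
  interactionModel: {
    languageModel: {
      invocationName: 'bellevue house',
      intents: [
        {
          name: 'ScreenSettingIntent',
          slots: [
            { name: 'screenName', type: 'SCREEN_NAME' },
            { name: 'settingName', type: 'SETTING_NAME' }
          ],
          samples: [
            'set {screenName} to {settingName}',
            'turn {screenName} {settingName}'
          ]
        }
      ],
      types: [
        { name: 'SCREEN_NAME', values: [{ name: { value: 'family room' } }, { name: { value: 'kitchen' } }] },
        { name: 'SETTING_NAME', values: [{ name: { value: 'nap' } }, { name: { value: 'off' } }] }
      ]
    }
  }
};

// Hypothetical sanity check: list any {placeholder} used in an utterance
// that has no matching slot declared on the intent.
function undeclaredSlots(intent) {
  const declared = new Set(intent.slots.map((slot) => slot.name));
  const used = intent.samples.flatMap((sample) =>
    [...sample.matchAll(/\{(\w+)\}/g)].map((match) => match[1]));
  return [...new Set(used)].filter((name) => !declared.has(name));
}

console.log(JSON.stringify(model, null, 2));
console.log(undeclaredSlots(model.interactionModel.languageModel.intents[0])); // []
```

A mismatch between a placeholder and a declared slot name is an easy typo to make, which is the only reason the little check is there.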

Handle the Intent

Each time Alexa needs to do something related to a skill, she makes a call to an HTTPS URL configured for that skill, passing request details as a JSON-formatted POST. This URL can live anywhere and be written in any language. When we created our skill, we chose the “Alexa-hosted (Node.js)” hosting option — under this model, Alexa allocates and hosts an AWS Lambda function for us for free, which seems awfully generous. We can edit the code for this function using the “Code” tab at the top of the Alexa developer console.

Remember I’m being lazy here; there is a bunch of boilerplate code auto-generated for us, and I’m just letting it all be. The important stuff for us is in index.js, a copy of which I’ve stashed for reference up at the ShutdownHook github. This file defines a few global utility objects at the top, then a whole bunch of handler functions, and then at the bottom wires the handlers up as “exports.handler”. Each time Alexa makes a request, this list of handlers is scanned in order until one that matches the request is found and executed.
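To make that scan concrete, here’s a stripped-down sketch of the idea in plain JavaScript. It is not the SDK’s actual code: `dispatch` and the two toy handlers are made up for illustration, and the sample request only includes the handful of fields that matter here. The `canHandle` / `handle` pairing is the same shape the template uses.

```javascript
// A trimmed-down example of the JSON that Alexa POSTs for our utterance.
const requestEnvelope = {
  version: '1.0',
  request: {
    type: 'IntentRequest',
    intent: {
      name: 'ScreenSettingIntent',
      slots: {
        screenName: { name: 'screenName', value: 'family room' },
        settingName: { name: 'settingName', value: 'nap' }
      }
    }
  }
};

// Toy handlers in the same canHandle / handle shape as the template.
const LaunchHandler = {
  canHandle: (input) => input.requestEnvelope.request.type === 'LaunchRequest',
  handle: () => 'Welcome!'
};

const ScreenSettingHandler = {
  canHandle: (input) => {
    const request = input.requestEnvelope.request;
    return request.type === 'IntentRequest' &&
      request.intent.name === 'ScreenSettingIntent';
  },
  handle: (input) => {
    const slots = input.requestEnvelope.request.intent.slots;
    return `Setting ${slots.screenName.value} to ${slots.settingName.value}`;
  }
};

// Hypothetical stand-in for the SDK's dispatcher: first match wins.
function dispatch(handlers, input) {
  const match = handlers.find((handler) => handler.canHandle(input));
  if (!match) throw new Error('No handler for this request');
  return match.handle(input);
}

console.log(dispatch([LaunchHandler, ScreenSettingHandler], { requestEnvelope }));
// prints "Setting family room to nap"
```

Because the first match wins, order matters: a catch-all handler has to sit at the end of the list or it will swallow requests meant for the more specific ones.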

The code I’ve added to handle our intent is in the function ScreenSettingIntentHandler, defined at line 10 and inserted into the handlers list at line 208. Don’t worry about the details quite yet — first we have to figure out how we’re going to communicate from this Alexa-hosted Lambda function all the way down to the home control web server we built last time.

Bringing the Outside Inside

We’re getting there, but there’s a networking problem we still need to solve, one that comes up a lot when building solutions that integrate Internet services with code running on a home network. Our personal routers are really good at getting us OUT onto the Internet, but they frown upon the Internet getting back IN to initiate communication with devices inside our houses. This is a very good thing of course; Internet security is scary enough without inviting hackers inside our private networks for drinks and conversation.

So hooray for routers and personal firewalls. But when you DO want an event on the outside to trigger something on the inside (say, for example, a notification from Alexa to turn on the lights), it can be a little challenging. It’s absolutely possible to create the necessary routes: tell your DHCP server to assign a static internal IP address to the device you care about; configure your router to pass through traffic to that address; and set up something like https://www.duckdns.org/ (a primo service btw) to assign a name that outside devices can use to find it. It’s certainly not the end of the world, but it is complicated, and it’s very easy to get wrong in a way that makes you vulnerable to the bad guys out there. I don’t recommend it.

A better solution uses an intermediary to serve as a go-between. In this model Alexa drops a message somewhere on the Internet saying “please turn on the lights,” and the inside device reaches OUT to the intermediary to pick up the message. Message queues that work this way are used all over the place, and certainly will do the trick for our use case. Two design decisions will help us pick what flavor of queue we’ll use:

  1. How quickly do incoming messages need to be acted on? In our case, we don’t need millisecond responsiveness, but it can’t take more than a second or two — the lights really need to come on when I ask them to, not five minutes later.
  2. Do message senders require a synchronous response to their messages? That is, does the sender need to wait until the message has been picked up and handled before they can move on? This seems like a nice-but-not-essential feature for our scenario, since it’s pretty obvious if lights come on or not.

Setting up the Queue with SQS

There really are tons of very reasonable, lightweight ways we could go about this, but for now I just went with Amazon’s Simple Queue Service, a well-supported workhorse service with good Java support and easy integration into the Alexa side of things. SQS does require the client to poll for messages, but minimizes the performance penalty by supporting “long polling.” Each polling request stays open for up to twenty seconds, returning immediately if a message arrives. The worst case for an idle queue is then three polling requests per minute; 180 per hour; 4,320 per day, which is basically nothing as far as any modern CPU and network are concerned. And in return we get more or less instantaneous delivery. Cool beans!

In order to use SQS you need an account with Amazon Web Services. Note this is distinct from the Alexa developer account, although you can use the same email. There is a remarkable amount you can do within the AWS free tier, including (at least as of this writing) making up to a million SQS requests per month. I’m pretty sure I’m going to stay under a million requests per month turning my lights on and off.
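A quick back-of-the-envelope check of the polling math against that free tier limit, assuming the queue sits idle so every poll runs its full twenty seconds, and ignoring the handful of send and delete requests that real traffic adds. The constant names are just for illustration.

```javascript
const LONG_POLL_SECONDS = 20;        // maximum SQS long-poll wait
const FREE_TIER_PER_MONTH = 1000000; // free-tier request allowance cited above

const perMinute = 60 / LONG_POLL_SECONDS; // 3
const perHour = perMinute * 60;           // 180
const perDay = perHour * 24;              // 4,320
const perMonth = perDay * 31;             // 133,920 in a 31-day month

const share = ((perMonth / FREE_TIER_PER_MONTH) * 100).toFixed(1);
console.log(`${perMonth} idle polls per month, about ${share}% of the free tier`);
// prints "133920 idle polls per month, about 13.4% of the free tier"
```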

So the big picture is that we’ll set up a Queue in our AWS account and drop a message into it each time our Alexa handler is called. We’ll then add a thread to our home control web server that long-polls the queue for messages and executes the requested screen/setting behaviors on our Z-Wave network. No sweat!

The most challenging part about all of this is authorization; we need to give our Alexa-hosted Lambda function the rights to post messages to the queue in our AWS account. This can get a little hairy, so buckle up and bear with me (general instructions can be found in the Alexa docs under “Use Personal AWS Resources”). The nut of it all is that we’re creating a role in our AWS account that has rights to post messages to a new queue, and then giving the Alexa role the rights to “assume” the AWS role when the skill is invoked:

  1. In the Alexa developer console, find the ARN of your Alexa-hosted Lambda role. Click the Code tab, then “Integrate” in the toolbar. Copy the “arn:” value there and tuck it away. We’ll refer to this as the Alexa Role ARN going forward. Make a note of the account number in this ARN, e.g., the account number for the ARN “arn:aws:iam::866314627097:role/AlexaHostedSkillLambdaRole” is 866314627097. We will refer to this as the Alexa Role Account.
  2. In the AWS management console, choose “Simple Queue Service” under the huge “Services” dropdown at the top-left.
  3. Click “Create Queue” and then the following options:
    1. A “Standard” queue is fine, although it will require a bit of message de-duplication logic we’ll see later.
    2. Any name is fine.
    3. Under “Access Policy”, choose “Basic” and then select the radio button “Only the Specified AWS accounts, IAM users and roles” under “Define who can send messages to the queue.” In the edit box that appears, enter the Alexa Role ARN.
    4. The rest of the default configuration settings are fine. You may optionally choose to configure a “dead letter” queue that will receive failed messages; details are here and you can add the option later if you choose.
    5. Click “create queue” to confirm the operation.
  4. On the queue information page that appears, copy the “arn:” value for the queue and tuck it away. We’ll refer to this as the Queue ARN going forward. Also copy the URL value which we’ll refer to as (perhaps not surprisingly) the Queue URL.
  5. Choose “IAM” under the huge “Services” dropdown at the top-left.
  6. Click “Roles” and then “Create Role”.
  7. Under “Select type of trusted entity”, choose “Another AWS account,” enter the Alexa Role Account you saved in step 1, and click “Next.”
  8. On the “Attach permissions policies” page just click “Next: Tags”, then “Next: Review”.
  9. Give the Role a name and then click “Create role.”
  10. Click your newly-created role in the list, and then “Add inline policy” under “Permissions”. Use these settings for the policy:
    1. Under “service”, search for and add SQS.
    2. Under “actions”, search for and add SQS:SendMessage.
    3. Under “resources”, choose “Add ARN” and enter the Queue ARN.
    4. Click “Review Policy” and then provide a name for the policy and click “Create policy.”
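If you peek at the JSON view of the role afterwards, the result of steps 7 and 10 should look roughly like the two documents built below. This is a sketch for orientation rather than something you need to run: `accountFromArn`, `trustPolicy`, and `sendPolicy` are helper names I made up, the queue ARN is a placeholder, and the console may add extra fields of its own.

```javascript
// Pull the account number out of an ARN (the fifth colon-separated field).
function accountFromArn(arn) {
  return arn.split(':')[4];
}

// Trust policy: lets principals in the Alexa Role Account assume our role.
function trustPolicy(alexaRoleAccount) {
  return {
    Version: '2012-10-17',
    Statement: [{
      Effect: 'Allow',
      Principal: { AWS: `arn:aws:iam::${alexaRoleAccount}:root` },
      Action: 'sts:AssumeRole'
    }]
  };
}

// Inline permissions policy: the role may only send to our one queue.
function sendPolicy(queueArn) {
  return {
    Version: '2012-10-17',
    Statement: [{ Effect: 'Allow', Action: 'sqs:SendMessage', Resource: queueArn }]
  };
}

const alexaRoleArn = 'arn:aws:iam::866314627097:role/AlexaHostedSkillLambdaRole';
const queueArn = 'arn:aws:sqs:us-east-1:123456789012:my-queue'; // placeholder

console.log(accountFromArn(alexaRoleArn)); // prints "866314627097"
console.log(JSON.stringify(trustPolicy(accountFromArn(alexaRoleArn)), null, 2));
console.log(JSON.stringify(sendPolicy(queueArn), null, 2));
```

The split between the two documents is the point: the trust policy says who may assume the role, and the permissions policy says what the role can do once assumed.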

Whew, almost done! That takes care of creating the queue and setting up the Alexa role to be able to send messages to it. The last bit of authorization required is a user that we will use from the home control web server to poll for messages. We need an access key and secret for that user.

If you’re really really lazy and are logged into the AWS management console as the root user, you can just choose “My Security Credentials” from the top-right account menu, allocate a key and secret, and use those. But those credentials have an insane level of access. Much better to go into the breach one more time and create a user just for accessing the queue:

  1. Logged into the AWS management console, choose “IAM” under the huge “Services” dropdown at the top-left.
  2. Click “Policies” and “Create Policy”.
  3. As with step 10 above, create a policy with access to the Queue ARN, but in this case under actions choose “All SQS actions”.
  4. Click “Users” and “Add Users”.
  5. Provide a user name (I used “homepi”; if I end up using other AWS services from the home control system I’ll reuse this user).
  6. Check “Access key – Programmatic access” and then “Next: Permissions”.
  7. Choose “attach existing policies directly”, then search for and check the box for the policy created in step 3.
  8. Click “Next: Tags” and then “Next: Review”.
  9. Review the settings and click “Create User”.
  10. Copy the Access Key ID and Secret access key shown on the confirmation page.

Wow, you made it! Now go have a beer and then come back for the rest. It’s all downhill from here!

Sending to the Queue

OK, we’re ready to look at the code that sends the message from Alexa into the queue we created. This is pretty simple, although figuring out the authentication most definitely was not. First, add a couple of global service clients at the top after the Alexa object is allocated (lines 7-8 of my sample index.js):

const AWS = require('aws-sdk');
const STS = new AWS.STS({ apiVersion: '2011-06-15' });

Next, add the handler itself (lines 10-66), which breaks down like this:

  • Lines 11-14 tell Alexa that this routine handles the “ScreenSettingIntent” intent that we built so long ago.
  • Lines 16-18 pull the “screenName” and “settingName” slot values out of the request, as heard by Alexa.
  • Lines 20-31 build the JSON content of the message we’ll put into the queue. The queue will take messages in any format. Be sure to use your own Queue URL at line 30!
  • Line 33 constructs the response that Alexa will speak back to the user. This is where having a synchronous response from our home server would be nice, because Alexa doesn’t really know if the setting was applied successfully or not.
  • Lines 35-51 create an SQS client operating with the assumed Role we created that has rights to send messages to the queue.
  • Lines 53-60 actually, finally, send the message!
  • Lines 62-64 send the response back to Alexa.
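Stripped of the Alexa plumbing, the assume-role-then-send flow in that list looks roughly like this. It’s a sketch rather than a copy of my index.js: the message field names, `ROLE_ARN`, `QUEUE_URL`, and the session name are placeholders, and the clients are passed in as parameters so the flow can be exercised without touching AWS. The calls it makes (`assumeRole(...).promise()` and `sendMessage(...).promise()`) are the ones from version 2 of the AWS SDK for JavaScript.

```javascript
// Placeholders: substitute the role and queue you created earlier.
const ROLE_ARN = 'arn:aws:iam::123456789012:role/my-queue-sender';
const QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/my-queue';

// Build the SQS message; the body format is ours to choose (field names assumed).
function buildMessage(screenName, settingName) {
  return {
    QueueUrl: QUEUE_URL,
    MessageBody: JSON.stringify({ screen: screenName, setting: settingName })
  };
}

// Turn the temporary credentials from STS into SQS client options.
function clientOptions(assumed) {
  return {
    accessKeyId: assumed.Credentials.AccessKeyId,
    secretAccessKey: assumed.Credentials.SecretAccessKey,
    sessionToken: assumed.Credentials.SessionToken
  };
}

// sts: an AWS.STS client; makeSqs: (options) => an AWS.SQS client.
async function sendSetting(sts, makeSqs, screenName, settingName) {
  const assumed = await sts.assumeRole({
    RoleArn: ROLE_ARN,
    RoleSessionName: 'ScreenSettingSession'
  }).promise();
  const sqs = makeSqs(clientOptions(assumed));
  return sqs.sendMessage(buildMessage(screenName, settingName)).promise();
}

// Dry run with stand-in clients so nothing here talks to AWS.
const fakeSts = {
  assumeRole: () => ({
    promise: async () => ({
      Credentials: { AccessKeyId: 'id', SecretAccessKey: 'secret', SessionToken: 'token' }
    })
  })
};
const fakeSqs = () => ({
  sendMessage: (params) => ({ promise: async () => params })
});

sendSetting(fakeSts, fakeSqs, 'family room', 'nap')
  .then((sent) => console.log(sent.MessageBody));
// prints {"screen":"family room","setting":"nap"}
```

In the skill itself, `sts` would be the global `STS` client and `makeSqs` would be something like `(options) => new AWS.SQS(options)`; depending on where your queue lives you may also need to pass a `region` in those options.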

Finally, we register the handler by adding it to the list at line 208:

exports.handler = Alexa.SkillBuilders.custom()
    .addRequestHandlers(
        LaunchRequestHandler,
        ScreenSettingIntentHandler,
        HelloWorldIntentHandler,
        HelpIntentHandler,
        CancelAndStopIntentHandler,
        FallbackIntentHandler,
        SessionEndedRequestHandler,
        IntentReflectorHandler)
    .addErrorHandlers(
        ErrorHandler)
    .withCustomUserAgent('sample/hello-world/v1.2')
    .lambda();

With all this in place, use the “Save” and “Deploy” buttons at the top of the code editor to push out your code.

On the “Test” tab of the developer console, you can test the skill right from your browser: just hold down the mic button and try out one of your utterances. You’ll see the response from Alexa, including logs and message details. Over at the AWS management console, you can peek at the contents of your queue to see if the message has actually made it. Don’t be discouraged if you get some auth errors on the first try; there is so much to configure here it’s hard to get it all right.

The really cool thing is that the skill is also now available on all of your Alexa devices. Honest! Skills in test are pushed to all of the Alexa devices registered to the account, so there’s no need to worry about publication or certification. Unless of course you want to allow random folks to turn your lights on and off from their own homes? Weird.

Reading from the Queue

We really are at the last mile now — all that’s left is to read messages out of the queue and take action based on the slot values we receive. For this we’re back to our (well at least my) happy place … Java that runs on my machine. We already have a process running on the Pi that implements the home control web server. We’ll just add a thread that long-polls the queue for messages.

Most of the implementation here is in Queue.java. We also need to add a reference to the AWS SQS Java SDK in our pom.xml. It’s basically obscene how much dependency this drags into our project, but signing AWS requests by hand is a huge hassle, so I’m holding my nose and just living with it:

<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-sqs</artifactId>
  <version>1.12.94</version>
</dependency>

The reader thread is built on Worker.java … if you’re in the mood for more judgy commentary about managing background threads, feel free to check out this article, one of the first I wrote for the blog earlier this year. The queue thread itself is pretty straightforward: allocate the client object on creation, enter a long-polling loop that calls receiveMessage and sends received content to a handler interface, and clean up the messages when that’s done. Just a few notes:

  • SQS supports two queue models: “FIFO” queues preserve ordering and guarantee exactly-once delivery; “Standard” queues are more efficient, but do not guarantee ordering and may deliver the same message more than once. I used a “standard” queue, so added a bit of code to protect against duplicates. It only works if two duplicates arrive in sequence, but for our low-volume use case that seems to be the norm.
  • The SQS retrieval pattern is standard for queue technology: when a client receives a message, it becomes “invisible” for a short window (the visibility timeout, 30 seconds by default). If the client successfully processes the message, it confirms this by deleting the message. If that delete does not happen for some reason, the message reappears for another client to retrieve. This serves as a great retry model for transient failures; our Queue expects the handler to throw an exception on failure to support the pattern. It can be problematic for “poison” messages that can never be processed, which is where a dead letter queue comes into play.
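The two notes above boil down to a few lines of logic. My version lives in Queue.java; the sketch below restates the idea in JavaScript to match the skill code, with made-up names (`makeProcessor`, `deleteMessage`) and message objects shaped like the `MessageId` / `ReceiptHandle` / `Body` fields that SQS returns. It assumes duplicates share a `MessageId` and arrive back to back.

```javascript
// Returns a function that processes one batch of received messages.
// handler(body) should throw on failure; deleteMessage(receiptHandle)
// stands in for the SQS delete call.
function makeProcessor(handler, deleteMessage) {
  let lastMessageId = null; // only catches duplicates that arrive in sequence

  return function processBatch(messages) {
    for (const message of messages) {
      if (message.MessageId !== lastMessageId) {
        try {
          handler(message.Body);
        } catch (err) {
          // Leave the message alone: once its visibility window expires,
          // SQS makes it available again, which gives us a retry.
          continue;
        }
        lastMessageId = message.MessageId;
      }
      // Handled (or a duplicate of something handled): confirm by deleting.
      deleteMessage(message.ReceiptHandle);
    }
  };
}

// Dry run with in-memory stand-ins instead of a real queue.
const handled = [];
const deleted = [];
const processBatch = makeProcessor(
  (body) => {
    if (body === 'bad') throw new Error('cannot process');
    handled.push(body);
  },
  (receiptHandle) => deleted.push(receiptHandle)
);

processBatch([
  { MessageId: 'm1', ReceiptHandle: 'r1', Body: 'nap' },
  { MessageId: 'm1', ReceiptHandle: 'r2', Body: 'nap' }, // duplicate delivery
  { MessageId: 'm2', ReceiptHandle: 'r3', Body: 'bad' }  // fails, not deleted
]);

console.log(handled); // [ 'nap' ]
console.log(deleted); // [ 'r1', 'r2' ]
```

Note that the duplicate still gets deleted even though it is never handled; otherwise it would keep reappearing after every visibility window.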

The handler code in Server.java does the actual work of turning the lights on and off according to the screen and setting values. Which honestly seems a little anti-climactic after all the minutiae we fought to get here … but we made it. I love the end result, and hope all of those lists and steps will save you some time, but the ever-spiraling complexity at AWS (and the other cloud providers) really is unfortunate.

Pushing my luck

Now I can manage my lights from my phone and with my voice. There’s just one last thing I’d love them to do: turn on and off automatically when I get up during the night. I found these pretty cool PIR motion sensors at Amazon, so let’s see if I can get that hooked up. Next time!