How to design an Amazon Alexa skill, part 2

Missed part one? Find out the fundamentals of building an Alexa skill.

Transcript

Hello, everyone. Pretend I’m not here. Alexa, start RocketMill company meeting.

[Alexa: Welcome to the RocketMill company meeting. Would you like me to introduce myself?]

Yes

[Alexa: Okay, good afternoon, RocketMill. My name is Alexa, and it is my pleasure to welcome you to our second presentation on how to build an Amazon Alexa Skill for your business. To guide you through my inner workings, please welcome a man whose voice could never be described as monotonous. Ladies and gentlemen, it is the one, the only, Chris Philpot.]

Hello, and welcome to ‘How to Design an Amazon Alexa Skill, Part Two’ – the second video on how to design a skill for your business. Now, this would be a good point for me to walk over here, and mute Alexa’s microphone. Otherwise, she’s going to hear me saying “Alexa” throughout this and try to reply to me, which probably wouldn’t look very good.

A recap on Alexa Skills

So, let’s recap our first video, and what we’ve learned about designing Alexa skills so far. If you’ve seen the first video, you will know that skills are powered by the Alexa Skills Kit, aka ASK. There are five types of Alexa Skill. And they are:

  • Custom interaction, which is probably the most useful one for most brands. It’s the one you have the most creativity with.
  • Flash briefing, which is ideal for publishers.
  • Video, which delivers video streams to the Amazon Echo Show. You could perhaps use it to supplement a flash briefing news report.
  • Smart Home, which is the one that homeware manufacturers would use. So, if you want to, I don’t know, control a kettle from Alexa, then as a homeware manufacturer, that’s the API you would tap into.
  • And finally, the list skill, which taps into Amazon’s wish list and shopping list framework, so that you can buy through the Amazon platform.

There are two parts to an Alexa Skill. The first is the interaction model, your voice user interface, which determines how people can speak to your skill. The second is a hosted service, or skill service, which sits behind the scenes doing the heavy lifting and the number crunching to process voice commands and turn them into actions.

And within a voice command, there are five main parts:

  • The first of which is the wake word, which gets Alexa’s attention. That’s by default “Alexa”.
  • The invocation phrase, which is a verb that tells Alexa what to do with the rest of the command. So, “ask” or “search” or “play”.
  • An invocation name, which is the unique name of the skill you want to use, often that of a brand. So, it could be “Uber” or “JustEat”.
  • An utterance, which is the phrase that asks for a unique function of the skill, known as an intent. So, it could be “Call a taxi”. It could be “Order a pizza”.
  • And then finally a slot, which is a variable that defines exactly what you want from the skill. So, for that utterance for “Order a pizza”, it could be “pepperoni”, say.
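
As a rough illustration of how those five parts fit together, here is a minimal Python sketch. The `VoiceCommand` class, the “Pizza Shop” skill name and the `topping` slot are all made up for this example; none of them come from the Alexa Skills Kit itself.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class VoiceCommand:
    """The five parts of a spoken Alexa command."""
    wake_word: str          # gets Alexa's attention
    invocation_phrase: str  # verb that tells Alexa what to do next
    invocation_name: str    # unique name of the skill
    utterance: str          # phrase for the function you want, with {slot} gaps
    slots: Dict[str, str]   # values that fill the gaps in the utterance

    def spoken(self) -> str:
        """Assemble the sentence a user would say out loud."""
        request = self.utterance.format(**self.slots)
        return (f"{self.wake_word}, {self.invocation_phrase} "
                f"{self.invocation_name} to {request}")


# Hypothetical pizza skill: the name and wording are illustrative only.
command = VoiceCommand(
    wake_word="Alexa",
    invocation_phrase="ask",
    invocation_name="Pizza Shop",
    utterance="order a {topping} pizza",
    slots={"topping": "pepperoni"},
)
print(command.spoken())
```

Swapping the slot value for “margherita” changes the order without changing the utterance, which is the reason for keeping the two apart.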

And importantly, many utterances can trigger one intent. In a nutshell, there’s more than one way to say the same thing. And your skill needs to be able to handle that.

Putting it into practice

So, enough theory, let’s put this into practice with a simple Alexa Skill. I’m sure nothing can go wrong, but as the old saying goes: “Never work with animals, children, or Smart Home products”.

I’ve built a simple dictionary of digital marketing jargon, and the goal of this skill is that the user will be able to ask Alexa to define a digital marketing term, and Alexa will respond with a straightforward definition. Okay, so let’s give this a go. We’ll need to unmute Alexa first. Now she can hear me again.

Alexa, launch RocketMill dictionary.

[Alexa: Welcome to RocketMill marketing dictionary, UK edition. You can ask me to explain the meaning of digital marketing jargon. For example, you can ask me: “What is SEO short for?” So, what can I help you with?]

Define SEO.

[Alexa: Search Engine Optimisation, improving web pages to increase their visibility in search engine results.]

What does PPC stand for?

[Alexa: Pay Per Click, a form of online advertising where you are charged only if a user clicks on your advert.]

Alexa, stop.

[Alexa: Thanks for using Rocket Mill marketing dictionary, UK edition. Goodbye for now.]

In fact, there’s one other thing I can do, which is to try and invoke the skill without having launched it first. So, if I say: “Alexa, ask RocketMill dictionary for the definition of CRO”.

[Alexa: Conversion Rate Optimisation, maximising user revenue by improving the quality of your webpages or app screens.]

Okay, so simple demo of a fairly simple skill. Let me mute Alexa again, so she doesn’t interject. Brilliant. You know, I have this much trouble with actual women as well.

Let’s talk through the steps to launching an Alexa Skill, so that you understand the process you need to go through to do something as slick as that.

The steps to launching an Alexa Skill

Firstly, you will set up your skill. Then, you will design the interaction model for it, the front end. Next, you connect it to a hosted service, which is the back end, and test your commands. And then finally, you publish it to Amazon so that other people can download it to their devices.

Setting up your skill

So, let’s start at the beginning. How to set up your skill.

Number one: register on the Amazon Developer website. Here’s a direct link which takes you through to a description of the Alexa Skills Kit. And from there you can click on the Start a Skill button, and then choose Alexa Skills Kit to jump straight through to the relevant page.

And then you need to enter your skill’s basic details. So, we have:

  • Skill type
  • Languages – so, in this case, I’d just use British English, but you can also do American English, or English targeted to an Indian audience. Why do you need three ways of doing English? Because the functions behind your skill might need to deal with different phonemes or pronunciations. You also have German and Japanese, so you can support five territories at present.
  • Your skill’s name as it will appear in the Alexa store
  • And finally, the invocation name. So, what you need to say to bring it to life

After that, you just need to confirm whether your skill uses the audio, video, or display directives built into the Alexa platform. Most basic skills probably won’t.

So, here’s a quick screenshot of the skill you’ve just seen, the RocketMill marketing dictionary. And as you can see, it’s set for just one territory, the UK. It’s a custom skill type, so custom interaction. I’ve got an application ID, which will be useful when I come to launch. I have the name of the skill, and I’ve specified the invocation name. In our case, we have “Rocket” and “Mill” as two words, because it just made it easier for testing purposes to have it as two. In theory, we could jump through hoops and confirm that RocketMill is a brand, and that it could therefore be one word, but it sounds the same. So why bother?

Designing the interaction model

Then, we have how to design the interaction model. So, this is the front end, if you will, to your skill. Now, as a marketer, it’s unlikely you will configure this in the Alexa Skills Kit interface yourself, but it’s worth understanding the steps so you know how to do it. And you should absolutely be involved in designing and defining the interaction model because it’s how users will experience your skill. It’s like designing the user journey for your website.

So, the first thing to do is to create one custom intent for each unique function of your skill. So, I don’t know, if we were a sandwich shop, we might create an “Order sandwich” intent.

And then, within each intent, you need to create slots to customise the output. So, if you’re ordering a sandwich, you probably want to specify the bread type and the filling.

Then, set sample utterances which should trigger each intent. So, in this case: “Order a {filling} sandwich with {bread type} bread”, where filling and bread type are the slots. You can enter a few of these, and Alexa will start to pick up the natural language variations over time. But the more explicit you are, the more streamlined the experience will be, certainly at the start of your skill’s life.

And then, finally, define possible values for the slots. So, for our filling slot, we might need to specify chicken or bacon or ham or cheese, and they’ll map to different variables, in effect, in the back end of our skill. So again, let’s have a look at the Alexa Skills Kit interaction model for the skill you’ve just seen, the marketing dictionary.
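
To make the sandwich example concrete, here is a hedged sketch of what such an interaction model could look like, written as a Python dictionary. The key names follow the JSON layout of an Alexa interaction model as I understand it. The intent name, slot names, type names and the second sample utterance are all hypothetical. The `expand_samples` helper is not part of any Alexa tooling; it only spells out every phrase that the samples and slot values describe, to show how many ways of speaking map onto one intent.

```python
import itertools
from typing import List

# Hypothetical interaction model for the sandwich shop example.
# Intent, slot and type names are made up for illustration.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "sandwich shop",
            "intents": [
                {
                    "name": "OrderSandwichIntent",
                    "slots": [
                        {"name": "filling", "type": "FILLING_TYPE"},
                        {"name": "bread", "type": "BREAD_TYPE"},
                    ],
                    "samples": [
                        "order a {filling} sandwich with {bread} bread",
                        "make me a {filling} sandwich on {bread} bread",
                    ],
                }
            ],
            "types": [
                {
                    "name": "FILLING_TYPE",
                    "values": [{"name": {"value": v}}
                               for v in ("chicken", "bacon", "ham", "cheese")],
                },
                {
                    "name": "BREAD_TYPE",
                    "values": [{"name": {"value": v}}
                               for v in ("white", "brown")],
                },
            ],
        }
    }
}


def expand_samples(model: dict, intent_name: str) -> List[str]:
    """List every phrase that the samples and slot values spell out."""
    language_model = model["interactionModel"]["languageModel"]
    values_by_type = {
        slot_type["name"]: [v["name"]["value"] for v in slot_type["values"]]
        for slot_type in language_model["types"]
    }
    intent = next(i for i in language_model["intents"]
                  if i["name"] == intent_name)
    slot_names = [slot["name"] for slot in intent["slots"]]
    slot_values = [values_by_type[slot["type"]] for slot in intent["slots"]]
    phrases = []
    for sample in intent["samples"]:
        # Try every combination of slot values in every sample utterance.
        for combo in itertools.product(*slot_values):
            phrases.append(sample.format(**dict(zip(slot_names, combo))))
    return phrases


phrases = expand_samples(interaction_model, "OrderSandwichIntent")
print(len(phrases), "phrases, for example:", phrases[0])
```

Two sample utterances, four fillings and two bread types already describe sixteen distinct phrases, all of which should reach the same intent.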

On the left, you can see the sample utterances, which are things like “define term”, “what is the definition of term”, “what is the meaning of term”. On the right, we have the slots within this intent. Now, for legacy reasons, this one is called item. It should probably be term or word or jargon, or whatever. And it has a type of list of items. And if I go through to the list of items, you can see that it’s basically everything which my skill can define. So UX, CRO, PPC and SEO.

Actually, you might be wondering why it’s got life in there. “Alexa, ask RocketMill dictionary, what is the meaning of life?”

[Alexa: According to the science fiction story, “The Hitchhiker’s Guide to the Galaxy”, written by Douglas Adams, the meaning of life, the universe, and everything is 42. Now, stop trying to trick me.]

So, there we go. Remember, there is more than one way to say the same thing. And although the Alexa Skills Kit, and indeed the Alexa platform, will very comfortably handle the wake word and the invocation phrase, you need to specify your invocation name in step one, and your utterances and your slot values in step two, so that you can now connect it to a hosted service.

Connecting to a hosted service

Now, you can either do that as an AWS Lambda function, which is part of Amazon’s framework, using one of these languages: C#, Java, Node.js, or Python. Or, you can build your skill using any language you fancy, and serve it via an HTTPS endpoint somewhere on the web.

I guess the advantage of one versus the other is probably scalability versus cost. If you host it as an AWS Lambda function within Amazon’s realm, there is definite scalability. If your skill becomes very popular overnight, the platform will be able to adapt to accommodate that. If you host it there, though, you’ll have to pay Amazon’s fees to host it. You might already have a back end that you’re paying for. Equally, it might be easier to tap into some of the existing functions within your own server architecture.

So, let’s have a look at the AWS Lambda back end for this skill. The source code for the skill is written in Node.js. And it builds upon the how-to tutorial skill provided by Amazon, which we will link to in the transcript. So, you can see how the define intent in my interaction model maps to a function within my skill. Actually, it just goes away and looks up the slot value. And it will then, if it finds a definition, read it out. And if it doesn’t, it will say it can’t find it, and ask for another input.
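
The skill in the video is written in Node.js, so the following is not its source code. It is a minimal Python sketch of the same idea, assuming the standard Alexa request and response JSON envelopes. The intent name `DefineIntent` is an assumption; the slot name `item` and the definitions are taken from the demo above.

```python
# Definitions as read out in the demo, keyed by lower-cased term.
DEFINITIONS = {
    "seo": ("Search Engine Optimisation, improving web pages to increase "
            "their visibility in search engine results."),
    "ppc": ("Pay Per Click, a form of online advertising where you are "
            "charged only if a user clicks on your advert."),
}


def build_response(text, end_session):
    """Wrap plain text in the JSON envelope Alexa expects back."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Entry point that AWS Lambda calls with the Alexa request as `event`."""
    request = event["request"]
    # "DefineIntent" is a hypothetical name for the define intent.
    if (request["type"] == "IntentRequest"
            and request["intent"]["name"] == "DefineIntent"):
        slots = request["intent"].get("slots", {})
        term = (slots.get("item", {}).get("value") or "").lower()
        definition = DEFINITIONS.get(term)
        if definition:
            # Found it: read the definition out and finish.
            return build_response(definition, end_session=True)
        # Not found: say so and ask for another input.
        return build_response(
            "Sorry, I can't find that term. What else can I define?",
            end_session=False,
        )
    # Anything else, such as launching the skill, gets a welcome prompt.
    return build_response("What can I help you with?", end_session=False)


# A trimmed-down example of the event Alexa would send for "Define SEO".
sample_event = {
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "DefineIntent",
            "slots": {"item": {"name": "item", "value": "SEO"}},
        },
    }
}
print(lambda_handler(sample_event, None)["response"]["outputSpeech"]["text"])
```

Lower-casing the slot value before the lookup means “SEO” and “seo” both find the same entry.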

The other thing to bear in mind is that you need to handle the default intents within the Alexa Skills Kit. So that is help, repeat, stop, and cancel. Basically, it’s so that people who use the same language with your skill as they do with other skills don’t have a broken experience. You also need to handle unhandled inputs, in effect, so someone asking for a function which your skill can’t perform. So, if someone asks for a word which isn’t in the dictionary, it has a sensible reply.
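
One way to picture this is a lookup table from intent name to handler function, with a fallback for anything unrecognised. The sketch below is an assumption about how you might structure it in Python, not how the RocketMill skill does it. The `AMAZON.*` names are Alexa’s built-in intents; the `last_speech` session attribute is a hypothetical name this example uses to remember what to repeat.

```python
HELP_TEXT = ("You can ask me to define a digital marketing term. "
             "What can I help you with?")


def speak(text, end_session, attributes=None):
    """Build a response envelope, optionally storing session attributes."""
    return {
        "version": "1.0",
        "sessionAttributes": attributes or {},
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_help(event):
    return speak(HELP_TEXT, False, {"last_speech": HELP_TEXT})


def handle_repeat(event):
    # Read back whatever this sketch stored on the previous turn.
    attributes = event.get("session", {}).get("attributes") or {}
    last = attributes.get("last_speech", HELP_TEXT)
    return speak(last, False, {"last_speech": last})


def handle_stop(event):
    return speak("Goodbye for now.", True)


def handle_unhandled(event):
    text = "Sorry, I can't do that. " + HELP_TEXT
    return speak(text, False, {"last_speech": text})


# Built-in intents; a real skill would add its custom intents here too.
HANDLERS = {
    "AMAZON.HelpIntent": handle_help,
    "AMAZON.RepeatIntent": handle_repeat,
    "AMAZON.StopIntent": handle_stop,
    "AMAZON.CancelIntent": handle_stop,
}


def lambda_handler(event, context):
    request = event["request"]
    if request["type"] != "IntentRequest":
        return handle_help(event)
    # Unknown intent names fall through to the unhandled reply.
    handler = HANDLERS.get(request["intent"]["name"], handle_unhandled)
    return handler(event)


def intent_event(name, attributes=None):
    """Build a minimal IntentRequest event for trying the handlers out."""
    return {
        "session": {"attributes": attributes},
        "request": {"type": "IntentRequest", "intent": {"name": name}},
    }


stop_reply = lambda_handler(intent_event("AMAZON.StopIntent"), None)
print(stop_reply["response"]["outputSpeech"]["text"])
```

Because stop and cancel share a handler, a user who says either word gets the same goodbye.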

In terms of syntax, if you’re used to using document.write within JavaScript, then you just need to modify your thinking slightly to use the Alexa SDK: you can ask Alexa to tell the user something, or tell Alexa to ask the user something. We’ll link to a full introduction to the Alexa SDK for Node.js in the video transcript.
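
The difference between telling and asking comes down to whether the session stays open for a reply. The sketch below shows the two response shapes in plain Python rather than in the Node.js SDK. The helper names `tell` and `ask` are my own, and the JSON fields are the standard Alexa response fields as I understand them.

```python
def tell(speech):
    """Say something and end the session; no reply is expected."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }


def ask(speech, reprompt):
    """Say something, keep the session open, and wait for an answer."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            # The reprompt is spoken if the user stays silent.
            "reprompt": {
                "outputSpeech": {"type": "PlainText", "text": reprompt},
            },
            "shouldEndSession": False,
        },
    }


goodbye = tell("Thanks for using the marketing dictionary. Goodbye for now.")
question = ask("What can I help you with?", "You can say, define SEO.")
print(goodbye["response"]["shouldEndSession"])
print(question["response"]["shouldEndSession"])
```

In the demo, a definition is a “tell” and the welcome message is an “ask”, because it ends with a question and waits for the user.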

Testing your commands

And so, in terms of the steps to launching a skill, we’re about here: testing our commands.
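
Before trying commands on a real device, you can check the back end on its own by feeding it hand-built requests. The following Python sketch assumes the standard Alexa request envelope. The `handler_under_test` function is a stand-in so the example runs by itself, and `DefineIntent` is a hypothetical intent name; in practice you would pass in your real handler.

```python
def make_intent_event(intent_name, slots=None):
    """Build a minimal Alexa-style IntentRequest for local testing."""
    return {
        "version": "1.0",
        "request": {
            "type": "IntentRequest",
            "intent": {
                "name": intent_name,
                "slots": {
                    name: {"name": name, "value": value}
                    for name, value in (slots or {}).items()
                },
            },
        },
    }


def handler_under_test(event, context=None):
    """Stand-in for a real skill handler so that this sketch is runnable."""
    slots = event["request"]["intent"]["slots"]
    term = slots.get("item", {}).get("value", "")
    known = {"SEO": "Search Engine Optimisation", "PPC": "Pay Per Click"}
    text = known.get(term.upper(), "Sorry, I can't find that term.")
    return {
        "version": "1.0",
        "response": {"outputSpeech": {"type": "PlainText", "text": text}},
    }


def run_cases(handler, cases):
    """Return the slot values whose spoken reply lacks the expected text."""
    failures = []
    for term, expected in cases:
        reply = handler(make_intent_event("DefineIntent", {"item": term}))
        if expected not in reply["response"]["outputSpeech"]["text"]:
            failures.append(term)
    return failures


# Each case pairs a slot value with text the reply should contain.
CASES = [
    ("SEO", "Search Engine"),
    ("ppc", "Pay Per Click"),
    ("blockchain", "can't find"),
]
print(run_cases(handler_under_test, CASES))
```

An empty list means every case produced the expected reply. Include at least one term that is not in the dictionary, so the unhandled path gets tested too.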

The final step is to enter some information so that your skill can be listed within the Amazon Alexa app on mobile phones, and moreover on the Amazon website so that people can download it.

Summary

So, in summary, you will need to:

  • Set up your skill’s basic details within the Amazon Developer interface
  • Decide what your skill will do, and how users will talk to it
  • Work closely with your developer to turn that into an interaction model, which is the front end, the way people talk to the skill, and a back end, by mapping the intents within your interaction model to functions in your source code. Then host the skill in a suitable environment
  • Test it
  • And then finally, when you’re happy with your skill, publish it to the Amazon website and app

Okay, so coming soon in our series on voice search, our next presentation will cover how to use flash briefings to deliver news stories, how to customise Alexa to sound more like your brand, and what voice optimisation strategy is actually right for your business.

I’ll also be talking in future presentations about adapting an Alexa Skill for Google Assistant and, indeed, the privacy implications for all of this. Having a speaker in your home, listening to you all the time, what are the considerations you need to have as a brand, and I guess as a user too?

But that’s all from me for today. But I will leave the last word to my cybernetic companion. “Alexa, ask RocketMill company meeting to end the presentation”.

[Alexa: That’s all, folks.]