I provide an industry update on smart speakers and walk through the fundamentals of building an Alexa Skill.
Good afternoon, RocketMill. My talk today, as you can see, is called: “How to Design an Amazon Skill for Your Business.”
Now, so far in our series on voice search, we’ve discussed how to adapt your existing web content to be more suitable for voice applications. But today I want to start to talk to you about how to design for a voice-first application: a native experience on smart speakers. Now, there’s much too much to cover in one seven-minute or so presentation, so consider this an in-depth introduction.
Firstly, quite importantly, as a marketer why would you want to bother?
Let’s take a look at the device landscape in the run up to, lest we forget, Christmas 2017. Yes, the ‘c’ word is upon us, and in the last three weeks or so – not that one, Lotte – Google and Amazon have announced their latest product line-ups.
We start off with Google, who announced the Pixel 2 Smartphone. This has the tagline: “Ask more of your phone.” Now, why? Well because it offers your own personal Google. The marketing material majors on the ability to send text messages, to navigate, to take photos, and of course, lest we forget, to search using only your voice.
This was announced alongside two devices, two smart speakers as it happens. On the left, the Google Home Max; an intelligent voice controlled Sonos rival that does more than just play music. On the right, the rather cuter Google Home Mini. At just £49, that is priced to take on the entry-level Amazon Echo Dot.
Speaking of Amazon, they’ve been just as busy. This is the Echo Spot, not yet out in the UK but in the US you can pre-order this. That’s like an Echo Dot but with a small screen and a very, very cute minimalist design. The ideal alarm clock replacement perhaps, a bedside companion to watch you and listen to you while you sleep.
There’s an upgraded Amazon Echo as well, as well as a new Echo Plus which looks like this but basically a bit taller – an extra Pringles can on top. That’s got a built-in smart home hub. Finally, there’s the Echo Show, which adds a screen to your voice controlled home.
Now what does the cornucopia of devices tell us? Quite simply that tech companies are lobbing jelly at the wall – should have brought a prop really – and watching to see what sticks.
They have every reason to do so. The manufacturers at least. They’ve got every reason to want to be in this space, because data published by the Financial Times this week estimates about 23 million smart speakers will ship during 2017.
In five years, as you can see from the chart, that will reach around 100 million smart speakers in homes across the world. For publishers, marketers, and brands with finite resource and budget, which horse do you back?
Well, there are two clear front runners in this race; seven out of 10 smart speakers sold last year were Amazon Echo devices, and Google Home represented around one in five smart speakers sold in 2016.
Tesco are hedging their bets between both platforms, announcing this week a partnership with IFTTT – If This Then That – which is like a go-between for online services, to integrate online shopping with both Alexa and Google Assistant. Their goal, incidentally, is to move customers away from only thinking about their shopping lists once a week ahead of the weekly shop, and to make it easier to add throughout the week so that I guess we lose track of how much you’ve added to your basket and how much you’re going to spend.
Now you know the landscape. It’s time to find your voice and learn the fundamentals of building an Alexa Skill.
Firstly, let’s recap: what is an Amazon Alexa Skill? In a nutshell, an Alexa Skill is like an app which expands what your Amazon Echo can do. You design Alexa Skills using an official framework called the Alexa Skills Kit, which rather brilliantly can be abbreviated to ASK.
There are a few different types of Alexa Skill. Four main ones:
There is a fifth type, which is the list Skill, which you have to access slightly differently through the Alexa Skills Kit Command Line Interface, and that allows you to interact with your shopping lists that you save to your Amazon account.
For the sake of this video, I’m going to be focusing on the custom interaction model, which is the best place to start for most brands when building an Alexa Skill.
All Skills have two parts in effect; you have the front end in terms of old school web development, which is your interaction model, your VUI, voice user interface; and the back end, which is a hosted service or Skill service. You can think of that, like I say, as the front end and the back end. It’s like a travel website having a pretty conversion optimised form to send data to a server to number crunch and book a holiday.
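To make the back end half of that split a little more concrete, here is a minimal sketch. It assumes the Skill service is a Python function that receives the JSON request Alexa sends and returns a JSON response for Alexa to speak. The function name `handle_request` and the spoken text are illustrative assumptions, not something from the talk.

```python
import json


def handle_request(event: dict) -> dict:
    """Hypothetical Skill service: turn an incoming Alexa request into a spoken reply."""
    request_type = event.get("request", {}).get("type", "")

    if request_type == "LaunchRequest":
        # The user opened the Skill without asking for anything specific yet.
        text = "Welcome to RocketMill. What would you like to do?"
        end_session = False
    else:
        # Anything else is not handled in this sketch.
        text = "Sorry, I can't help with that yet."
        end_session = True

    # Response envelope: plain-text speech plus a flag to keep listening or stop.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


if __name__ == "__main__":
    sample = {"version": "1.0", "request": {"type": "LaunchRequest"}}
    print(json.dumps(handle_request(sample), indent=2))
```

The interaction model never appears in this function; it lives on the Alexa side and decides which request your service receives.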
If you’re anything like me, you want to get straight into the code and building your first Skill, but please don’t do this. You shouldn’t dive straight into the code. That’s not design, it’s development. Instead, designing an Alexa Skill should be about deciding who you want to help, how you want to help them, and translating those functions into a conversation.
What is the syntax of an Alexa Skill? Let’s take a simple instruction that you might give to your Amazon Echo. Let’s imagine we’re building a calendar app for RocketMill. Here’s our instruction: “Alexa, ask RocketMill to schedule a meeting with Dom.” We’ve got a few different parts to this sentence, so we’ll start with the ‘wake word’ as it’s known: “Alexa.” That is something which begins all initial instructions for Alexa, and it triggers your Amazon Echo to listen in. Subsequent interactions with the Skill do not usually require the wake word to activate the device again; it’s already listening to you.
Then you have something known as the ‘invocation phrase’. This is a verb which controls Alexa’s behaviour with your Skill, and in the transcript of this video we will link you a list of all the invocation phrases so you can pick the best one for your needs.
Then you have, to accompany that, the ‘invocation name’. Each Skill has a unique invocation name which is a phrase which triggers that particular app. Strictly speaking, it cannot be a single word as in this example, unless it is a recognised trademark, so we’d probably get away with it, or a brand. That said, what you can do is format your invocation name to be two or three words, which is the preferred format, but have them sound like a single word. So in this case, Rocket space Mill would do the trick. Again, we’ll link to some more tips in the video transcript. [Tips for invocation names].
Then you have a couple of ‘connecting words’; these just support the invocation phrase. They just join the dots. Skipping ahead, we have a variable which directly changes the goal of our interaction. In the syntax of an Alexa Skill, this is known as a ‘slot’, so you fill the slot with a different variable to achieve something slightly different.
Then finally, schedule a meeting. This is the important bit. The string of words which determines what you want the Skill to actually do. This is known as an ‘utterance’. It is the most important thing to get right for an easy to use interaction model. Because there’s more than one way to say the same thing, and working out how users will talk to your Skill is the difference between a great experience and a frustrating experience. Fortunately, Alexa deals with this. The Alexa Skills Kit includes functionality to map multiple utterances to one intent.
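The parts of that sentence can be labelled in a short sketch. This is purely illustrative: the regular expression below only recognises this one sentence shape, and it is not how Alexa itself parses speech. The class name, field names, and pattern are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class ParsedInstruction:
    wake_word: str
    invocation_phrase: str
    invocation_name: str
    connecting_word: str
    utterance: str
    slot_value: str


# Hypothetical pattern: labels each part of the example instruction from the talk.
PATTERN = re.compile(
    r"^(?P<wake_word>Alexa), "
    r"(?P<invocation_phrase>ask|tell) "
    r"(?P<invocation_name>.+?) "
    r"(?P<connecting_word>to) "
    r"(?P<utterance>schedule a meeting) with "
    r"(?P<slot_value>\w+)\.?$",
    re.IGNORECASE,
)


def label_parts(sentence: str) -> ParsedInstruction:
    """Split the example instruction into the named parts described above."""
    match = PATTERN.match(sentence)
    if match is None:
        raise ValueError(f"Sentence does not fit the example shape: {sentence!r}")
    return ParsedInstruction(**match.groupdict())


if __name__ == "__main__":
    print(label_parts("Alexa, ask RocketMill to schedule a meeting with Dom."))
```

Swapping “Dom” for another name changes only `slot_value`, which is the point of a slot: the same utterance, a different variable.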
Now an ‘intent’ within the Alexa Skills Kit is a unique function within your skill. There are three default intents which all Skills must have; cancel, help, and stop. Basic interactivity with a Skill. But you should supplement these with custom intents for your skill, and then map utterances to them. In this case, three utterances map to the one ‘book meeting intent’. That would be handled by a single function in our back end in our hosted service, our Skill service source code.
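Here is a rough sketch of how that mapping might look, written as a Python dictionary shaped like an ASK interaction model, followed by a back end that routes each intent to one function. The three sample utterances, the intent name `BookMeetingIntent`, the slot name `attendee`, and its slot type are assumptions for illustration; the utterances shown on the slide are not in this transcript.

```python
# Front end: one custom intent with three utterances, plus the three default intents.
INTERACTION_MODEL = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "rocket mill",
            "intents": [
                {"name": "AMAZON.CancelIntent", "samples": []},
                {"name": "AMAZON.HelpIntent", "samples": []},
                {"name": "AMAZON.StopIntent", "samples": []},
                {
                    "name": "BookMeetingIntent",  # hypothetical intent name
                    # Slot type is an assumption; a custom slot type would also work.
                    "slots": [{"name": "attendee", "type": "AMAZON.GB_FIRST_NAME"}],
                    "samples": [
                        "schedule a meeting with {attendee}",
                        "book a meeting with {attendee}",
                        "set up a meeting with {attendee}",
                    ],
                },
            ],
        }
    }
}


def samples_for(intent_name: str) -> list:
    """Return the sample utterances mapped to one intent."""
    intents = INTERACTION_MODEL["interactionModel"]["languageModel"]["intents"]
    for intent in intents:
        if intent["name"] == intent_name:
            return intent["samples"]
    return []


def speak(text: str, end_session: bool = True) -> dict:
    """Build a plain-text Alexa response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def book_meeting(slots: dict) -> dict:
    """Single back-end function behind all three utterances."""
    attendee = slots.get("attendee", {}).get("value")
    if not attendee:
        return speak("Who would you like to meet?", end_session=False)
    return speak(f"Scheduling a meeting with {attendee}.")


def give_help(slots: dict) -> dict:
    return speak("You can say: schedule a meeting with Dom.", end_session=False)


def say_goodbye(slots: dict) -> dict:
    return speak("Goodbye.")


# Back end: one handler per intent, however many utterances map to it.
HANDLERS = {
    "BookMeetingIntent": book_meeting,
    "AMAZON.HelpIntent": give_help,
    "AMAZON.CancelIntent": say_goodbye,
    "AMAZON.StopIntent": say_goodbye,
}


def dispatch(event: dict) -> dict:
    """Route an incoming request to the handler for its intent."""
    request = event.get("request", {})
    if request.get("type") != "IntentRequest":
        return speak("Welcome to RocketMill.", end_session=False)
    intent = request.get("intent", {})
    handler = HANDLERS.get(intent.get("name"))
    if handler is None:
        return speak("Sorry, I didn't catch that.")
    return handler(intent.get("slots") or {})
```

Whichever of the three phrasings the user chooses, the service only ever sees `BookMeetingIntent` and the filled `attendee` slot, so adding a fourth phrasing later means editing the model, not the code.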
To recap the first steps to build your Alexa Skill, a few questions to ask yourself before the next video.
Which interactions with your business would work well as Alexa Skills? Which ones are cumbersome, or repetitive, or just not that much fun which could be more interesting or easier to achieve as a voice interaction?
Which type of skill matches these interactions? If you’re most brands, you’re probably looking at custom interaction and something bespoke, but if you’re a publisher you probably want to be looking at making the flash briefing work for you. Again, those other types of Skill will work in some situations too.
What are the goals of these interactions? Define the unique things you need your Skill to do, in effect the functions you have within an app or a website, and map single intents to those avoiding repetition.
What information would you need to collect to make those intents work and make those work for your users? i.e. What slots do you need?
How many ways can you ask for that same goal? Begin to map out the utterances which map your intents.
Finally, and this actually harks back really well to a presentation which Cat did recently [The Art of Editorial Guides], what does your brand sound like? Is it a formal brand? Is it a colloquial brand? If you have a brand style guide, you’ve probably got this information already, but you need to start thinking about what your brand sounds like in conversation.
Next time, we’ll get under the hood, we’ll have practical examples of building an Alexa Skill. That’s all for now, thank you.