Digit Tutor

Online Game for Kids

19 October 2020

"Digit Tutor" is a simple online game for kids, written in Svelte, that uses the browser's speech recognition engine to train the pronunciation of digits from "0" to "9".

✳️ Open Digit Tutor

The project is open source, so feel free to send a PR if you wish!

This post contains three parts:

Part I — Idea & Svelte
Part II — SpeechAPI
Part III — Svelte Stores & I18n

Part 1 — Idea and Svelte

It all started when my son began to show interest in what those strange "0", "1", etc. actually mean. Some time ago I heard that kids like talking to Alexa (and other voice assistants), and that it also helps improve their speech:

Voice assistants usually pronounce everything very well, so kids have a sample of good speech
Voice assistants simply don't understand you if you mispronounce something, so kids are motivated to speak well

So the idea was simple: have an app show a digit, like "7", on the screen, wait for a correctly pronounced "seven", then show another random digit.

Technology

Both items from the list above fit together nicely once I remembered that today's browsers ship a SpeechAPI (partially supported).

So building a game available to almost everyone was just a matter of working through that API.

To add some more challenge, I picked Svelte, a much-loved and still-fresh frontend library. All the initial development was done in Codesandbox, so I did not even have to install anything locally.

About Svelte

Before we jump to hardcore details about building speech recognizing games, let me give an overview of what Svelte is, and how it is different from Angular, React and Vue.

Svelte is the most popular of the "disappearing frameworks". Technically, it isn't even a library, as it is not shipped as a runtime in your app: it is a compiler that goes through your code and generates the app plus the functions your app actually needs, and nothing more. So if your app is just a static site, there will not be any JavaScript "compiled" by Svelte. By contrast, the React/Angular/Vue core libraries are always required in your app.

Because Svelte has a compilation step as part of building an app, during which it already learns about all the possible changes to your DOM structure, it does not have (and does not need) a virtual DOM. If you just need to change the value of an input, Svelte sees that and generates only the code that changes the input's value, with no VDOM involved. The Svelte developers' blog calls the VDOM "pure overhead".

Besides, working with Svelte, I have experienced some cool features, some of which are even missing from React, while some might be considered a disadvantage:

In a single Svelte file you can define the HTML, CSS and JS of your component. This can be achieved in React with external libraries, but not out of the box. In Svelte it all looks natural and easy to understand:

<script>
  let count = 0;

  function handleClick() {
    count += 1;
  }
</script>

<style>
  /* Svelte makes sure, this css is scoped to just your component */
  button {
    background: #ff3e00;
    color: white;
    border: none;
    padding: 8px 12px;
    border-radius: 2px;
  }
</style>

<button on:click="{handleClick}">
  Clicked {count} {count === 1 ? 'time' : 'times'}
</button>

There is no need for hooks or setState: Svelte's compiler makes plain assignments to component variables reactive (Vue gets a similar effect through getters/setters). The downside is that you have to use immutable updates or give Svelte hints: arr.push(x) will not be noticed as a change, while arr = [...arr, x] is the canonical way to append to an array in Svelte.
You cannot use plain JS control flow in your templates (like you can in JSX): conditions and loops have to be expressed via Svelte-specific constructs, like {#if X == Y}<conditionally rendered HTML>{/if}. This is not inconvenient, but pure JS is something everyone already knows, so this could be a point to improve.
State. What an important word for React/Vue/Angular developers! Surprisingly, Svelte offers state management out of the box, called "Stores". At a high level, anything you can subscribe to and read from or write to can be a store in Svelte (sometimes an object with just two functions is enough). So far I have not run into the shared-state issues between components that I have seen in React.
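
To make the "two-function object" remark concrete, here is a minimal sketch of a hand-rolled store in plain JavaScript. It relies only on the store contract (subscribe calls the callback right away with the current value and returns an unsubscribe function). The names createStore and greeting are made up for illustration; a real app would normally use writable from svelte/store instead.

```javascript
// Hypothetical helper: a hand-rolled store with just `subscribe` and `set`.
function createStore(initialValue) {
  let value = initialValue;
  const subscribers = new Set();

  return {
    // Call `callback` immediately with the current value, then on every change.
    subscribe(callback) {
      subscribers.add(callback);
      callback(value);
      // Returning an unsubscribe function is part of the store contract.
      return () => {
        subscribers.delete(callback);
      };
    },
    // Replace the value and notify every current subscriber.
    set(newValue) {
      value = newValue;
      subscribers.forEach((callback) => callback(value));
    },
  };
}

const greeting = createStore("hello");
const seen = [];
const unsubscribe = greeting.subscribe((v) => seen.push(v));
greeting.set("privet");
unsubscribe();
greeting.set("ignored"); // nobody is subscribed any more
console.log(seen);
```

After this runs, seen holds "hello" and "privet": the initial value delivered on subscribe, then the one update that happened before unsubscribing.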

Conclusion

After a short warm-up with Svelte, I found it very convenient and quite easy to get up and running. Codesandbox.io supports it quite well, so there is no need to set up everything locally. However, if you prefer a local VSCode, I would suggest the VSCode Remote Development plugin, so that you don't install Node and all the modules locally. This plugin even lets you set up some VSCode extensions, like Prettier, remotely.

As for the Svelte features, everything seemed convenient and well thought out. I had a couple of issues: Svelte not seeing array updates (see item #2 above) and being unable to style a Svelte component from the root component (the fix is to wrap the component in a div or make sure it accepts additional styles via props). Otherwise, the tutorial at svelte.dev was quite comprehensive and covered everything required.

Would I recommend using Svelte in production? Yes, if you're working on a website (not a web app) that is not very dynamic and mostly delivers static content. Although there are good React-based static site generators, Svelte should win on initial bundle size while being comparable in performance. It is also quite mature now: the current major version is 3, meaning three major releases are already behind it.

Part II — SpeechAPI

Speech recognition in the app is based on SpeechAPI. It is not supported in many browsers as of now: https://caniuse.com/speech-recognition. Here is a screenshot of that page at the time of writing this text:

[Screenshot: caniuse.com support table for speech recognition]

As you see, the support matrix can be quickly summed up as "Chrome-only"; however, Firefox can also support it with some configuration. Other browsers either don't support it at all, or expose the required objects in window, which do nothing.

So it is definitely not ready for cross-browser production usage, but it is fine for experiments like our game.

Technology

As you might guess, speech recognition in browsers consists of three parts:

Grammar: defines a set of words, which you're interested in & which will be recognized
Synthesis: allows browser to talk to you
Recognition: allows browser to convert speech to text

Logically, there might also be something like "intent recognition", to understand what the user really means, but it seems the current state of the technology does not allow doing that reliably.

At first, and very logically, I thought I would need to support Grammar and Recognition in my game. Later, during testing, I figured out that the grammar part is not used at all, so I could focus on recognition only.

A Word on Chrome Implementation

It turned out that Chrome has a rather peculiar speech recognition implementation:

Actual recognition might happen on Google's servers, per this MDN article:

Note: On some browsers, like Chrome, using Speech Recognition on a web page involves a server-based recognition engine. Your audio is sent to a web service for recognition processing, so it won't work offline.

Chrome effectively ignores Grammar (a set of words) when recognizing: it just recognizes every word it can hear.

Keep these details in mind if you care about privacy and Grammar support.

Show Me the Code!

Initializing Speech API

The first thing to do is to check whether the Speech objects are present in window. I am using a very simple approach here, with the single idea of keeping it all in a separate module. Actual file is here:

import { locale } from "./locale.js";

const SpeechRecognition =
    window.SpeechRecognition || window.webkitSpeechRecognition,
  SpeechGrammarList =
    window.SpeechGrammarList || window.webkitSpeechGrammarList,
  // SpeechRecognitionEvent =
  // window.SpeechRecognitionEvent || window.webkitSpeechRecognitionEvent,
  grammar =
    "#JSGF V1.0; grammar colors; public <color> = " +
    locale.getCurrentLocale().numbers.join(" | ") +
    " ;";

let recognition, speechRecognitionList;

if (SpeechRecognition && SpeechGrammarList) {
  recognition = new SpeechRecognition();
  speechRecognitionList = new SpeechGrammarList();

  speechRecognitionList.addFromString(grammar, 1);

  recognition.grammars = speechRecognitionList;
  recognition.continuous = true;
  recognition.lang = locale.getCurrentLocaleCode();
  recognition.interimResults = true;
  recognition.maxAlternatives = 1;
}

export default recognition;

Here we are initializing everything we need (even grammar):

Check vendor-prefixed implementations: window.SpeechRecognition || window.webkitSpeechRecognition
Create instances of the objects, using new

Note that for grammar initialization I am using a separate module called "locale", since the grammar depends on the current language. Later I understood that the grammar isn't actually used.

Before jumping to configuration details, I'll describe how to use that recognition instance:

Call .start(): this is the point where the browser asks for mic access and actually starts listening to the user's speech
Listen to .onresult events. Each event may be "final" or "interim". An "interim" result is something you get often and at unpredictable moments while the user is still speaking. A "final" result is sent as soon as the user has stopped speaking and stayed silent for some cooldown period.
Call .stop() or .abort() if/when needed. My implementation calls "abort" after a successful digit guess and calls start again right after that. The difference between them is that "stop" attempts to return one last "final" result, while "abort" does not.
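
The interim/final distinction from the list above can be sketched as a small helper. This is only an illustration, not code from the game: collectTranscripts is a made-up name, and the fake event below stands in for what the browser passes to onresult (it only mimics the resultIndex, results, isFinal and transcript fields).

```javascript
// Split the new results of a SpeechRecognitionEvent-like object into
// interim and final transcripts. Only `resultIndex`, `results`, `isFinal`
// and `transcript` are read, so a plain object works as a stand-in.
function collectTranscripts(event) {
  const interim = [];
  const final = [];
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    // The first alternative is the one the engine ranks highest.
    const text = result[0].transcript.trim().toLowerCase();
    (result.isFinal ? final : interim).push(text);
  }
  return { interim, final };
}

// Fake event for illustration: one final result and one interim result.
const fakeEvent = {
  resultIndex: 0,
  results: [
    Object.assign([{ transcript: "One", confidence: 0.9 }], { isFinal: true }),
    Object.assign([{ transcript: " sev", confidence: 0.4 }], { isFinal: false }),
  ],
};

const transcripts = collectTranscripts(fakeEvent);
console.log(transcripts.final, transcripts.interim);
```

In a browser, the same helper could be called from recognition.onresult with the real event object.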

The configuration details:

.grammars: list of grammars that you want to recognize. Not really used, but might be in the future, so I still have it
.continuous: true if you want to receive multiple final results, or false in case you need just one final result. Since my game is more or less continuous, I have it as true
.lang: recognition language. I am getting it from the current system settings, so everyone gets their native language
.interimResults: when true, you will get "interim" onresult events. Those arrive faster, but may contain partially or incorrectly recognized words. I am using this to improve the perceived performance of recognition: if even an interim result contains the correct digit, it is considered pronounced correctly. This makes the game much more responsive
.maxAlternatives: how many recognition options you need. These are usually similar-sounding words and their combinations. I use "1" in my code, as I only need one option
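
The game itself keeps maxAlternatives at 1, but if it were raised, each result would carry several candidate transcripts. A rough sketch of scanning them follows; anyAlternativeMatches and the fake result are hypothetical, and the result is assumed to be array-like, with a length and a transcript on each alternative.

```javascript
// Return true if any recognition alternative satisfies `isMatch`.
// `result` is array-like: `result.length` alternatives, each with `.transcript`.
function anyAlternativeMatches(result, isMatch) {
  for (let i = 0; i < result.length; i++) {
    if (isMatch(result[i].transcript.toLowerCase())) {
      return true;
    }
  }
  return false;
}

// Fake result with two similar-sounding alternatives, for illustration.
const fakeResult = [
  { transcript: "Sticks", confidence: 0.6 },
  { transcript: "Six", confidence: 0.5 },
];

const heardSix = anyAlternativeMatches(fakeResult, (text) => text.includes("six"));
console.log(heardSix);
```

Whether this is worth it is a judgment call: more alternatives make the game more forgiving, which may or may not be what you want when training pronunciation.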

Using the Speech Recognition

Now, as we have properly initialized all required objects, we need to:

Start using speech recognition
Integrate it to the game process

The instance usage is simple: call .start() to start listening, and subscribe to .onresult event to handle results.

There are also things like .onerror and .onnomatch events: I am using them just to show those errors to the user, and haven't seen them actually fire yet.

Let's dive into .onresult implementation; below is its annotated version:

recognition.onresult = function (event) {
  console.log(event); // who doesn't like a good console.log?

  // Extract "current" result from all existing
  // results; this event also stores older results,
  // till the "final" result is in.
  const result = event.results[event.resultIndex];
  const transcript = result[0].transcript.toLowerCase();

  // this is for Svelte, to show hint on screen:
  // "what was heard by computer"
  hint = transcript;

  // Some more logging; confidence belongs to the alternative, not the result
  console.log("Result received: " + transcript + ".");
  console.log("Confidence: " + result[0].confidence);

  // Here we check, if expected digit was pronounced
  // Sometimes Chrome recognizes them as words (e.g. six),
  // sometimes as digits (e.g. 6). Computers aren't clever
  // enough to handle those cases, so we need an IF with OR
  if (
    transcript.indexOf(digit) >= 0 ||
    transcript.indexOf($l.numbers[digit]) >= 0
  ) {
    // Handle correct digit: show big green check and next one
    onCorrectDigit();
  } else {
    // I had specific handling for incorrect pronunciation,
    // but those events fire quite often, so I decided to
    // not do anything in case the answer is not correct.
    // onFail();
  }
};

Can you spot a bug in the code above? If you need to say "six", and you say "twenty six", this will still count as correct. Not a big deal, I think.
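
If that bug ever mattered, one possible stricter check is to compare the whole transcript instead of searching inside it. This is a sketch and not what the game does: isExactDigit is a hypothetical helper, and numberWords plays the role of $l.numbers (an array of digit words indexed by digit).

```javascript
// Hypothetical stricter check: the whole transcript, after trimming,
// must be exactly the digit ("6") or its word ("six").
function isExactDigit(transcript, digit, numberWords) {
  const heard = transcript.trim().toLowerCase();
  return heard === String(digit) || heard === numberWords[digit];
}

// Stand-in for the locale's list of digit words.
const numberWords = [
  "zero", "one", "two", "three", "four",
  "five", "six", "seven", "eight", "nine",
];

console.log(isExactDigit("six", 6, numberWords));
console.log(isExactDigit("twenty six", 6, numberWords));
```

The trade-off is that an exact comparison also rejects harmless extras such as "um, six", so for a kids' game the forgiving indexOf version may still be the better choice.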

As you see, this code works with Svelte and the UI, but Svelte hardly affects it at all. The cool thing is in this line: transcript.indexOf($l.numbers[digit]) >= 0. $l here is a Svelte store that always refers to the current locale, so the check looks for the digit words of the right language.

The last piece of code related to recognition is in onCorrectDigit handler. It calls .abort() and .start() once again, to clear the results array and make sure the app memory footprint is consistently low. I think, it would work even without that restarting, but this feels correct.

Conclusion

That's all! We've covered 100% of the lines related to speech recognition in the app. As you see, there's nothing complex, and the hard work is abstracted away by the browser. The only piece not covered is handling a missing SpeechAPI; this is fairly straightforward, so it is omitted here.

If I were designing the API, I would probably go with simpler arguments for the .onresult event, but it takes just a couple more minutes to figure out its structure, and it probably covers more cases than mine would.

Also note once again that all of this is almost Chrome-only for now, so I cannot consider it "production-ready".

In the third Digit Tutor part I will cover the localization implementation. Svelte has a built-in approach; however, I opted for a custom, store-based implementation, since I had to integrate it with speech recognition, too. I called the result a "poor man's localization for Svelte", as in the end it felt simpler and more flexible than the built-in one.
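
For completeness, the missing-API check could look roughly like the sketch below. It is an assumption about one reasonable shape, not the game's actual code: findSpeechRecognition is a made-up helper, and the two fake window objects exist only so the example runs outside a browser.

```javascript
// Return the SpeechRecognition constructor from a window-like object,
// or null when neither the standard nor the webkit-prefixed name exists.
function findSpeechRecognition(globalObject) {
  return (
    globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null
  );
}

// Stand-ins for real `window` objects, for illustration only.
class FakeRecognition {}
const chromeLikeWindow = { webkitSpeechRecognition: FakeRecognition };
const unsupportedWindow = {};

const found = findSpeechRecognition(chromeLikeWindow);
const missing = findSpeechRecognition(unsupportedWindow);
console.log(found === FakeRecognition, missing);
```

In the app, a null result would be the moment to switch the UI into a "no recognition" state instead of starting the game. Note that this cannot detect browsers that expose the objects but do nothing with them.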

Part III — Svelte Stores & I18n

First, we need to understand what Svelte stores are and how they work:

A store is simply an object with a subscribe method that allows interested parties to be notified whenever the store value changes.

Here is a simple, but feature-complete store example:

import { writable } from "svelte/store";

function createCount() {
  // this line creates a "writable" store, with "0" initial value
  // and gives subscribe, set and update methods
  // this really resembles "useState" from react
  const { subscribe, set, update } = writable(0);

  return {
    subscribe,
    increment: () => update((n) => n + 1),
    decrement: () => update((n) => n - 1),
    reset: () => set(0),
  };
}

// the single shared instance that components import
export const count = createCount();

As you can see, a store does not necessarily expose direct "set" or "update" methods: if you want to limit its updates to some specific cases (increment/decrement/reset in the sample), this is all achievable.

There is one more bit of Svelte magic with stores: you don't actually need to call subscribe yourself to use a store value in component code or UI; just prefix the store name with the magic $ sign. Here is how we could use our sample store in the code:

<script>
	import { count } from './stores.js';
</script>


<!-- this is where the magic happens, and `count` from the store
     is reactively bound to the UI -->
<h1>The count is {$count}</h1>

<button on:click={count.increment}>+</button>
<button on:click={count.decrement}>-</button>
<button on:click={count.reset}>reset</button>

So, stores seem to be fully sufficient to implement internationalization:

It is easy to bind store values to the UI
Stores can expose additional logic; later I figured out, that stores can be used to logically group methods and values, just like objects from the business layer of the app
If store's value is an object, it works fine if used as $store.value

I18n in a Store

My internationalization implementation lives in the locale.js file and provides the following features:

Loads initial user locale from OS settings
Provides list of available locales, loaded from the data-file
Allows to get/set current locale and its code (for the UI)
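
The first feature, picking the initial locale, can be sketched as a pure function. The real locale.js may do this differently; pickLocaleCode, the list of available codes and the fallback below are all assumptions made for the example. In a browser the language tag would typically come from navigator.language.

```javascript
// Pick the best available locale code for a browser language tag.
// `available` and `fallback` are hypothetical; the real list lives in
// the app's data file.
function pickLocaleCode(languageTag, available, fallback) {
  const tag = String(languageTag || "").toLowerCase();
  const base = tag.split("-")[0];
  // Prefer an exact match ("ru-ru"), then the same base language ("ru").
  const exact = available.find((code) => code.toLowerCase() === tag);
  if (exact) {
    return exact;
  }
  const sameBase = available.find(
    (code) => code.toLowerCase().split("-")[0] === base
  );
  return sameBase || fallback;
}

const availableCodes = ["en-US", "ru-RU"];
console.log(pickLocaleCode("ru", availableCodes, "en-US"));
console.log(pickLocaleCode("en-GB", availableCodes, "en-US"));
console.log(pickLocaleCode("fr-FR", availableCodes, "en-US"));
```

Keeping this logic in one function also means the store and any locale selector UI can share the same initial value.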

It is implemented as a writable store, whose value is an object, storing localized versions of all the strings in the app. That object is loaded from the data-file.

The store itself is used just like a usual Svelte store and is referred to as $l in the code. Whenever the selected locale changes, the underlying store object is replaced with a new one, and all UI elements change their values:

// fragment of the App.js file
<h1>{$l.header}</h1>
{#if gameState == STARTING || gameState == NO_RECOGNITION}
  <LocaleSelector />
{/if}
<p>{$l.info}</p>

The Bug

Unfortunately, there was exactly one bug in the "release" version of the game, and it was connected to internationalization. Apart from it, I believe I have covered all the corner cases with locales, so let's just discuss the bug.

The reason for it was that in the LocaleSelector component I had not initialized the selectedLocaleCode value with the user's OS locale. As a result, its initial value was undefined, while the UI was showing the English locale as selected, simply because it is the first one in the list; at the same time, all the logic, including speech recognition, was using the real user's locale (Russian, in my case).

The fix is pretty simple: I just made sure that this variable is properly initialized in the component, too. Here is the fix, which I made 20 minutes after the bug was reported.

Conclusion

It turned out that Svelte stores are a sufficiently powerful concept to implement internationalization. I cannot imagine anything similar in the React world, except third-party libs.

It also turned out that you need to be twice as thorough and attentive when making an internationalized app, since it can be a source of additional bugs.

Anyway, I hope people liked this game. Since then I have even received multiple improvement and evolution ideas, which may some day go to production. We'll see!

Thank you!
