lichess.org

The meaning of "Clarity" in voice beta

@SamanyayGhosh asked me a great question via PM. I'm answering it here in case others wonder the same.

The question is "what is Clarity?" Answer: It's a number between 0 and 1.

When you speak a voice command into the microphone, we capture that audio and feed it into the Vosk API (open-source software from the voice recognition experts at alphacephei.com). Under the hood, Vosk and a lower-level toolkit known as Kaldi employ a number of mechanisms, including a neural network (called the "model" or "language model"), to turn that audio data into text phrases in a given language. We retrieve these "heard" phrases from the Vosk API and display them in the status area.

At this point, some fuzzy matching logic maps those heard phrases to every allowable exact move phrase (many of which may themselves be ambiguous, such as "takes", "knight takes", "take the queen", "c7", etc.).

Each mapping from a heard phrase to an exact phrase is then assigned a "cost", which is the sum of the individual penalties incurred whenever one phoneme must be transformed into another. These penalties were calculated from our Howard the Octopus voice experiment. Some of you may remember that.
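
To make the idea of a summed cost concrete, here is a rough sketch of one common way to compute such a number: a weighted edit distance. This is an illustration only, not the actual lichess code. The penalty table, the default penalties, the function names, and the use of single characters as stand-ins for phonemes are all assumptions made up for this example.

```python
# Hypothetical penalties for turning one sound into another. The values are
# invented for illustration; they are NOT the penalties measured by lichess.
SUB_PENALTY = {("b", "d"): 0.3, ("d", "b"): 0.3, ("a", "e"): 0.4, ("e", "a"): 0.4}
DEFAULT_SUB = 1.0  # assumed penalty for any pair missing from the table
INDEL = 1.0        # assumed penalty for inserting or dropping a symbol


def sub_cost(a, b):
    """Penalty for transforming symbol a into symbol b (0 if identical)."""
    if a == b:
        return 0.0
    return SUB_PENALTY.get((a, b), DEFAULT_SUB)


def mapping_cost(heard, target):
    """Cheapest total penalty to turn `heard` into `target`.

    Standard dynamic-programming edit distance, except that substitutions
    are weighted by the penalty table instead of all costing the same.
    """
    # prev[j] = cost of turning the empty prefix of heard into target[:j]
    prev = [j * INDEL for j in range(len(target) + 1)]
    for i, h in enumerate(heard, 1):
        cur = [i * INDEL]  # cost of dropping the first i heard symbols
        for j, t in enumerate(target, 1):
            cur.append(min(
                prev[j] + INDEL,               # drop h
                cur[j - 1] + INDEL,            # insert t
                prev[j - 1] + sub_cost(h, t),  # transform h into t
            ))
        prev = cur
    return prev[-1]


# Cost of every candidate phrase for one heard phrase.
candidates = ["d4", "b4", "e4"]
costs = {move: mapping_cost("b4", move) for move in candidates}
```

With this made-up table, hearing "b4" makes "b4" free, "d4" cheap (the two sounds are listed as easily confused), and "e4" expensive.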

Once all available mappings have been assigned a cost, they are sorted by cost. If the gap between the cheapest mapping and the next cheapest is greater than 1 minus the Clarity threshold, then that move is simply played, no ifs, ands, or buts. If it's less than 1 minus the Clarity threshold, then you'll get colored, numbered arrows, with the preferred Green / 1 move being the cheapest and, hopefully, the most likely.
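
The rule above can be sketched in a few lines. Again, this is a simplified illustration and not the real implementation: the function name `decide`, the return shape, and the handling of edge cases (a single candidate, or a gap exactly equal to the threshold) are assumptions of mine, since the rule as described does not cover them.

```python
def decide(costs, clarity):
    """Pick between playing a move outright and showing arrows.

    costs:   dict mapping each candidate move to its mapping cost
    clarity: the Clarity setting, a number between 0 and 1
    Returns ("play", [move]) or ("arrows", [moves, cheapest first]).
    """
    # Sort candidates from cheapest to most expensive.
    ranked = sorted(costs, key=costs.get)
    if not ranked:
        return ("none", [])
    if len(ranked) == 1:
        # Assumption: a lone candidate has nothing to be confused with.
        return ("play", ranked)

    gap = costs[ranked[1]] - costs[ranked[0]]
    if gap > 1 - clarity:
        # The cheapest mapping is clearly ahead: just play it.
        return ("play", [ranked[0]])
    # Too close to call (ties included, by assumption): show arrows,
    # with the cheapest move first (Green / 1).
    return ("arrows", ranked)


# Same costs, different Clarity settings.
costs = {"Nxe5": 0.2, "Nxc5": 0.5}
low = decide(costs, 0.5)   # gap 0.3 is not above 0.5, so arrows
high = decide(costs, 0.8)  # gap 0.3 is above 0.2, so the move is played
```

Note how raising Clarity from 0.5 to 0.8 turns the same pair of costs from an arrow prompt into an immediate move.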

In effect, higher Clarity values shrink the cost gap that the cheapest move needs over the runner-up in order to be played without disambiguation. In plain terms, it is intended to reward those with good microphones, silent rooms, and decent pronunciation with fewer disambiguation arrows.

That's Clarity.
