Sphinx: Speech Recognition Plugin
Sphinx-UE4 is a speech recognition plugin for Unreal Engine 4/5. The plugin makes use of the PocketSphinx library. At the moment, this plugin should be used to detect phrases (e.g. "open browser"). Single-word recognition is poor; I am looking at ways to improve this to a passable level.
Blueprint-only projects will not package properly with plugins. This is a known issue of Unreal Engine for the time being. To work around this, simply add an empty C++ class to your Blueprint-only project.
When packaging a project, ensure the model folder is included in 'Additional non-asset directories to Copy' (in Project Settings\\Packaging).
Example
Speech can control a character, moving them around a soccer field, walking/running, turning and kicking the ball.
Example projects can be downloaded here: https://nerivec.github.io/old-ue4-wiki/pages/speech-recognition-plugin.html#Demo_Project
Rules
The game timer is set to 3 minutes. A ball of a random colour (either blue or red) will spawn. A goal is scored by kicking the ball into the goal whose colour matches the ball. When a goal is scored, a new ball is spawned.
How to play
When the game starts, it is set up in keyword listening mode. The following phrases are recognized:
-
start the game: Starts a game of soccer.
-
enable walk: The character starts walking
-
enable sprint: The character starts sprinting
-
turn left: Rotates the character 45 degrees to the left.
-
turn right: Rotates the character 45 degrees to the right.
-
kick the ball: If a ball is in range, the player kicks the ball.
-
one eighty: Rotates the character 180 degrees.
-
stop movement: Stops all character movement
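The effect of these commands can be sketched as a small state update. This is only an illustration in standalone C++, not the demo project's Blueprint logic: the CharacterState struct, its fields, and the ApplyPhrase function are hypothetical names introduced here, and only the phrases and angles come from the list above.

```cpp
#include <string>

// Hypothetical character state (assumption, not taken from the demo project).
struct CharacterState {
    int headingDegrees = 0;    // kept in the range [0, 360)
    int speed = 0;             // 0 = stopped, 1 = walking, 2 = sprinting
    bool gameStarted = false;
    bool ballInRange = false;  // would be set by the game, not by speech
    int kicks = 0;
};

// Normalizes any angle into the range [0, 360).
inline int NormalizeDegrees(int degrees) {
    return ((degrees % 360) + 360) % 360;
}

// Applies one recognized phrase to the state.
// Returns false when the phrase is not one of the listed commands.
inline bool ApplyPhrase(CharacterState& state, const std::string& phrase) {
    if (phrase == "start the game") {
        state.gameStarted = true;
    } else if (phrase == "enable walk") {
        state.speed = 1;
    } else if (phrase == "enable sprint") {
        state.speed = 2;
    } else if (phrase == "turn left") {
        state.headingDegrees = NormalizeDegrees(state.headingDegrees - 45);
    } else if (phrase == "turn right") {
        state.headingDegrees = NormalizeDegrees(state.headingDegrees + 45);
    } else if (phrase == "one eighty") {
        state.headingDegrees = NormalizeDegrees(state.headingDegrees + 180);
    } else if (phrase == "kick the ball") {
        // The kick only happens if a ball is in range.
        if (state.ballInRange) {
            ++state.kicks;
        }
    } else if (phrase == "stop movement") {
        state.speed = 0;
    } else {
        return false;
    }
    return true;
}
```

Keeping the phrase handling in one function makes it easy to feed it from whatever callback delivers the recognized phrases.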
Grammar Test
This example shows the grammar file support. The grammar file has the form. Upon a recognition that matches the form, it will show the operation and the result. For example, "three add five" will look like this
Language Test
If you click on the character on the map, you can see that Language is a selectable property. There is support for English, Chinese, French, Spanish and Russian. Take a look at the blueprint to see which words are added for which languages. I can only speak English, so the testing of the other languages is probably pretty patchy.
Volume Test
An approximate volume of the microphone can be obtained in blueprints. In this example, the cone radius of the light is affected by the volume of the microphone.
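One way to drive a light from the microphone volume is a clamped linear mapping. The sketch below is standalone C++ and only an assumption about how such a mapping could look: the function name, the volume range, and the radius limits are all hypothetical, and the demo itself does this in Blueprints.

```cpp
#include <algorithm>

// Maps a microphone volume onto a cone radius by linear interpolation.
// All parameters are illustrative; the plugin's actual volume range is not
// specified here, so the caller supplies maxVolume.
inline float VolumeToConeRadius(float volume, float maxVolume,
                                float minRadius, float maxRadius) {
    if (maxVolume <= 0.0f) {
        return minRadius;  // avoid dividing by zero or a negative range
    }
    // Clamp the normalized volume to [0, 1] so loud spikes do not overshoot.
    const float t = std::min(1.0f, std::max(0.0f, volume / maxVolume));
    return minRadius + t * (maxRadius - minRadius);
}
```

In practice the raw volume tends to be jittery, so smoothing it over a few frames before mapping may give a steadier light.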
How to use the plugin
UE4 Setup
-
Download code from GitHub: https://github.com/shanecolb/sphinx-ue4
-
Copy Plugins and Content folders into the project of your choosing.
-
From the Binaries folder, copy the appropriate .dll into Plugins\\SpeechRecognition\\Binaries\\Win64
-
Download and extract the following archive into the path "Content/model" within your project: Language Models
-
Right-click on the .uproject file, and select regenerate solution.
-
Open the Visual Studio project, recompile, and open the project in UE.
-
Open the project, and enable the Speech Recognition plugin.
-
Open the blueprint of whichever actor/class you wish to add speech recognition functionality to.
I will now run through the changes necessary:
Blueprint Changes
- When the Begin Play event is fired, create a Speech Recognition actor and save a reference to it. After this, create a method and bind it to OnWordSpoken. This method is triggered each time a recognized phrase is spoken. Lastly, ensure Shutdown (on the speech recognition actor) is called during End Play.
- Once the actor has been created, initialise it and set the configuration parameters for Sphinx. There is a huge range of Sphinx params that can be configured. NOTE: Setting the recognition mode (keyword/grammar) resets the Sphinx params that were previously added.
Ensure the Sphinx config params are set before each change of the recognition mode.
Although this list is old, the following provides a detailed list of the various sphinx params.
At the moment, I set the following and would suggest trying the same. I am still experimenting to try and find what works best for me.
- The WordSpoken method takes in an array of recognized phrases. This array is looped over to trigger in-game logic.
- On the End Play event, make sure the Shutdown method is called. Otherwise, crashes will occur if multiple instances start up.
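The create, bind, and shut down pattern from the steps above can be sketched in standalone C++. Nothing here is the plugin's API: FakeRecognizer, BindOnWordsSpoken, SimulateRecognition, and RecognizerSession are hypothetical stand-ins that only mimic the lifecycle, with a scope guard playing the role of the End Play event.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the speech recognition actor (NOT the plugin's API).
class FakeRecognizer {
public:
    using WordsSpokenHandler =
        std::function<void(const std::vector<std::string>&)>;

    // Registers the method that receives each batch of recognized phrases.
    void BindOnWordsSpoken(WordsSpokenHandler handler) {
        handler_ = std::move(handler);
    }

    // Simulates the recognizer reporting one batch of recognized phrases.
    void SimulateRecognition(const std::vector<std::string>& phrases) {
        if (running_ && handler_) {
            handler_(phrases);
        }
    }

    // Stops recognition and drops the handler, as End Play should do.
    void Shutdown() {
        running_ = false;
        handler_ = nullptr;
    }

    bool IsRunning() const { return running_; }

private:
    WordsSpokenHandler handler_;
    bool running_ = true;
};

// Scope guard mirroring the Begin Play / End Play pairing: the recognizer is
// always shut down when the session goes out of scope.
class RecognizerSession {
public:
    explicit RecognizerSession(FakeRecognizer& recognizer)
        : recognizer_(recognizer) {}
    ~RecognizerSession() { recognizer_.Shutdown(); }

    RecognizerSession(const RecognizerSession&) = delete;
    RecognizerSession& operator=(const RecognizerSession&) = delete;

private:
    FakeRecognizer& recognizer_;
};
```

A handler bound this way would loop over the phrases it receives and trigger game logic for each one; once the session ends, further recognitions are ignored.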
UE5 Setup
-
Download code from GitHub: https://github.com/ukustra/sphinx-ue4
-
Copy the SpeechRecognition folder to the Plugins folder in your project's directory (if you don't have the Plugins folder in your project's directory, create it).
-
Unzip the "model" folder or download the "Content/model" folder from any sample project or from Language Models and copy it to your project's Content folder.
-
Open your project and create a GameState BP which inherits from SpeechRecognitionGameStateBase.
- Set up that GameState BP in your GameMode BP.
- In your GameState BP, you can set up the language and optionally the phrases for recognition. The Speech Recognition will be automatically initialized with the following config params:
- If you need any additional setup, you can use the OnSpeechRecognitionSubsystemInitialized event in your GameState.
- Now you can simply bind an event to OnWordSpoken, for instance in your Player Character.
("OnWordSpoken_Event_0" event presents the "Text" struct after using the "Split Struct Pin" feature)
Speech Recognition Events
Just as there is an event fired when speech is detected, there are hooks for other events. Here is a list of all the events:
-
OnWordsSpoken: triggered when silence is broken, and one or more recognized phrases are detected.
-
OnUnknownPhrase: triggered when silence is broken, and no recognized phrases are detected.
-
OnStartedSpeaking: triggered when silence is broken, and speech is detected.
-
OnStoppedSpeaking: triggered when speech is broken by silence.
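How these events relate to a single utterance can be sketched as follows. This is standalone C++ with hypothetical names (SpeechEvent, EventsForUtterance), and the ordering of the events is an assumption made for illustration; the list above does not state in which order the plugin fires them.

```cpp
#include <string>
#include <vector>

// Hypothetical labels for the four events listed above.
enum class SpeechEvent {
    StartedSpeaking,
    WordsSpoken,
    UnknownPhrase,
    StoppedSpeaking
};

// Returns the events one utterance would raise, given the phrases that were
// recognized in it. ASSUMPTION: the order shown here is illustrative only.
inline std::vector<SpeechEvent> EventsForUtterance(
    const std::vector<std::string>& recognizedPhrases) {
    std::vector<SpeechEvent> events;
    // Silence is broken and speech is detected.
    events.push_back(SpeechEvent::StartedSpeaking);
    // Exactly one of the two phrase events applies to an utterance.
    events.push_back(recognizedPhrases.empty() ? SpeechEvent::UnknownPhrase
                                               : SpeechEvent::WordsSpoken);
    // Speech is broken by silence.
    events.push_back(SpeechEvent::StoppedSpeaking);
    return events;
}
```

The point of the sketch is that OnWordsSpoken and OnUnknownPhrase are mutually exclusive for a given utterance, while the speaking events bracket it.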
Recognition Keywords/Adding additional words
At this time, we set the recognition mode to Keyword, and a set of key phrases is passed in.
These are used to determine which phrases are spoken by the player. A Recognition Phrase comprises a string (representing the phrase we wish to detect) and a tolerance setting. This tolerance determines how easily a phrase will trigger. Play around with the tolerance settings, to test the balance between sensitivity, and false positives.
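For background, PocketSphinx's keyword-spotting mode can read a keyword list in which each line holds a phrase followed by a detection threshold between slashes. The standalone C++ sketch below builds such a list from phrase and tolerance pairs. The Tolerance enum, its level names, and the threshold values are placeholders invented for illustration; how the plugin actually maps its tolerance setting onto thresholds is not described here, and no direction (stricter or looser) is implied by the level names.

```cpp
#include <string>
#include <vector>

// Hypothetical tolerance levels (placeholders, not the plugin's enumeration).
enum class Tolerance { LevelA, LevelB, LevelC };

// A phrase to detect plus its tolerance setting, as described above.
struct RecognitionPhrase {
    std::string phrase;
    Tolerance tolerance;
};

// Illustrative thresholds only; real values need tuning per phrase.
inline const char* ThresholdFor(Tolerance tolerance) {
    switch (tolerance) {
        case Tolerance::LevelA: return "1e-40";
        case Tolerance::LevelB: return "1e-20";
        case Tolerance::LevelC: return "1e-5";
    }
    return "1e-20";
}

// Builds keyword-list text with one "phrase /threshold/" entry per line.
inline std::string BuildKeywordList(
    const std::vector<RecognitionPhrase>& phrases) {
    std::string out;
    for (const auto& p : phrases) {
        out += p.phrase + " /" + ThresholdFor(p.tolerance) + "/\n";
    }
    return out;
}
```

Longer phrases generally need a different threshold than short ones, which is one reason to experiment with the tolerance per phrase rather than using a single value everywhere.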
If your phrase features words that are not in the dictionary, they will not be detected. To add words to the dictionary, open the .dict file that matches the language of your choosing (e.g. English is "Content\\model\\en\\en.dict").
This contains a list of recognized words. The first string is the recognized word. The rest is the phonetics of how the word is recognized.
Here are some examples:
abbott AE B AH T
ball B AO L
bandit B AE N D AH T
Simply add a word in a similar manner, and re-save the file.
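The entry layout described above (a word followed by its phonemes, separated by spaces) can be checked with a small parser before editing the file. This is a standalone C++ sketch; DictEntry, ParseDictLine, and FormatDictLine are hypothetical helper names and are not part of the plugin.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One dictionary entry: the recognized word and its phonemes.
struct DictEntry {
    std::string word;
    std::vector<std::string> phonemes;
};

// Parses a line such as "ball B AO L".
// Returns false if the line has no word or no phonemes.
inline bool ParseDictLine(const std::string& line, DictEntry& entry) {
    std::istringstream stream(line);
    DictEntry parsed;
    if (!(stream >> parsed.word)) {
        return false;
    }
    std::string phoneme;
    while (stream >> phoneme) {
        parsed.phonemes.push_back(phoneme);
    }
    if (parsed.phonemes.empty()) {
        return false;
    }
    entry = parsed;
    return true;
}

// Formats an entry back into a single dictionary line.
inline std::string FormatDictLine(const DictEntry& entry) {
    std::string line = entry.word;
    for (const auto& p : entry.phonemes) {
        line += " " + p;
    }
    return line;
}
```

Running a new line through ParseDictLine and FormatDictLine before saving is a cheap way to catch a missing phoneme list or stray whitespace.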
Plans ahead
-
Create C++ only examples which showcase the plugin:
Currently, I have only included a Blueprint example. I wish to write some C++ examples, showing how the plugin can run in a C++ class, instead of a Blueprint.
-
Adding additional languages:
Currently, there exist a number of Sphinx-trained language models for languages other than English. If a language is supported by Unreal Engine 4/5 and a trained model exists for it, then I will add it.
-
Improving accuracy:
At the moment, my testing has been anecdotal. I wish to work on improving accuracy, either by tweaking the parameters passed into Sphinx or by tweaking the tolerance values in the keyword tolerance enumeration.