The above video is hosted on egghead.io.
In this post we are going to build a teleprompter web application using the Web Speech API. In particular, we'll use the `SpeechRecognition` interface to build this app. The idea is that we'll be able to recognize the user's voice, match the words to a predefined script, and then automatically scroll to the next unspoken position.
NOTE: The above free video goes through all the content of this blog post step by step. You can find the code for this project on GitHub, and you can play with it on CodeSandbox.
Preview of Teleprompter
The following short animated GIF shows the end result of what we will be building using native web technologies and JavaScript libraries.
Basic Application
The `Teleprompter` component that we'll start with is just a shell of what we will be building. To begin with, we loop over the words passed to the component and display each one in a `<span>` element. Eventually, we will wire up the `SpeechRecognition` API and auto-scroll the contents as the user speaks.
```jsx
import React from 'react';
import styled from 'styled-components';

const StyledTeleprompter = styled.div`
  /* ... */
`;

export default function Teleprompter({ words, progress, listening, onChange }) {
  return (
    <React.Fragment>
      <StyledTeleprompter>
        {words.map((word, i) => (
          <span key={`${word}:${i}`}>{word} </span>
        ))}
      </StyledTeleprompter>
    </React.Fragment>
  );
}
```
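The post doesn't show where the `words` prop comes from. Here is a minimal sketch that assumes the parent component holds the script as one plain string and splits it on whitespace before passing it down. The `toWords` helper is hypothetical and not part of the original code.

```javascript
// Hypothetical helper (not from the original post): turn a script string
// into the array of words that <Teleprompter words={...} /> expects.
function toWords(script) {
  return script
    .trim()
    .split(/\s+/) // split on any run of spaces, tabs, or newlines
    .filter((word) => word.length > 0); // an empty script yields []
}

// Example usage:
const words = toWords('  Four score and\nseven years ago  ');
// words is ['Four', 'score', 'and', 'seven', 'years', 'ago']
```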
Creating a SpeechRecognition Instance
The first thing we'll do is create an instance of `SpeechRecognition`. To do this, we'll create a `recog` ref using the `React.useRef` hook, initialized to `null`.
Next, we'll use a `React.useEffect` hook that runs after the initial render of the component. In it, we reference either the standard `window.SpeechRecognition` constructor or the vendor-prefixed `window.webkitSpeechRecognition` version. Whichever one exists, we create a new instance of it and assign it to the `current` property of our `recog` ref.
Setting `continuous` mode to `true` allows us to continuously capture results once we have started, instead of getting just one result.
We'll also set `interimResults` to `true`. This gives us access to quicker results. However, these results aren't final and may be less accurate than the ones we would get by waiting a bit longer.
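```jsx
const recog = React.useRef(null);

React.useEffect(() => {
  // Prefer the standard constructor, falling back to the webkit-prefixed one.
  // In a browser that supports neither, SpeechRecognition is undefined and
  // the `new` call below throws a TypeError.
  const SpeechRecognition =
    window.SpeechRecognition || window.webkitSpeechRecognition;
  recog.current = new SpeechRecognition();
  recog.current.continuous = true; // keep listening after the first result
  recog.current.interimResults = true; // also report non-final results
}, []);
```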
SpeechRecognition Browser Compatibility
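Before we get too far along, it's important to know that the `SpeechRecognition` feature we are going to use has only limited browser support at the moment. For example, Chromium-based browsers have exposed it only under the vendor-prefixed `webkitSpeechRecognition` name, and Firefox does not enable it by default.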
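Because support is limited, you may want to feature-detect before constructing a recognizer instead of letting `new` throw. The sketch below shows one way to do that. The `getSpeechRecognition` name is hypothetical. The function takes the global object as a parameter (you would pass `window` in a browser), so the logic can also be tried outside one.

```javascript
// Hypothetical helper (not from the original post): return the standard
// SpeechRecognition constructor, the webkit-prefixed one, or null when
// the environment supports neither.
function getSpeechRecognition(globalObject) {
  return (
    globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null
  );
}

// In a browser:
// const SpeechRecognition = getSpeechRecognition(window);
// if (!SpeechRecognition) {
//   // Render a "not supported" message instead of the teleprompter.
// }
```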
Toggling SpeechRecognition to Start and Stop
Now we'll add another `React.useEffect` hook so that we can toggle whether our speech recognition system should start or stop listening. If `listening` is true, we tell the instance in our `recog` ref to `start()`. Otherwise, we `stop()` it.
```jsx
React.useEffect(() => {
  if (listening) {
    recog.current.start();
  } else {
    recog.current.stop();
  }
}, [listening]);
```
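One caveat with this toggle: the Web Speech API specification says that calling `start()` on a recognizer that has already started, and has not yet fired an `end` or `error` event, throws an `InvalidStateError`. Below is a minimal defensive sketch. The `safeStart` helper is hypothetical and simply treats that particular error as a no-op.

```javascript
// Hypothetical helper (not from the original post): call start() on a
// recognizer, but treat "already started" as a no-op instead of a crash.
function safeStart(recognition) {
  try {
    recognition.start();
    return true; // a new recognition session was requested
  } catch (error) {
    if (error && error.name === 'InvalidStateError') {
      return false; // already running, so there is nothing to do
    }
    throw error; // anything else is unexpected, so rethrow it
  }
}

// Inside the effect, this would replace the direct start() call:
// if (listening) { safeStart(recog.current); } else { recog.current.stop(); }
```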
Adding and Removing Event Listeners
At this point, starting and stopping don't do anything useful yet, because nothing handles the recognized speech, so let's wire that up. We'll add another `React.useEffect` hook that grabs our `recog` ref, listens for the `result` event, and handles it with a `handleResult` callback. We haven't defined that callback yet, but we will very soon.
We also want to clean up after ourselves, so the effect returns a function that calls `removeEventListener` for the `result` event with the same `handleResult` function.
```jsx
React.useEffect(() => {
  const handleResult = () => {
    /* ... more code later ... */
  };

  recog.current.addEventListener('result', handleResult);
  return () => {
    recog.current.removeEventListener('result', handleResult);
  };
}, [onChange, progress, words]);
```
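The cleanup works only because `removeEventListener` receives the exact function object that was passed to `addEventListener`. If it were handed a freshly created arrow function, the original listener would stay attached. The framework-free sketch below captures that pairing, using a plain `EventTarget` (available in browsers and in Node.js 15 and later) to stand in for the `SpeechRecognition` instance. The `listen` helper is hypothetical.

```javascript
// Hypothetical helper (not from the original post): attach a listener and
// return a function that detaches exactly that listener, which is the
// same shape a React.useEffect cleanup needs.
function listen(target, type, handler) {
  target.addEventListener(type, handler);
  return () => target.removeEventListener(type, handler);
}

// Usage with a plain EventTarget standing in for a SpeechRecognition instance:
const target = new EventTarget();
let calls = 0;
const unlisten = listen(target, 'result', () => {
  calls += 1;
});

target.dispatchEvent(new Event('result')); // handler runs, so calls is 1
unlisten();
target.dispatchEvent(new Event('result')); // handler was removed, calls stays 1
console.log(calls); // 1
```

With a helper like this, the effect body could shrink to `return listen(recog.current, 'result', handleResult);`, because the returned function is exactly the cleanup React expects.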
Handling the Recognition Results
Now, let's define the `handleResult` function that we've wired up to the `recog` ref. In it, we destructure the `results` property of the event passed to us. We then build an `interim` string in four steps:

- convert the returned `SpeechRecognitionResultList` into an array using `Array.from()`,
- keep only the results that are not final using `Array.prototype.filter`,
- grab the transcript of the first alternative of each using `Array.prototype.map`,
- join them into one big string using `Array.prototype.join`.
To use the results, let's create some new state with the `React.useState` hook and save them by calling `setResults`.
```diff
+ const [results, setResults] = React.useState('');

  React.useEffect(() => {
+   const handleResult = ({ results }) => {
+     const interim = Array.from(results)
+       .filter((r) => !r.isFinal)
+       .map((r) => r[0].transcript)
+       .join(' ');
+
+     setResults(interim);
      /* ... more code later ... */
    };

    recog.current.addEventListener('result', handleResult);
    return () => {
      recog.current.removeEventListener('result', handleResult);
    };
  }, [onChange, progress, words]);
```
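The `filter`/`map`/`join` pipeline doesn't depend on React, so it can be pulled into a plain function and checked against fake data. In the sketch below, the mock object only imitates the shape of a `SpeechRecognitionResultList`: an array-like list of results, each with an `isFinal` flag and indexed alternatives that carry a `transcript`. The `getInterimTranscript` name is hypothetical.

```javascript
// Hypothetical helper (not from the original post): the same logic as the
// handleResult code above, extracted into a pure function.
function getInterimTranscript(results) {
  return Array.from(results) // the result list is array-like, not an array
    .filter((result) => !result.isFinal) // keep only non-final results
    .map((result) => result[0].transcript) // transcript of the first alternative
    .join(' ');
}

// Mock data shaped like a SpeechRecognitionResultList:
const mockResults = {
  length: 3,
  0: { isFinal: true, 0: { transcript: 'good morning' } },
  1: { isFinal: false, 0: { transcript: 'and welcome' } },
  2: { isFinal: false, 0: { transcript: 'to the show' } },
};

console.log(getInterimTranscript(mockResults)); // and welcome to the show
```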
To see the interim `results` that we've captured in state, let's conditionally render them as a sibling of `<StyledTeleprompter>` when they exist. (`Interim` is another component whose definition isn't shown in this post.)
```diff
  return (
    <React.Fragment>
      <StyledTeleprompter>
        {words.map((word, i) => (
          <span key={`${word}:${i}`}>{word} </span>
        ))}
      </StyledTeleprompter>
+     {results && <Interim>{results}</Interim>}
    </React.Fragment>
  );
```
Scrolling based on Progress Value
Now, let's focus on scrolling. As setup, let's first create a `scrollRef` variable using the `React.useRef` hook, initialized to `null`, and pass it to our `<StyledTeleprompter>` component through its `ref` prop. To each `<span>`, we'll add an HTML5 data attribute, `data-index`, set to the index of the word.
This technically isn't necessary for scrolling, but let's also add a color style to indicate whether each word has already been spoken. If the word's index is less than `progress` (meaning it has already been said), the word is light gray. Otherwise, it is black.
```diff
+ const scrollRef = React.useRef(null);

  return (
    <React.Fragment>
-     <StyledTeleprompter>
+     <StyledTeleprompter ref={scrollRef}>
        {words.map((word, i) => (
          <span
            key={`${word}:${i}`}
+           data-index={i}
+           style={{
+             color: i < progress ? '#ccc' : '#000',
+           }}
          >
            {word}{' '}
          </span>
        ))}
      </StyledTeleprompter>
      {results && <Interim>{results}</Interim>}
    </React.Fragment>
  );
```
To actually scroll the teleprompter, we'll add another `React.useEffect` hook, this one invoked whenever the `progress` prop changes. We take the element in `scrollRef.current` and use `querySelector` to find the word whose `data-index` is 3 past the current `progress`. Targeting a few words ahead helps us scroll before we run out of words in view.
We use the optional chaining operator (`?.`) because `querySelector` returns `null` when nothing matches, for example near the end of the script. If an element is found, we call its `Element.scrollIntoView()` method with `behavior: 'smooth'`, `block: 'nearest'`, and `inline: 'start'`.
```jsx
React.useEffect(() => {
  scrollRef.current
    .querySelector(`[data-index='${progress + 3}']`)
    ?.scrollIntoView({
      behavior: 'smooth',
      block: 'nearest',
      inline: 'start',
    });
}, [progress]);
```
Element.scrollIntoView Browser Compatibility
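Support for `Element.scrollIntoView()` itself is surprisingly good, which is great because it's such a handy feature. Support for the options object, and for smooth scrolling in particular, arrived later than the method itself, so some older browsers may jump to the element instead of animating.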
Updating to the Next Progress Value
The trickiest part of the app is figuring out where the current progress should be. To do this, we'll introduce a `newIndex` variable, split the `interim` string back into an array of words, and compare each word with the next expected unspoken word in the teleprompter script.
To make the comparison more forgiving, we'll use two techniques. First, we'll create a `cleanWord` function that trims whitespace, lowercases the string, and removes any character outside `a-z`. Second, we'll leverage the `string-similarity` library from npm.
```jsx
import stringSimilarity from 'string-similarity';

const cleanWord = (word) =>
  word
    .trim()
    .toLocaleLowerCase()
    .replace(/[^a-z]/gi, '');
```
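To get a feel for what a 0.75 similarity threshold means, it helps to know that the `string-similarity` package describes `compareTwoStrings` as based on Dice's coefficient. That is twice the number of shared two-character pairs (bigrams), divided by the total number of bigrams in both strings. The `diceCoefficient` function below is a simplified stand-in written for illustration, not the library's own code. `cleanWord` is repeated so the block runs on its own.

```javascript
// Copied from the post so this block is self-contained.
const cleanWord = (word) =>
  word
    .trim()
    .toLocaleLowerCase()
    .replace(/[^a-z]/gi, '');

// Hypothetical, simplified bigram-based Dice coefficient:
// 2 * sharedBigrams / (bigramsInA + bigramsInB)
function diceCoefficient(a, b) {
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0;

  // Count every two-character substring of `a`.
  const counts = new Map();
  for (let i = 0; i < a.length - 1; i++) {
    const bigram = a.slice(i, i + 2);
    counts.set(bigram, (counts.get(bigram) || 0) + 1);
  }

  // Walk the bigrams of `b`, consuming matches so repeats aren't overcounted.
  let shared = 0;
  for (let i = 0; i < b.length - 1; i++) {
    const bigram = b.slice(i, i + 2);
    const count = counts.get(bigram) || 0;
    if (count > 0) {
      counts.set(bigram, count - 1);
      shared++;
    }
  }

  return (2 * shared) / (a.length - 1 + (b.length - 1));
}

console.log(diceCoefficient(cleanWord('Cats,'), cleanWord(' cat '))); // 0.8
console.log(diceCoefficient(cleanWord('their'), cleanWord('there'))); // 0.5
```

So a plural like "cats" still clears a 0.75 threshold against "cat". "their" and "there" score only 0.5 and would not count as a match.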
For each spoken word, if its similarity to the next expected script word is greater than 0.75 (75%), we increment our index by one. Otherwise, we keep it the same. Then, if `newIndex` is greater than the current `progress` and no greater than the total number of words, we let the consuming component know by calling `onChange(newIndex)`.
```diff
  React.useEffect(() => {
    const handleResult = ({ results }) => {
      const interim = Array.from(results)
        .filter((r) => !r.isFinal)
        .map((r) => r[0].transcript)
        .join(' ');

      setResults(interim);

+     const newIndex = interim.split(' ').reduce((memo, word) => {
+       if (memo >= words.length) {
+         return memo;
+       }
+       const similarity = stringSimilarity.compareTwoStrings(
+         cleanWord(word),
+         cleanWord(words[memo]),
+       );
+       memo += similarity > 0.75 ? 1 : 0;
+       return memo;
+     }, progress);
+
+     if (newIndex > progress && newIndex <= words.length) {
+       onChange(newIndex);
+     }
    };

    recog.current.addEventListener('result', handleResult);
    return () => {
      recog.current.removeEventListener('result', handleResult);
    };
  }, [onChange, progress, words]);
```
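The `reduce` inside `handleResult` is the heart of the app, and it can be easier to reason about as a pure function. The sketch below mirrors that logic. The `advanceProgress` name and its parameters are hypothetical. The similarity function is passed in, so `stringSimilarity.compareTwoStrings` could be plugged in. For this self-contained example, a trivial exact-match function stands in for it so the result is easy to predict.

```javascript
// Same cleanWord as in the post.
const cleanWord = (word) =>
  word.trim().toLocaleLowerCase().replace(/[^a-z]/gi, '');

// Hypothetical pure version of the matching logic in handleResult.
// `similarity` should return a score from 0 to 1, as
// stringSimilarity.compareTwoStrings does.
function advanceProgress(interim, words, progress, similarity, threshold = 0.75) {
  return interim.split(' ').reduce((memo, word) => {
    if (memo >= words.length) {
      return memo; // already at the end of the script
    }
    const score = similarity(cleanWord(word), cleanWord(words[memo]));
    return score > threshold ? memo + 1 : memo; // advance only on a match
  }, progress);
}

// Example with an exact-match comparison standing in for string-similarity:
const exact = (a, b) => (a === b ? 1 : 0);
const script = ['Four', 'score', 'and', 'seven', 'years', 'ago'];

console.log(advanceProgress('four score um and', script, 0, exact)); // 3
```

The filler word "um" fails to match "and", so it is skipped without moving the index. The caller would then apply the same `newIndex > progress && newIndex <= words.length` check before calling `onChange`.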
Conclusion
It's pretty amazing what features are available in many browsers. Although `SpeechRecognition` isn't everywhere yet, it's a pretty powerful feature and was definitely fun to play with. I hope you enjoy using it as well and find fun and unique ways to leverage it.
NOTE: This is the beginning of an egghead playlist that I plan to grow with additional refactors and new features.