Sept 2019 The Hearing Journal
Hearing loss is a major health problem, with 5.3 percent of the global population living with debilitating hearing impairment. Given the gradual increase in life expectancy, the number of people with a debilitating hearing impairment is expected to nearly double over the next 30 years. Hearing loss has important perceptual consequences and can accelerate cognitive decline. Additionally, loss of communication impacts social skills and promotes isolation and loss of confidence, particularly among the elderly. The most common complaint of listeners with hearing loss is difficulty understanding a conversation in noisy environments, such as in a restaurant or at a cocktail party. These listeners usually have difficulty hearing a speaker's voice amidst competing sound sources (the problem of a low signal-to-noise ratio).
Hearing aids try to correct the user's frequency-specific loss of sensitivity by amplifying the affected frequencies accordingly. Despite the noted benefits of hearing aids, only a small proportion of people who need these devices actually use them. One major factor that reduces the enthusiasm for hearing aids is their failure to restore the ability to selectively perceive a speaker, because the devices amplify the background noise together with the target speech. While signal processing algorithms can suppress simple background noises, the enhancement of the target speaker fails when the noise and speech are not acoustically different, such as when the noise is coming from another speaker. In such scenarios, no speech enhancement method can help without first knowing which speaker the subject wants to focus on. This condition requires an additional control signal that tells the hearing aid system which speaker is the target and which speakers are interferers. Possible examples of such a control signal include head and gaze direction and manual selection. These solutions, however, are neither natural nor satisfactory: users might want to attend to sources to the side or behind them, users might not want to constantly operate a hand-held manual-selection device, or the target and interfering speakers may be close to each other.
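The point that amplification alone does not help can be illustrated with a small numerical sketch. It is not taken from any hearing aid implementation: the two sine waves, the function names, and the single broadband gain are assumptions made for illustration (real devices apply frequency-specific gain). It shows only that scaling the target and the interferer by the same factor leaves the signal-to-noise ratio unchanged.

```python
import math

def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)

def snr_db(target, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_target / P_noise)."""
    return 10.0 * math.log10(power(target) / power(noise))

# Toy signals (assumed): a "target" tone and a quieter "interfering" tone.
n = 1000
target = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
noise = [0.5 * math.sin(2 * math.pi * 7 * t / n) for t in range(n)]

snr_before = snr_db(target, noise)

# Simplified hearing aid: the same gain is applied to everything
# that reaches the microphone, target and interferer alike.
gain = 10.0
snr_after = snr_db([gain * x for x in target], [gain * x for x in noise])

print(round(snr_before, 2), round(snr_after, 2))
```

Both printed values are the same: the listener hears everything louder, but the target is no easier to pick out.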
POSSIBILITIES WITH BRAIN CONTROL
In the past, we showed that the human auditory cortex selectively represents the attended speaker relative to unattended sources.3 So when a listener focuses on a specific speaker in a crowded environment, the brainwaves of the listener track the voice of the target speaker. This scientific breakthrough has motivated the prospect of a brain-controlled hearing aid that constantly monitors the brainwaves of a listener and compares them with sound sources in the environment to determine the talker to whom a subject is attending. The device will then amplify the attended speaker relative to others to facilitate hearing that speaker in a crowd. This process is called auditory attention decoding (AAD), a research area that has grown considerably in recent years. Multiple problems must be resolved to make a brain-controlled hearing aid feasible, including developing noninvasive and nonintrusive methods to measure the neural signals and designing effective decoding algorithms for accurate and rapid detection of attentional focus. Another major challenge is the lack of individual speaker audio. In realistic situations, we only have the mixed audio of speakers recorded from one or more microphones. Therefore, the first step in AAD is to automatically separate the speakers in the mixed audio. However, speaker-independent speech separation (meaning with no prior knowledge of specific speakers) is a very difficult problem that only recently has seen progress towards a solution.
We recently proposed a framework that incorporates speaker-independent speech separation into AAD without needing the individual speaker audio. A critical component of this system is a real-time, low-latency speech separation algorithm based on deep neural network models. These models approximate the computation performed by biological neurons and have proven to be extremely effective in many machine learning tasks. Because this system can generalise to new speakers, it overcomes a major limitation of our previous AAD approach, which required training on the target speakers. To test the feasibility of this brain-controlled hearing device, we used invasive electrophysiology to measure neural activity from three neurosurgical patients undergoing treatment for epilepsy. These patients had clinically implanted electrodes in their superior temporal gyrus (STG), a brain area that we had previously shown to selectively represent the attended speaker. Each subject was presented with a mixture of simultaneous speech stories and instructed to focus his or her attention on one speaker and ignore the others. The listeners’ brainwaves were then compared to the separated sound sources from the neural networks, and the speaker whose speech most closely matched the brainwaves was amplified relative to the other speakers to facilitate listening.
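The comparison step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the system used in the study: it assumes the speech separation stage has already produced one amplitude envelope per speaker, that an envelope has been reconstructed from the neural recordings, and that similarity is measured with a Pearson correlation. The function names and the synthetic signals are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def decode_attention(neural_envelope, separated_envelopes):
    """Return the index of the separated source most similar to the
    neural envelope, together with all similarity scores."""
    scores = [pearson(neural_envelope, env) for env in separated_envelopes]
    attended = max(range(len(scores)), key=scores.__getitem__)
    return attended, scores

# Synthetic stand-ins (assumed): two speaker envelopes with different rhythms.
rng = random.Random(0)
n = 500
speaker_a = [abs(math.sin(2 * math.pi * 3 * t / n)) for t in range(n)]
speaker_b = [abs(math.sin(2 * math.pi * 5 * t / n + 1.0)) for t in range(n)]

# Pretend the listener attends to speaker B: the "neural" envelope is
# speaker B's envelope corrupted by measurement noise.
neural = [x + rng.gauss(0.0, 0.3) for x in speaker_b]

attended, scores = decode_attention(neural, [speaker_a, speaker_b])
print(attended, [round(s, 2) for s in scores])
```

In this toy case the decoder selects index 1 (speaker B). In practice the decision would be made repeatedly over short time windows so that the device can follow shifts in attention.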
To test if the difficulty of attending to the target speaker is reduced using the proposed system, we performed a psychoacoustic experiment comparing the perceived quality of the original mixed audio to the perceived quality of the audio in which AAD was used to detect and amplify the target speaker by 12 dB. Subjects were asked to rate the difficulty of attending to the target speaker when listening to (1) the original mixture and (2) the enhanced target speech using the output of the AAD system. Twenty listeners with normal hearing participated in the psychoacoustic experiment, in which they each heard 20 randomised sentences in each of the two experimental conditions. Subjects were asked to rate the difficulty of listening to the target speaker on a scale of one to five using the mean opinion score (MOS19). The bar plots above show the median MOS ± standard error (SE) for the two conditions. The average subjective score when using the AAD system showed a significant improvement over the mixture (100% improvement, paired t-test, p < 0.001), demonstrating that the listeners had a stronger preference for the modified audio than for the original mixture.
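To give a sense of what a 12 dB boost means, the sketch below converts the gain to an amplitude factor and remixes a set of separated sources with the attended one scaled up. The `remix` helper and its toy input are hypothetical and only illustrate the arithmetic; they are not the processing chain used in the experiment.

```python
def db_to_amplitude(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)

def remix(sources, attended, gain_db=12.0):
    """Sum the separated sources sample by sample, scaling only the
    attended source by the requested gain."""
    g = db_to_amplitude(gain_db)
    length = len(sources[0])
    return [
        sum((g if i == attended else 1.0) * src[t]
            for i, src in enumerate(sources))
        for t in range(length)
    ]

factor = db_to_amplitude(12.0)
print(round(factor, 2))  # 3.98

# Toy example: two one-hot "sources"; source 0 is the attended one.
mixed = remix([[1.0, 0.0], [0.0, 1.0]], attended=0)
```

A 12 dB gain therefore makes the attended speaker roughly four times larger in amplitude (about sixteen times in power) relative to the other speakers.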
Our ongoing research on this problem has two goals: advancing our understanding of auditory attention and its neural markers in the human auditory cortex, and removing the technological barriers to establishing the feasibility and efficacy of AAD for improving speech intelligibility and reducing listening effort in people with hearing loss. This research should lead to a novel understanding of the neural mechanisms that enable a listener to focus on a speaker in multi-talker speech conditions, thus bringing brain-controlled hearing aid technologies a significant step closer to reality.