Starkey Research & Clinical Blog

A Digital Finger on a Warm Pulse: Wearables and the future of healthcare

 

Taken together, blood pressure, glucose and oxygenation levels, sympathetic neural activity (stress levels), skin temperature, level of exertion and geo-location provide a very informative, in-the-moment picture of physiological status and activity. All of this is provided today by clinical-grade smart monitors used in medical research projects around the world. Subtle changes in patterns over time can provide very early warnings of many disease and dysfunctional states (see this article in the journal Artificial Intelligence in Medicine).

It is well established that clinical outcomes are highly correlated with timely diagnosis and efficient differential diagnosis. In the not-too-distant future, your guardian angel will be a medic-AI that uses machine learning to individualize your precise clinical norms, matched against an ever-evolving library of norms harvested from the Cloud. You never get to your first cardiac event because you take the advice of your medic-AI and make subtle (and therefore very easy) modifications to your diet and activity patterns throughout your life. If things do go wrong, the paramedics arrive well before your symptoms! The improvements in quality of life and the savings in medical costs are (almost) incalculable. This is such a hot topic in research at the moment that Nature recently ran a special news feature on wearable electronics.

There are, however, more direct ways in which your medic-AI can help manage your physiological status. Many chronic conditions today are dealt with using embedded drug delivery systems, but these still need to be coupled with periodic hospital visits for blood tests and status examinations. Wirelessly connecting your embedded health management system (which includes an array of advanced sensors) to your medic-AI avoids all that. In fact, the health management system can be designed to ensure that a wide range of physiological parameters remain within their normal ranges despite the occasional healthy-living lapse of its host.

For me as a neuroscientist, the most exciting developments in sensor technology are in the ambulatory measurement of brain activity. Recent work in a number of research laboratories has used different methods to measure the brain activity of people listening to multiple talkers in conversation, not unlike the cocktail party scenario. What they have found is nothing short of amazing. Using relatively simple EEG recordings from scalp electrodes, the audio streams of the concurrent talkers, and rather sophisticated machine learning and decoding, these systems are able to detect which talker the listener is attending to. Some research indicates that not only the attended talker but also their spatial location can be decoded from the EEG signal, and that this process is quite resistant to acoustic clutter in the environment.
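To make that decoding step concrete, here is a minimal sketch of the stimulus-reconstruction style of attention decoder used in several of these studies: a linear model is trained to reconstruct the attended speech envelope from lagged EEG, and at test time the reconstruction is correlated with each talker's envelope. It is illustrative only; the electrode count, lag window, ridge penalty and synthetic data are assumptions made for the sketch, not the method of any particular laboratory.

```python
# Minimal sketch of EEG-based auditory attention decoding via stimulus
# reconstruction (a linear "backward" model). Illustrative assumptions only.
import numpy as np
from sklearn.linear_model import Ridge

FS = 64              # sampling rate (Hz) of down-sampled EEG and envelopes
N_CH = 16            # number of scalp electrodes (assumed)
LAGS = range(0, 16)  # EEG samples from ~0-250 ms after each envelope sample


def lagged_features(eeg, lags):
    """Stack copies of the EEG shifted so that samples occurring shortly
    *after* each envelope sample are available (the EEG lags the stimulus)."""
    n_t, n_ch = eeg.shape
    feats = np.zeros((n_t, n_ch * len(lags)))
    for i, lag in enumerate(lags):
        shifted = np.roll(eeg, -lag, axis=0)
        if lag > 0:
            shifted[-lag:, :] = 0.0  # zero out samples that wrapped around
        feats[:, i * n_ch:(i + 1) * n_ch] = shifted
    return feats


def train_decoder(eeg, attended_envelope, alpha=1.0):
    """Fit a ridge regression mapping lagged EEG to the attended envelope."""
    model = Ridge(alpha=alpha)
    model.fit(lagged_features(eeg, LAGS), attended_envelope)
    return model


def decode_attention(model, eeg, env_a, env_b):
    """Return 'A' or 'B': whichever talker's envelope better matches the
    envelope reconstructed from the listener's EEG."""
    recon = model.predict(lagged_features(eeg, LAGS))
    corr_a = np.corrcoef(recon, env_a)[0, 1]
    corr_b = np.corrcoef(recon, env_b)[0, 1]
    return "A" if corr_a > corr_b else "B"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_t = FS * 60                             # one minute of training data
    env_a = np.abs(rng.standard_normal(n_t))  # stand-ins for the two
    env_b = np.abs(rng.standard_normal(n_t))  # talkers' speech envelopes
    # Synthetic EEG that weakly follows talker A (the "attended" talker).
    eeg = 0.5 * env_a[:, None] + rng.standard_normal((n_t, N_CH))
    model = train_decoder(eeg, env_a)
    print("Decoded attended talker:", decode_attention(model, eeg, env_a, env_b))
```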

This is a very profound finding, as it shows how we can follow the intention of the listener in terms of how they are directing their attention and how this varies over time. It provides information that we can use to direct the signal processing in the hearing aid: to focus on the spatial location the listener is attending to and to enhance the information the listener wants to hear – effectively defining for us what is signal and what is noise when the environment is full of talkers, of whom only one is of interest at any particular instant in time.

Other very recent work has demonstrated just how few EEG electrodes are needed to get robust signals for decoding once the researchers know what to look for. Furthermore, the recording systems themselves are now sufficiently miniaturized that these experiments can be performed outside the laboratory while the listeners are actually engaged in real-world listening activities. One group of researchers at Oxford University actually has their listeners cycling around the campus while doing the experiments!

These developments demonstrate that the necessary bio-sensors are, in principle, sufficiently mature to support cognitive control of signal processing for targeted hearing enhancement. This scenario also provides a wonderful example of how the hearing instrument can share the processing load depending on the time constraints of the processing. Decoding the EEG signals will require significant processing, but this processing is not time critical – a few hundred milliseconds is neither here nor there, a syllable or two in the conversation. The obvious solution is that the Cloud takes the processing load and then sends the appropriate control codes back to the hearing aid, either directly or via its paired smartphone. As the smartphone is listening in to the same auditory scene as the hearing aid, it can provide another access point for sound data as well as additional and timelier processing capability for other, more time-critical elements.
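A toy sketch of how that division of labour might be decided is shown below. The processing tiers, time budgets and task names are hypothetical values chosen only to illustrate the idea of routing time-critical work to the device and slower work to the phone or the Cloud; they are not an actual product architecture.

```python
# Toy sketch: split work across hearing aid, phone and Cloud by latency budget.
# All tiers, budgets and task names are hypothetical illustrations.
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    deadline_ms: float  # how quickly the result must be available


# Rough round-trip cost of reaching each processing tier (illustrative values).
TIER_LATENCY_MS = {
    "hearing_aid": 5,    # on-device DSP
    "smartphone": 40,    # Bluetooth hop to the paired phone
    "cloud": 350,        # network round trip to a Cloud service
}


def assign_tier(task: Task) -> str:
    """Pick the most capable (slowest) tier whose latency still meets the deadline."""
    for tier in ("cloud", "smartphone", "hearing_aid"):
        if TIER_LATENCY_MS[tier] <= task.deadline_ms:
            return tier
    return "hearing_aid"  # fall back to local processing if nothing else fits


tasks = [
    Task("beamforming_update", deadline_ms=10),        # time-critical
    Task("environment_classification", deadline_ms=100),
    Task("eeg_attention_decoding", deadline_ms=500),    # a few hundred ms is fine
]

for t in tasks:
    print(f"{t.name:28s} -> {assign_tier(t)}")
```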

But no one is going to walk around wearing an EEG cap with wires and electrodes connected to their hearing aid. A lot of sophisticated industrial design goes into a hearing aid, but integrating such a set of peripherals so that it is acceptable to wear outside the laboratory could well defeat the most talented designers. So how do we take the necessary technology and incorporate it into a socially friendly and acceptable design? We start by examining developments in the worldwide trend toward wearables and by considering some mid-term technologies that could well enter the market as artistic statements and symbols of status.

 


Ubiquitous computing a.k.a. The Internet of Things

In The Fabric of Tomorrow, I spoke briefly about Mark Weiser’s influential article in Scientific American where he coined the term “ubiquitous computing.” As with many great ideas, this has a long and illustrious lineage and indeed has continued to evolve. Alan Turing wanted his computers to communicate with each other as well as with humans (1950); Marshall McLuhan (1964) identified electric media as the means by which “all previous technologies – including cities – will be transformed into information systems”; and in 1966 the computing pioneer Karl Steinbuch declared that “In a few decades time computers will be interwoven into almost every industrial product.” Of course, things really got going when DARPA invested in ARPANET (1969) and TCP/IP was developed in the 1970s (see http://postscapes.com/internet-of-things-history for a great timeline).

Weiser points out that “The most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it.” This disappearing act demonstrates how technology has seamlessly become an essential part of our everyday lives. Devices have become part of the process of engaging in particular activities. We would miss them if they were gone – as anyone knows when they are separated from their smartphone! But when present, they are invisible.

Mark Weiser’s particular goals at Xerox’s Palo Alto Research Center (PARC) were about augmenting human interaction and problem solving, and he conceived three classes of smart devices:

(i) Tabs – wearable, inch-sized devices such as smart badges;
(ii) Pads – hand-held, foot-sized devices the size of a writing pad; and
(iii) Boards – yard-sized devices for interaction and display (e.g. smart boards).

These are all macro devices, and since his initial ideas others have added device classes on sub-millimetre scales:

(iv) Dust – millimetre and sub-millimetre sized micro-electro-mechanical systems (MEMS) and Smart Dust, which are minute wirelessly enabled sensors;
(v) Skin – fabrics based on light-emitting polymers and flexible organic devices such as OLED displays; and
(vi) Clay – ensembles of MEMS devices that can form configurable three-dimensional shapes and act as so-called tangible interfaces that people can interact with (see https://en.wikipedia.org/wiki/Hiroshi_Ishii_(computer_scientist)).

Critically, these latter classes of devices usher in new ways of thinking about the interactions between devices, users and the environment. The early thinking was a straightforward reflection of the existing tools for interaction and collaboration, but the latter classes take this thinking down paths untraveled – no doubt some will be blind alleys, but others could add motifs and methods that have yet to be conceived.

The term “Internet of Things” (IoT) has been attributed to Kevin Ashton (1999), who had been looking at the ways in which RFID tags, together with the Internet, could be used for advanced supply chain management. Here we see the focus on the sensor and the identity of that which is sensed: this begins to fill out our analogy of the peripheral nervous system of the Cloud. More importantly, it also begins to inform how we might exploit these ideas in the development of the next generation of hearing technologies. For instance, in Starkey Research we have a project that combines the listening and analytical capabilities of a smartphone to analyse a particular acoustic environment and to record, via Bluetooth, the hearing aid settings selected by the user in that environment. By uploading that information to the Cloud, we can then “crowd source” user preferences for different environment classifications, thereby enabling better adaptive presets and controls.
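As a rough sketch of how that crowd-sourcing step might look in code (the class, field and preset names here are hypothetical, invented for illustration rather than drawn from the project itself):

```python
# Hypothetical sketch of crowd-sourcing hearing aid settings per acoustic
# environment. All names and structures are illustrative only.
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class PreferenceReport:
    environment: str  # label from the phone's acoustic scene classifier
    settings: str     # identifier of the user-selected hearing aid preset


def aggregate_preferences(reports):
    """Return the most commonly chosen preset for each environment class.
    In a real system the reports would be uploaded to a Cloud store; here
    they are simply aggregated in memory."""
    by_env = defaultdict(Counter)
    for r in reports:
        by_env[r.environment][r.settings] += 1
    return {env: counts.most_common(1)[0][0] for env, counts in by_env.items()}


reports = [
    PreferenceReport("restaurant", "speech_in_noise"),
    PreferenceReport("restaurant", "speech_in_noise"),
    PreferenceReport("restaurant", "music"),
    PreferenceReport("quiet_office", "omni_quiet"),
]
print(aggregate_preferences(reports))
# {'restaurant': 'speech_in_noise', 'quiet_office': 'omni_quiet'}
```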

The wireless connection of the smartphone to the hearing instrument is only the first step along the road enabled by the IoT. The hearing aid is connected not just to the phone but to anything the phone can connect to, including the Cloud. In the example above, the hearing instrument off-loads the processing around the environmental classification to the phone, which in turn uploads the data to the Cloud. It is the offline analysis of the data amassed from a wide range of users that provides the associations between environments and settings – that is, the knowledge that can then inform our designs. On the other hand, there is no reason, in principle, why the data in the Cloud can’t also be used to modify, on the fly, the processing in the hearing instrument.

The point is that, under the new paradigm, the hearing aid is no longer an isolated, set-and-forget instrument. It can be updated and modified on the fly using machine-level or human interaction, or a combination of the two. The user, the health professional, the manufacturer and the crowd can all be leveraged to increase the performance of the instrument. The instrument itself becomes a source of data that can be used to optimize its own operation or, in aggregate with the crowd, the operation of classes of instruments.

The capacity to optimize the instrument at the level of the individual will be dependent, in part, on the nature and quality of the data it can provide.


The Power of the Cloud

 

In “The Fabric of Tomorrow,” I laid out a rather high-level road map for the ensuing discussion. Now it is time to start digging a bit more into the details and, more importantly, to understand how these developments can be leveraged effectively by what we do at Starkey Research.

Let’s start with the Cloud! First, the inputs: ubiquitous computing and seamless interconnectivity are like the peripheral nervous system of the Cloud. Through them, the Cloud receives all its sensory data – the “afferent” information about the world. This data covers many more realms than the human senses do, and arrives with a precision and at a rate that eclipses the sum of all information in previous human history.

Second the outputs: This peripheral nervous system also takes the “efferent” signal from the Cloud to the machines and the displays that will effect the changes in the world – the physical, the intellectual and the emotional worlds we inhabit. We will come back to the peripheral nervous system and its sensors and effectors later – for the moment let’s focus on the Cloud.

The history of people’s expectations and predictions about technology is replete with failures like these:

“I think there is a world market for maybe five computers.” – Thomas Watson, chairman of IBM, 1943

“There is no reason anyone would want a computer in their home.” – Ken Olsen, president, chairman and founder of Digital Equipment Corp., 1977.

“640K ought to be enough for anybody.” – Attributed to Bill Gates, 1981.

The future is indeed very hard to foresee. On the other hand, for what we do in Starkey Research, we need to temper our enthusiasm and optimism so that our work is properly positioned to deliver, in 5 or 10 years’ time, into the real world and not an imagined one. In contrast to the unbridled excitement of Ray Kurzweil’s visions of the future, in Starkey Research we have to build and deliver real things that solve real problems!

So with those cautions in mind, what can we say about the Cloud? Electronics Magazine solicited an article from Gordon Moore in 1965 in which he made the observation and prediction that the number of components on an integrated circuit would continue to double each year for at least the next 10 years (he later revised the doubling period to two years). Dubbed “Moore’s law” by Carver Mead, this came to represent not just a prediction about the capacity of chip foundries and lithography to miniaturise circuits, but a general rubric for improvements in computing power (i.e. Moore’s Law V2.0 & V3.0).
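Written as a rule of thumb with a two-year doubling period, the growth Moore described is simply exponential (this is a restatement of the prediction above as a formula, not an additional claim):

\[
N(t) \approx N_0 \cdot 2^{t/2},
\]

where \(N_0\) is the component count in a reference year and \(t\) is the number of years since then, so a decade of two-year doublings corresponds to roughly a 32-fold increase.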

The Cloud, while still built on the chips described by Moore’s law, presents as a virtually unlimited source of practical computing power. The single-entity computational behemoths will likely live on in the high-security compounds of the world’s defense and research agencies, but for the rest of us, server farms provided by Amazon (AWS), Google (GCE), Microsoft (Azure) and the like can provide a virtually unlimited source of processing power. No longer are we tied to the capacity of the platform we are using. As long as that platform can connect to the Cloud, the device can share its processing needs with this highly scalable service.

But this comes at a price, and that price is time. Although fast, network communications have delays that relate to the switching and routing of the message packets; the request itself is queued; and the processing takes a finite interval of time before the results are sent back along the network to the requesting device. At the moment, with a fast network and a modest processing request, the time taken amounts to about the time it takes to blink (~350 ms). For hearing technology this is a very important limitation, as the ear is exquisitely sensitive to changes over time. Any delay between when a sound occurs and when its processed version reaches the listener can detrimentally influence not only how the sound is interpreted but also a person’s ability to successfully maintain a conversation. This means that we need to find ways to process locally those elements that are time sensitive and to off-load those processes for which a hundred milliseconds or so is not important.
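The price in time can be written as a simple budget (a generic decomposition of the round trip described above, not a measurement of any particular service):

\[
T_{\text{round trip}} = T_{\text{uplink}} + T_{\text{queue}} + T_{\text{compute}} + T_{\text{downlink}} \approx 350\ \text{ms},
\]

so any element of the processing that cannot tolerate a delay of that order must be handled locally, while the rest can be off-loaded to the Cloud.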

Of course the Cloud is more than just processing power; it also represents information – or, more correctly, data. Estimating, let alone comprehending, the amount of data currently transmitted across this peripheral nervous system and potentially stored in the Cloud is no mean feat. It requires the use of numbers that are powers of 1000 (terabyte, 1000⁴; petabyte, 1000⁵; exabyte, 1000⁶; zettabyte, 1000⁷; and so on). An estimate of traffic can be derived from Cisco’s forecast figures published in 2013 for 2012–17, which indicate that annual global IP traffic will pass the zettabyte threshold by the end of 2016 and that by 2017 global mobile data traffic will reach 134 exabytes annually, growing 13-fold from 2012 to 2017. As for storage, estimates place Google’s current storage at between 10 and 15 exabytes, and Google is but one of the players here – it would be very difficult to determine, for instance, the storage capability of the NSA and other worldwide governmental agencies.

Of course these numbers are mind-boggling, and there is a point where the actual numbers really don’t add anything more to the conversation. This is just Big Data! What they imply, however, is that a whole new range of technologies and tools needs to be developed to manipulate these data and derive information from them. Big Data and informatics in general have huge implications for the way we conceive how we manage hearing impairment and deliver technologies to support listening in adverse environments.

 

 

 

The Fabric of Tomorrow

In Informed Dreaming, we explored the impact of the rate of scientific discovery and technology change on research in general and on hearing aid research in particular. From here we will begin to look more closely at how some of that change will manifest itself in the everyday technologies of tomorrow. So let’s précis that roadmap.

There are two main technological forces in this story – computing power and connectivity. These are quite literally the backbone from which many other profoundly influential players will derive their power. If there were only one dominant idea, it would be ubiquitous computing – a term coined by the brilliant computer scientist Mark Weiser in 1991 in his influential Scientific American article “The Computer for the 21st Century.” As head of computer science at Xerox’s Palo Alto Research Center (PARC), he envisaged a future where our world was inextricably interwoven with a technological fabric of networked “smart” devices. Such a network has the capability to manage our environments from the macro level down to a detailed, individualized level – everything from the power grid to the time and temperature of that morning latte.

But these devices are also inputs to the system – detectors and sensors feeding a huge river of information into the central core, or cloud as we now know it. Many of these are already worn by people (mobile phones, smart watches, activity monitors and so on, all uploading to the cloud), and the sophistication and bio-monitoring capability of these wearables is increasing by the week. Moreover, many of these sensors are stationary but have highly detailed knowledge about their transactions – cashless payments that record the person, the time, the place and the goods; tapping on and off public transport; taking a taxi, an Uber or a flight; a Facebook post; street closed-circuit television security systems; your IP address, cookies and the browser trail; and so on.

Notwithstanding the issues of privacy (if indeed that still exists), this provides an inkling of the data flowing into the cloud – no doubt only the very tip of this gigantic iceberg. Big Data is here and it is here to stay, and although Google is King, these particular information technologies are but babies.

I was fortunate enough to attend the World Wide Web conference in 1998 where Tim Berners-Lee, the man who invented the World Wide Web while working at CERN in 1989, began promoting the idea of the Semantic Web – a means by which machines can efficiently communicate data between each other. In the ensuing years, much work has gone into developing the standards and implementing the systems. In that time, however, two other massive developments have also occurred that may overshadow or subsume these efforts. On the one hand, natural language processing has matured using both text and audio, in the form of Siri, Google Talk and Cortana to mention just a few. On the other hand, driven by huge strides in cognitive neuroscience, processing power and advanced machine learning, we are witnessing a rebirth of Artificial Intelligence (AI) and the promise of so-called Super Intelligence.

So just how can we design listening (hearables) technologies, hearing aids in particular, that can capitalize on these profound developments? Well, let’s take a sneak peek at what a future hearing aid might look like in this brave new world.

Imagine a hearing aid that can listen to the world around the wearer and break down that complex auditory scene into the key relevant pieces – sorting the meaningful from the clutter. A hearing aid that can also listen in on the brain activity of the wearer, identify the wearer’s focus of attention and enhance the information from that source as it is coded by the brain. A hearing aid that, in fact, is not a hearing aid but a device that people wear all the time as a communication channel for other people and machines, for their entertainment, and as a brain and body monitor that also maps their passage through space. Such a device provides support in adverse listening conditions to the normally hearing and the hearing impaired alike – it simply changes modes of operation as appropriate.

Possibly the most surprising thing about this scenario is that, in advanced research laboratories around the world (including Starkey Research), the technologies that would enable such a device exist RIGHT NOW. Of course they are not developed to provide the level of sophisticated sensing and control that are required to give life to this vision, nor are they in a form that people can put in their ears. But they do exist and if we have learned anything from watching the progress of science and technology over the last few decades, their emergence as the Universal Hearable Version 1.0 will likely happen even sooner than we might sensibly predict from where we now stand.

 

Informed Dreaming for a Better Hearing Tomorrow

Part of my day job is to dream – not to daydream but to dream in a disciplined and focused way! I call this informed dreaming, and I believe it is essential for some of the other parts of my job. Because what I do is invent the future. Not the whole future – just a little slice. But this is a very important slice of the future. As Senior Director of Research at Starkey Hearing Technologies, envisioning the future is an essential part of designing the listening technology for tomorrow.

Hearing aids have undergone amazing changes over the last couple of decades. The move to digital ushered in a new age, enabling technologies such as multiband compression, feedback cancellation, noise reduction, speech enhancement, environment classification and a host of other signal processing advances that have significantly extended listening capability.

Wireless was the next major stepping stone, allowing direct communication and control from smart phones, the development of enhanced directional technologies, binaural linking and preservation of spatial cues, and new forms of noise reduction. The well is far from dry.

But what’s the next big step? Good research — research that takes the solutions to the next level and has a time horizon beyond the immediate capabilities of current platforms and technologies. Ten or even five years out, we have to imagine the capabilities of the technological environments in which our new devices will land. This is where the informed dreaming comes in. Predicting the future is a perilous business but an essential component of the sorts of applied research that we do at Starkey.

So what might this future world look like? The Greek philosopher Heraclitus (also known as “The Obscure” or the “weeping philosopher”) wrote that the only constant was change – quoted by Plato as saying that “you could not step twice into the same river.” Heraclitus could have never imagined how fast that river could flow – a torrent, a rapid that sweeps all before it! Today, the landscape, the very course of the river changes before our eyes.

What we do in research now is based on the science and research of millions of scientists across the world. One estimate of the size of today’s scientific knowledge is the number of peer-reviewed articles, which according to the influential scientific journal Nature last year totalled 1.8 million peer-reviewed articles published across 28,000 scientific journals. More to the point, this number is increasing with a compound growth rate of 9 percent a year – which means that scientific knowledge is doubling roughly every nine years! It shouldn’t surprise us then that in 10 years’ time, like Dorothy, we might suspect that “Toto, I’ve a feeling we’re not in Kansas anymore.”
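The doubling claim is just compound growth at work: for an annual growth rate \(r\), the doubling time is

\[
t_{\text{double}} = \frac{\ln 2}{\ln(1 + r)},
\]

which for growth rates in the region of 8 to 9 percent a year works out to roughly eight to nine years.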

Over the next few weeks this blog will explore technology and social changes that are extremely relevant to our mission to transform the lives of millions of people whose hearing is challenged. Beethoven, the musical genius who bridged and defined the transition from the Classical to the Romantic period of western music, wrote to his brothers at the onset of his own deafness. For him it was the crippling social impairment, the loss of his ability to communicate with those he cared for and loved, that drove him to contemplate suicide. It wasn’t his inability to hear the notes of the piano that made him most desperate (although he lamented this most keenly). The great insult to his life was the social isolation that deafness forced upon him. He could still hear his music in his mind. He could only guess at the rest. Fortunately for us, he chose a more philosophical route. In 1802, he wrote:

“Forced already in my 28th year to become a philosopher, O it is not easy, less easy for the artist than for anyone else – Divine One thou lookest into my inmost soul, thou knowest it, thou knowest that love of man and desire to do good live therein.”

His brothers (Carl and Johann) never received his letter – it was found amongst his papers after his death, but it is a most poignant statement of the catastrophe that hearing impairment visits upon all humankind.

It is critical that we understand the possibilities that the raging river of scientific discovery can provide to remove this veil of isolation, this inability to communicate that forces itself upon otherwise engaged and productive individuals.

Over the next few weeks, this blog will introduce us to the Internet of Things – a near-future state where the things in the world are not only connected and communicating but also include a huge range of sensors and data-gathering devices that provide a rich and detailed real-time picture of the world. This blog will touch on Big Data, the Semantic Web, Artificial Intelligence and Super Intelligence. We are already immersed in some of this, and the only uncertainty is not “if” but “when.” Wearables and hearables, biosensors that touch the skin or dwell beneath it, tattoos that transmit, jewellery that knows the focus of the mind’s eye and much more!

My challenge, and the challenge of my team, is to understand how we leverage these technologies and this tumultuous torrent of scientific discovery to improve the lives of millions.

 

The Christmas Party Problem: Guest Post from Dr. Simon Carlile

 A version of this blog first appeared as an article in the Australian Audiology Today Christmas edition.

One problem with Christmas parties is that there are so many of them and picking which ones to go to can be difficult. Something to influence your decision (other than the quality of the wine on offer) might be where the party is being held. The downtown club with disco music pounding away might be great if you want to dance the night away but that type of venue is not going to help you develop your network with witty conversation and one-liners. Of course, the real Christmas party challenge, even in less busy environments, is hearing and understanding what others are saying at such gatherings; a problem that is virtually insurmountable for those with even a moderate hearing loss.

The Original “Cocktail Party”

Colin Cherry coined the phrase “the cocktail party problem,” and it seems appropriate to paraphrase that term in regard to this Christmas issue. While most people reading this article have probably come across the term, not many will have had the opportunity to read Cherry’s original paper – and what an interesting read it is! His brief but very influential paper, “Some experiments on the recognition of speech with one and with two ears,” first appeared in the Journal of the Acoustical Society of America in 1953 and is remarkable for a number of reasons.

First, in coining the term “the cocktail party problem,” the question for Cherry was “How do we recognize what one person is saying when others are speaking at the same time?” Two important ideas can be drawn from this, both of which relate to the fact that the conversational environment of the cocktail party involves multiple talkers rather than just one talker and background noise. The first idea is that different talkers will be conveying information that is sometimes of interest and sometimes not, i.e. conversation is a multi-source listening challenge in which focus must switch quickly between sources. The second idea is that many of the talkers’ voices will be what constitutes the noise. This matters because the nature of the background sounds determines the type of masking that occurs and the sorts of processing available to the auditory system to ameliorate that masking (see “A primer on masking” below).

Second, Cherry’s paper is mostly about selective attention in speech understanding, the role of the “statistics of language,” voice characteristics, and the costs and time course of switching attention. In the introduction he makes a very clear distinction between the kinds of perceptions that are studied using simple stimuli, such as clicks or pure tones, and the “acts of recognition and discrimination” that underlie understanding speech in the “cocktail party” environment. Cherry’s paper has been cited nearly 1,200 times, but interestingly, the greater proportion of those citing studies focused on detecting sounds against a background of other sounds using simple stimuli such as tones in broadband noise or other tones – hardly the rich and complex stimuli Cherry was talking about. Of course, this was very much the bottom-up, reductionist approach of the physicists and engineers at Bell Labs and elsewhere who had had an immense influence on the development of our thinking about auditory perception, energetic masking in particular (see the box “A primer on masking” and the discussion of the development of the Articulation Index).

An excellent and almost definitive review of this literature is provided by Adelbert Bronkhorst in 2000: “The Cocktail Party Phenomenon: A Review of Research on Speech Intelligibility in Multiple-Talker Conditions.” The research over that period focused on energetic unmasking. For instance: the head shadow producing a “better ear advantage” by reducing the masker level in the ear furthest from the source, the effects of binaural processing or the effects of the modulation characteristics of speech and other maskers. So, on the one hand, the high citation rate for Cherry’s paper is very surprising because there is very little in the original paper that relates to energetic masking. On the other hand, the appropriation of the term “the cocktail party problem” and the reconfiguring of the research question demonstrates the powerful influence of the bottom-up, physics-engineering approach to thinking about auditory perception. This had become the lens through which much thinking and research was viewed. To be fair though, Bronkhorst does point out in his review that there were some data in the literature involving speech-on-speech masking that were not well explained by energetic masking but that this had not been a particular focus of the research.

 

Informational Masking

The turn of the century was propitious for hearing science, as it marked another turning point in our thinking about this “cocktail party” problem. In 1998, Richard Freyman and colleagues reported that differences in the perceived locations of a target and maskers (as opposed to actual physical differences in location) produced significant unmasking for speech maskers but not for noise. Such a result was not amenable to a simple bottom-up explanation based on energetic masking. Freyman therefore appropriated the term “informational masking,” which had previously been used in experiments involving relatively simple stimuli; this was the first time it had been applied to something as complex and rich as speech. As we shall see in more detail later, the unmasking produced in this experiment depended on the active, top-down focus of attention. As previously mentioned, Bronkhorst had pointed out that others had noted that the interference of speech with speech understanding seemed to amount to more than the algebraic sum of the spectral energy would predict. Indeed, as early as 1969, Carhart and colleagues had referred to this as “perceptual masking” or “cognitive interference.” Along those lines, informational masking, in the context of the perceptual unmasking in Freyman’s and later similar experiments, came to stand for everything that wasn’t energetic masking.

Over the ensuing 15 years, many studies have examined the nature of informational masking. A number of general observations can be made, and some of these are drawn out in the “Primer” below. One very important shift, however, was that the “cocktail party problem” became increasingly seen as a particular case of the general problem of auditory scene analysis (ASA). This is the problem of “acoustic superposition,” where the energy from multiple concurrent sounds converges on a single encoder – in this case the cochlea of the inner ear. The first task of the auditory system, then, is to work out which spectral components belong to which sound sources and to group them together in some way. The second task is to work out how these now-segregated components are joined up over time to provide a stream of information associated with a specific sound.

 

Auditory Scene Analysis

Albert Bregman did much to promote thinking in this area with the publication of Auditory Scene Analysis in 1990, marking a significant return of Gestalt thinking to the study of auditory perception. Although this part of the story is still being worked out, it is clear that much of the grouping and streaming processes underlying ASA are largely automatic, that is bottom-up, and that they capitalize on the physical acoustics of sounding bodies – probably not surprising, given that the auditory system evolved in a world of physically sounding bodies and “the cocktail party problem” is a common evolutionary challenge for nearly all terrestrial animals. The perceptual outcome of this process is the emergence of auditory objects that usually correspond to the individual physical sources. Indeed, many of the experimental approaches to understanding ASA have involved stimuli that create perceptual objects that are in some way ambiguous, and then examined the illusions and/or confusions that such manipulations create.

In the case of “the cocktail party problem,” the speech from each talker forms a specific stream, and the problem becomes one of how we are able to select between the streams. In practical terms, the greater the differences between the talkers on some dimension (pitch, timbre, accent, rhythm, location, etc.), the less likely we are to confuse the streams. That is, the greater the stream variety, the more informational unmasking we can expect.

This brings us to the key role of attention in understanding listening in a “cocktail party” scenario. Attention has been thought of as a type of filter that can be focused on a feature of interest, allowing an up-regulation of the processing of information within that filter and a potential down-regulation of information outside it. A physical difference in some aspect of the auditory stream provides the hook onto which the listener can focus their attention. Recognizing the critical role that attention plays in understanding what is happening in a cocktail party scenario moves the discussion from “hearing” to “listening” and closer to Cherry’s goal of understanding the “acts of recognition and discrimination” that underlie the understanding of speech.

 

Auditory Attention

The neuroscience of auditory attention is in its infancy compared with what we know about visual attention, although some tentative generalizations can be made:

Attention is a process of biased competition. The moment to moment focus of attention is dependent on competition between (1) top-down, voluntary or endogenous attentional control and (2) bottom-up, saliency driven or exogenous attention. The cognitive capacity to focus attention plays a key role in the sustained attention necessary to process the stream of information from a particular talker. There is evidence that we listen to only one auditory object at a time and selective attention is critical in enabling this. The exogenous competition introduced by concurrent sounds, particularly other talkers (the distractors) means more cognitive effort is required to sustain attention on a particular target of interest. The implication for an ageing population is that any reduction in cognitive capacity to sustain attention will increase the difficulty of understanding the stream of information from a single talker in the presence of other talkers.

Selective attention works at the level of perceptual objects as opposed to a particular physical dimension such as loudness or pitch. That is, attention focuses on the voice or the location of a particular talker (or both simultaneously – see below). While the attentional hook might be a difference on a particular perceptual dimension, the sum total of characteristics that make up the perceptual object are what becomes enhanced. Models of attention suggest that the competition for attention is played out in working memory and the players are the sensory objects contained in working memory at any particular point in time. Indeed, our conscious perception of the world relies on this process.

What this means is that when auditory objects are not well defined, the application of selective attention can be degraded. There are a number of circumstances where this can happen. For instance, the stimuli themselves may be ambiguous and not possess the relevant acoustical elements to support good grouping and streaming. Alternatively, the stimuli may possess the necessary physical characteristics, but poor encoding at the sensory epithelia and/or degraded neural transmission of the perceptual signal can result in reduced fidelity or absence of the encoded features necessary for grouping or streaming. The implication for hearing impairment is that degradation of sensory encoding, such as that produced by broader auditory filters (critical bands) or poor temporal resolution, will weaken object formation and make the task of selective attention that much harder.

Attention acts as both a gain control and a gate. There is a growing body of evidence indicating that attention modulates the activity of neurones in the auditory system, not only at the cortical level but also earlier in the signal chain, possibly even at the level of the hair cells of the cochlea. In a number of recent and ground-breaking experiments, this process of up-regulation of the attended talker and down-regulation of the maskers has been convincingly demonstrated in the auditory cortex of people dynamically switching their attention between competing talkers (Mesgarani & Chang, 2012; Ding & Simon, 2012). Importantly, the strength of the selective cortical representation of the attended-to talker correlated with the listener’s perceptual performance in understanding the targeted talker over the competing talker.

The auditory system engages two different attentional systems – one focused on the spatial location of a source and one focused on non-spatial characteristics of the source – which have two different cortical control systems. In a 2013 study, Adrian “KC” Lee and colleagues (Lee et al., 2013) had listeners change their attentional focus while their brains were imaged. They found that the left frontal eye fields (FEF) became active before the onset of a stimulus when subjects were asked to attend to the location of a to-be-heard sound. This is part of the so-called dorsal attention pathway, thought generally to support goal-directed attention. On the other hand, when subjects were asked to attend to a non-spatial attribute of the stimulus, such as its pitch, a different pattern of pre-stimulus activation was observed in the left posterior central sulcus, an area also associated with auditory pitch categorization. This suggests that for the hearing impaired, a loss of the ability to localize the source of a sound disables or degrades a significant component of the auditory attention system, resulting in an increased reliance on the non-spatial attention system.

Returning to Colin Cherry’s paper, it appears that we have — to paraphrase T.S. Eliot — “arrived where we started and know the place for the first time.”

So much of what Cherry discussed in his seminal paper is where we now find our neuroscientific focus including: the statistics of language in terms of its phonetic and semantic characteristics; the focus of attention and how that is mediated by spatial location and/or vocal or other characteristics; the transitional probabilities of what is being said and so on. The difference now is that we have both the technical and analytical tools to get a handle on how these processes are represented in the brain. With an increasing understanding of the functional plasticity of the brain, we are at a point now where we are making advances in the understanding of human perception and cognition that will have significant ramifications for how we intervene, support and rehabilitate many of the disorders that manifest as hearing impairment.

Further Reading

Cherry, E. C. (1953). “Some experiments on the recognition of speech with one and with two ears.” Journal of the Acoustical Society of America 25: 975-979.

Bronkhorst, A. W. (2000). “The cocktail party phenomenon: A review of research on speech intelligibility in multiple-talker conditions.” Acta Acustica united with Acustica 86: 117-128.

Lee, A. K. C., et al. (2012). “Auditory selective attention reveals preparatory activity in different cortical regions for selection based on source location and source pitch.” Frontiers in Neuroscience 6: 190.

Mesgarani, N. and Chang, E. F. (2012). “Selective cortical representation of attended speaker in multi-talker speech perception.” Nature 485: 233-236.

Ding, N. and Simon, J. Z. (2012). “Emergence of neural encoding of auditory objects while listening to competing speakers.” Proceedings of the National Academy of Sciences of the United States of America 109: 11854-11859.