A Brief History of ASR: Automatic Speech Recognition

This is the first post in a series on Automatic Speech Recognition, the foundational technology that makes Descript possible. We’ll be exploring the current state of the industry, where it’s heading — and, in this installment, where it’s been.


Descript is proud to be part of a new generation of creative software enabled by recent advancements in automatic speech recognition (ASR). It’s an exciting time: the technology recently crossed a threshold that sees it trading its longstanding promise for remarkable utility, and it’s only getting better.

This moment has been a long time coming. The technology behind speech recognition has been in development for over half a century, going through several periods of intense promise — and disappointment. So what changed to make ASR viable in commercial applications? And what exactly could these systems accomplish, long before any of us had heard of Siri?

The story of speech recognition is as much about the application of different approaches as the development of raw technology, though the two are inextricably linked. Over a period of decades, researchers would conceive of myriad ways to dissect language: by sounds, by structure — and with statistics.

Early Days

Human interest in recognizing and synthesizing speech dates back hundreds of years (at least!) — but it wasn’t until the mid-20th century that our forebears built something recognizable as ASR.

1961 — IBM Shoebox

Among the earliest projects was a “digit recognizer” called Audrey, created by researchers at Bell Laboratories in 1952. Audrey could recognize spoken numerical digits by looking for audio fingerprints called formants¹, the distilled essences of sounds.

In the 1960s, IBM developed Shoebox — a system that could recognize digits and arithmetic commands like “plus” and “total”. Better yet, Shoebox could pass the math problem to an adding machine, which would calculate and print the answer².

1961 — A demonstration of IBM’s Shoebox

Meanwhile, researchers in Japan built hardware that could recognize the constituent parts of speech like vowels; other systems could evaluate the structure of speech to figure out where a word might end. And a team at University College London built a system that could recognize 4 vowels and 9 consonants by analyzing phonemes, the discrete sounds of a language¹.

But while the field was taking incremental steps forward, it wasn’t necessarily clear where the path was heading. And then: disaster.

October 1969 The Journal of the Acoustical Society of America

A Piercing Freeze

The turning point came in the form of a letter written by John R. Pierce in 1969.

Pierce had long since established himself as an engineer of international renown; among other achievements he coined the word transistor (now ubiquitous in engineering) and helped launch Echo I, the first-ever communications satellite. By 1969 he was an executive at Bell Labs, which had invested extensively in the development of speech recognition.

In an open letter³ published in The Journal of the Acoustical Society of America, Pierce laid out his concerns. Citing a “lush” funding environment in the aftermath of World War II and Sputnik, and the lack of accountability that came with it, Pierce admonished the field for its lack of scientific rigor, asserting that too much wild experimentation was going on:

“We all believe that a science of speech is possible, despite the scarcity in the field of people who behave like scientists and of results that look like science.” — J.R. Pierce, 1969

Pierce put his employer’s money where his mouth was: he defunded Bell’s ASR programs, which wouldn’t be reinstated until after he resigned in 1971.

Progress Continues

Thankfully there was more optimism elsewhere. In the early 1970s, the U.S. Department of Defense’s ARPA (the agency now known as DARPA) funded a five-year program called Speech Understanding Research. This led to the creation of several new ASR systems, the most successful of which was Carnegie Mellon University’s Harpy, which could recognize just over 1000 words by 1976.

1976: CMU’s Harpy Speech Recognition System

Meanwhile efforts from IBM and AT&T’s Bell Laboratories pushed the technology toward possible commercial applications. IBM prioritized speech transcription in the context of office correspondence, and Bell was concerned with ‘command and control’ scenarios: the precursors to the voice dialing and automated phone trees we know today¹.

Despite this progress, by the end of the 1970s ASR was still a long way from being viable for anything but highly specific use cases.

The ‘80s: Markovs and More

A key turning point came with the popularization of Hidden Markov Models (HMMs) in the mid-1980s. This approach represented a significant shift “from simple pattern recognition methods, based on templates and a spectral distance measure, to a statistical method for speech processing”⁴, which translated to a leap forward in accuracy.

A large part of the improvement in speech recognition systems since the late 1960s is due to the power of this statistical approach, coupled with the advances in computer technology necessary to implement HMMs.⁵

HMMs took the industry by storm — but they were no overnight success. Jim Baker first applied them to speech recognition in the early 1970s at CMU, and the models themselves had been described by Leonard E. Baum in the ‘60s. It wasn’t until 1980, when Jack Ferguson gave a set of illuminating lectures at the Institute for Defense Analyses, that the technique began to disseminate more widely⁴.

The success of HMMs validated the work of Frederick Jelinek at IBM’s Watson Research Center, who since the early 1970s had advocated for the use of statistical models to interpret speech, rather than trying to get computers to mimic the way humans digest language: through meaning, syntax, and grammar (a common approach at the time). As Jelinek later put it: “Airplanes don’t flap their wings.”⁹

These data-driven approaches also facilitated progress that had as much to do with industry collaboration and accountability as individual eureka moments. With the increasing popularity of statistical models, the ASR field began coalescing around a suite of tests that would provide a standardized benchmark for comparison. This was further encouraged by the release of shared data sets: large corpora that researchers could use to train and test their models.

In other words: finally, there was an (imperfect) way to measure and compare success.

November 1990, Infoworld

Consumer Availability — The ‘90s

For better and worse, the ’90s introduced consumers to automatic speech recognition in a form we’d recognize today. Dragon Dictate launched in 1990 for a staggering $9,000, touting a dictionary of 80,000 words and features like natural language processing (see the Infoworld article above).

These tools were time-consuming (the article claims otherwise, but Dragon became known for prompting users to ‘train’ the dictation software to their own voice). They also required users to speak in a stilted manner: Dragon could initially recognize only 30–40 words a minute; people typically talk around four times faster than that.

But it worked well enough for Dragon to grow into a business with hundreds of employees, and customers spanning healthcare, law, and more. In 1997 the company introduced Dragon NaturallySpeaking, which could capture words at a more fluid pace and, at $150, carried a much lower price tag⁸.

Even so, there may have been as many grumbles as squeals of delight: to the degree that there is consumer skepticism around ASR today, some of the credit should go to the over-enthusiastic marketing of these early products. But without the efforts of industry pioneers James and Janet Baker (who founded Dragon Systems in 1982), the productization of ASR may have taken much longer.

November 1993, IEEE Communications Magazine

Whither Speech Recognition: The Sequel

Nearly 25 years after J.R. Pierce’s letter was published, the IEEE published a follow-up titled Whither Speech Recognition: The Next 25 Years⁵, authored by two senior employees of Bell Laboratories (the same institution where Pierce had worked).

The follow-up surveys the state of the industry circa 1993 and serves as a sort of rebuttal to the pessimism of the original. Among its takeaways:

  • The key issue with Pierce’s letter was his assumption that in order for speech recognition to become useful, computers would need to comprehend what words mean. Given the technology of the time, this was completely infeasible.
  • In a sense, Pierce was right: by 1993 computers had a meager understanding of language, and as of 2018 they’re still notoriously bad at discerning meaning.
  • Pierce’s mistake lay in his failure to anticipate the myriad ways speech recognition can be useful, even when the computer doesn’t know what the words actually mean.

The Whither sequel ends with a prognosis, forecasting where ASR would head in the years after 1993. The section is couched in cheeky hedges (“We confidently predict that at least one of these eight predictions will turn out to have been incorrect”) — but it’s intriguing all the same. Among their eight predictions:

  • “By the year 2000, more people will get remote information via voice dialogues than by typing commands on computer keyboards to access remote databases.”
  • “People will learn to modify their speech habits to use speech recognition devices, just as they have changed their speaking behavior to leave messages on answering machines. Even though they will learn how to use this technology, people will always complain about speech recognizers.”

The Dark Horse

In a forthcoming installment in this series, we’ll be exploring more recent developments and the current state of automatic speech recognition. Spoiler alert: neural networks have played a starring role.

But neural networks are actually as old as most of the approaches described here — they were introduced in the 1950s¹! It wasn’t until the computational power of the modern era (along with much larger data sets) that they changed the landscape.

But we’re getting ahead of ourselves. Stay tuned for our next post on Automatic Speech Recognition by following Descript on Medium, Twitter, or Facebook.

Timeline via Juang & Rabiner¹

Note: The history of ASR is filled with more contributors and innovations than we can detail in this piece; we’ve covered some major milestones and included links to further reading below. If we’ve missed something vital, let us know!

Further Reading

Here are the resources that were helpful in writing this piece, some of which go into far more detail:

  1. Automatic Speech Recognition – A Brief History of the Technology Development, by B.H. Juang & Lawrence R. Rabiner. If you’re interested in a more extensive history of ASR, this is a great resource.
  2. Shoebox, IBM History Exhibits
  3. Whither Speech Recognition?, by J.R. Pierce
  4. First-Hand: The Hidden Markov Model, by Lawrence R. Rabiner
  5. Whither Speech Recognition: The Next 25 Years, by D.B. Roe & J.G. Wilpon
  6. Timeline of speech and voice recognition, Wikipedia
  7. Speech Recognition, Wikipedia
  8. Fortune article about Dragon NaturallySpeaking, 1998, by Shaifali Puri
  9. Frederick Jelinek, Who Gave Machines the Key to Human Speech, Dies at 77, by Steve Lohr
  10. Fifty years of progress in speech and speaker recognition, by Sadaoki Furui

Thanks to Arlo Faria and Adam Janin of Remeeting who provided valuable historical context.

Source: Descript