  • I'd like to talk today

  • about a powerful and fundamental aspect

  • of who we are: our voice.

  • Each one of us has a unique voiceprint

  • that reflects our age, our size,

  • even our lifestyle and personality.

  • In the words of the poet Longfellow,

  • "the human voice is the organ of the soul."

  • As a speech scientist, I'm fascinated

  • by how the voice is produced,

  • and I have an idea for how it can be engineered.

  • That's what I'd like to share with you.

  • I'm going to start by playing you a sample

  • of a voice that you may recognize.

  • (Recording) Stephen Hawking: "I would have thought

  • it was fairly obvious what I meant."

  • Rupal Patel: That was the voice

  • of Professor Stephen Hawking.

  • What you may not know is that same voice

  • may also be used by this little girl

  • who is unable to speak

  • because of a neurological condition.

  • In fact, all of these individuals

  • may be using the same voice,

  • and that's because there's only a few options available.

  • In the U.S. alone, there are 2.5 million Americans

  • who are unable to speak,

  • and many of whom use computerized devices

  • to communicate.

  • Now that's millions of people worldwide

  • who are using generic voices,

  • including Professor Hawking,

  • who uses an American-accented voice.

  • This lack of individuation of the synthetic voice

  • really hit home

  • when I was at an assistive technology conference

  • a few years ago,

  • and I recall walking into an exhibit hall

  • and seeing a little girl and a grown man

  • having a conversation using their devices,

  • different devices, but the same voice.

  • And I looked around and I saw this happening

  • all around me, literally hundreds of individuals

  • using a handful of voices,

  • voices that didn't fit their bodies

  • or their personalities.

  • We wouldn't dream of fitting a little girl

  • with the prosthetic limb of a grown man.

  • So why then the same prosthetic voice?

  • It really struck me,

  • and I wanted to do something about this.

  • I'm going to play you now a sample

  • of someone who has, two people actually,

  • who have severe speech disorders.

  • I want you to take a listen to how they sound.

  • They're saying the same utterance.

  • (First voice)

  • (Second voice)

  • You probably didn't understand what they said,

  • but I hope that you heard

  • their unique vocal identities.

  • So what I wanted to do next is,

  • I wanted to find out how we could harness

  • these residual vocal abilities

  • and build a technology

  • that could be customized for them,

  • voices that could be customized for them.

  • So I reached out to my collaborator, Tim Bunnell.

  • Dr. Bunnell is an expert in speech synthesis,

  • and what he'd been doing is building

  • personalized voices for people

  • by putting together

  • pre-recorded samples of their voice

  • and reconstructing a voice for them.

  • These are people who had lost their voice

  • later in life.

  • We didn't have the luxury

  • of pre-recorded samples of speech

  • for those born with speech disorder.

  • But I thought, there had to be a way

  • to reverse engineer a voice

  • from whatever little is left over.

  • So we decided to do exactly that.

  • We set out with a little bit of funding from the National Science Foundation,

  • to create custom-crafted voices that captured

  • their unique vocal identities.

  • We call this project VocaliD, or vocal I.D.,

  • for vocal identity.

  • Now before I get into the details of how

  • the voice is made and let you listen to it,

  • I need to give you a real quick speech science lesson. Okay?

  • So first, we know that the voice is changing

  • dramatically over the course of development.

  • Children sound different from teens

  • who sound different from adults.

  • We've all experienced this.

  • Fact number two is that speech

  • is a combination of the source,

  • which is the vibrations generated by your voice box,

  • which are then pushed through

  • the rest of the vocal tract.

  • These are the chambers of your head and neck

  • that vibrate,

  • and they actually filter that source sound

  • to produce consonants and vowels.

  • So the combination of source and filter

  • is how we produce speech.

  • And that happens in one individual.

  • Now I told you earlier that I'd spent

  • a good part of my career

  • understanding and studying

  • the source characteristics of people

  • with severe speech disorder,

  • and what I've found

  • is that even though their filters were impaired,

  • they were able to modulate their source:

  • the pitch, the loudness, the tempo of their voice.

  • These are called prosody, and I've been documenting for years

  • that the prosodic abilities of these individuals

  • are preserved.

  • So when I realized that those same cues

  • are also important for speaker identity,

  • I had this idea.

  • Why don't we take the source

  • from the person we want the voice to sound like,

  • because it's preserved,

  • and borrow the filter

  • from someone about the same age and size,

  • because they can articulate speech,

  • and then mix them?

  • Because when we mix them,

  • we can get a voice that's as clear

  • as our surrogate talker --

  • that's the person we borrowed the filter from --

  • and is similar in identity to our target talker.

  • It's that simple.

  • That's the science behind what we're doing.

  • So once you have that in mind,

  • how do you go about building this voice?

  • Well, you have to find someone

  • who is willing to be a surrogate.

  • It's not such an ominous thing.

  • Being a surrogate donor

  • only requires you to say a few hundred

  • to a few thousand utterances.

  • The process goes something like this.

  • (Video) Voice: Things happen in pairs.

  • I love to sleep.

  • The sky is blue without clouds.

  • RP: Now she's going to go on like this

  • for about three to four hours,

  • and the idea is not for her to say everything

  • that the target is going to want to say,

  • but the idea is to cover all the different combinations

  • of the sounds that occur in the language.

  • The more speech you have,

  • the better sounding voice you're going to have.

  • Once you have those recordings,

  • what we need to do

  • is we have to parse these recordings

  • into little snippets of speech,

  • one- or two-sound combinations,

  • sometimes even whole words

  • that start populating a dataset or a database.

  • We're going to call this database a voice bank.

  • Now the power of the voice bank

  • is that from this voice bank,

  • we can now say any new utterance,

  • like, "I love chocolate" --

  • everyone needs to be able to say that --

  • fish through that database

  • and find all the segments necessary

  • to say that utterance.

  • (Video) Voice: I love chocolate.

  • RP: So that's speech synthesis.

  • It's called concatenative synthesis, and that's what we're using.

  • That's not the novel part.

  • What's novel is how we make it sound

  • like this young woman.

  • This is Samantha.

  • I met her when she was nine,

  • and since then, my team and I

  • have been trying to build her a personalized voice.

  • We first had to find a surrogate donor,

  • and then we had to have Samantha

  • produce some utterances.

  • What she can produce are mostly vowel-like sounds,

  • but that's enough for us to extract

  • her source characteristics.

  • What happens next is best described

  • by my daughter's analogy. She's six.

  • She calls it mixing colors to paint voices.

  • It's beautiful. It's exactly that.

  • Samantha's voice is like a concentrated sample

  • of red food dye which we can infuse

  • into the recordings of her surrogate

  • to get a pink voice just like this.

  • (Video) Samantha: Aaaaaah.

  • RP: So now, Samantha can say this.

  • (Video) Samantha: This voice is only for me.

  • I can't wait to use my new voice with my friends.

  • RP: Thank you. (Applause)

  • I'll never forget the gentle smile

  • that spread across her face

  • when she heard that voice for the first time.

  • Now there's millions of people

  • around the world like Samantha, millions,

  • and we've only begun to scratch the surface.

  • What we've done so far is we have

  • a few surrogate talkers from around the U.S.

  • who have donated their voices,

  • and we have been using those

  • to build our first few personalized voices.

  • But there's so much more work to be done.

  • For Samantha, her surrogate

  • came from somewhere in the Midwest, a stranger

  • who gave her the gift of voice.

  • And as a scientist, I'm so excited

  • to take this work out of the laboratory

  • and finally into the real world

  • so it can have real-world impact.

  • What I want to share with you next

  • is how I envision taking this work

  • to that next level.

  • I imagine a whole world of surrogate donors

  • from all walks of life, different sizes, different ages,

  • coming together in this voice drive

  • to give people voices

  • that are as colorful as their personalities.

  • To do that as a first step,

  • we've put together this website, VocaliD.org,

  • as a way to bring together those

  • who want to join us as voice donors,

  • as expertise donors,

  • in whatever way to make this vision a reality.

  • They say that giving blood can save lives.

  • Well, giving your voice can change lives.

  • All we need is a few hours of speech

  • from our surrogate talker,

  • and as little as a vowel from our target talker,

  • to create a unique vocal identity.

  • So that's the science behind what we're doing.

  • I want to end by circling back to the human side

  • that is really the inspiration for this work.

  • About five years ago, we built our very first voice

  • for a little boy named William.

  • When his mom first heard this voice,

  • she said, "This is what William

  • would have sounded like

  • had he been able to speak."

  • And then I saw William typing a message

  • on his device.

  • I wondered, what was he thinking?

  • Imagine carrying around someone else's voice

  • for nine years

  • and finally finding your own voice.

  • Imagine that.

  • This is what William said:

  • "Never heard me before."

  • Thank you.

  • (Applause)
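
The voice bank lookup described in the talk (cutting the surrogate's recordings into one- or two-sound snippets, then fishing out the segments needed for a new utterance) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: the functions `build_voice_bank` and `plan_utterance`, the rough ARPAbet-style sound labels, the toy recordings, and the greedy longest-match search are not taken from the VocaliD system. A real concatenative synthesizer would also have to join and smooth the audio itself, which this sketch leaves out.

```python
from typing import Dict, List, Sequence, Tuple

Unit = Tuple[str, ...]        # one or more consecutive sound labels
Location = Tuple[str, int]    # (recording id, start index within it)


def build_voice_bank(
    recordings: Dict[str, Sequence[str]], max_len: int = 2
) -> Dict[Unit, Location]:
    """Index every run of 1..max_len consecutive sounds in the recordings.

    Only the first place each unit occurs is kept.
    """
    bank: Dict[Unit, Location] = {}
    for rec_id, sounds in recordings.items():
        for n in range(1, max_len + 1):
            for start in range(len(sounds) - n + 1):
                unit = tuple(sounds[start:start + n])
                bank.setdefault(unit, (rec_id, start))
    return bank


def plan_utterance(
    bank: Dict[Unit, Location], target: Sequence[str], max_len: int = 2
) -> List[Tuple[Unit, Location]]:
    """Cover the target sounds with stored snippets, longest match first."""
    plan: List[Tuple[Unit, Location]] = []
    i = 0
    while i < len(target):
        for n in range(min(max_len, len(target) - i), 0, -1):
            unit = tuple(target[i:i + n])
            if unit in bank:
                plan.append((unit, bank[unit]))
                i += n
                break
        else:
            # No snippet of any length starts here: the surrogate never
            # recorded this sound.
            raise KeyError(f"no recorded snippet covers sound {target[i]!r}")
    return plan


# Toy surrogate recordings with rough, hypothetical sound labels.
recordings = {
    "i_love_to_sleep": ["AY", "L", "AH", "V", "T", "UW", "S", "L", "IY", "P"],
    "chalk": ["CH", "AO", "K"],
}
bank = build_voice_bank(recordings)

# "I love chocolate", a sentence that was never recorded as a whole.
target = ["AY", "L", "AH", "V", "CH", "AO", "K", "L", "AH", "T"]
for unit, (rec_id, start) in plan_utterance(bank, target):
    print("-".join(unit), "from", rec_id, "at position", start)
```

Preferring two-sound snippets over single sounds keeps more of the natural transition between neighboring sounds, which is one reason more recorded speech gives a better-sounding voice: a larger bank lets longer stretches be reused intact.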
