[MUSIC PLAYING] LINNE HA: Hi, thank you for having me here. My name is Linne. And I am a director of research programs at Google AI. I was here a couple of years ago, where I talked about how much being in this position and reaching this position was really about not me but everybody behind me, meaning all the people who helped me to get to this moment. And afterwards, I thought about this for a little bit. And of course, all of that is true. But as I've been giving these talks, and I give these talks around the world, I noticed that people were asking me questions like, how did you get your job? And after so many of those questions, I started to think, and I really kind of reacted defensively, in the sense that, are they questioning that it's a fluke that I have this job? That I actually don't deserve this job? And I don't know, whatever reason. Because they weren't asking all the other panelists with me. And then I realized, actually, what that was really an opportunity to tell my story about how I arrived here. Because that journey is actually quite interesting. My job at Google is super interesting. But how I arrived at Google is also another very interesting life story. But let me just say that I didn't plan on working for Google, or working in AI, or working in tech. I actually planned my life, since I was 9 years old, to be a writer. I spent all of my education going towards becoming a writer, because I knew without a doubt. I was the kid who went to the library, back when libraries were still very popular and books were very popular, taking out stacks of books. And because I grew up in Guam and then in Alaska-- so these very remote, far-reaching-- where your creativity and imagination were really important to transport you to new places. So wanting to be a writer, I also knew that I was probably not going to get a very good job with an MFA. So what I did as an undergrad is I studied as many things as possible. 
Also, because I'm kind of a nerd, and I like to learn many things, so I did double majors and double minors. And then the ironic part of that is that when I graduated, the jobs that were offered to me were at an investment bank, or the CIA, or going into publishing. And I really wanted to go into publishing. But they were going to require me to work 50 hours a week, making 17,000 a year. And so basically, I was too scared to go to the CIA. So I went to banking. [LAUGHTER] And I went to banking. And what my plan was, living in New York-- because I came from Alaska. I came to New York, and it was very magical to be here. My plan was to work as much as I could, which was usually about a year, year and a half, save up money, take off, go to a residency, and write. So I kept doing this for a long time. And then one day, the job that I was offered was at Google. And actually, I had a pretty decent job before that. I had actually moved to San Francisco, because I found it very expensive to live in New York. And back then, San Francisco was very cheap. [LAUGHTER] Actually, I didn't apply. I was already at a job, and the recruiter found my CV on Monster, or Indeed, or one of those on online boards. And they offered to bring me down to the Mountain View campus, the headquarters, which is about 40 minutes away from San Francisco, and to interview, and give me a tour, and a free lunch. And I was thinking, oh, how else would I possibly get a tour to Google unless I went through this whole thing? I was like, oh, sure. Why not? And because I had a job, it was totally fine whether the interview failed or not. It didn't matter to me. But I actually ended up meeting some of the most interesting, smart people. But I was still happy with my job. And I didn't really want to commute down to Mountain View. Then one day, my manager started questioning and started micromanaging my work. And I was like, you know what? I don't really need to take this, because I have a job offer from Google. 
[LAUGHTER] And back then, it was actually not as big of a deal as it is now. But it seemed more exciting. So my first job was with a division of Google called International. So we didn't have all of these offices around the world. The offices were on the Google campus, which is in Mountain View, as I was saying. And it was in one particular building where-- and they had flags-- all the marketing people, the product managers, the localization people, the QA people, were all in one building specifically dedicated to international products. So other than my first job, which was really within the localization department. And it was called International Program Manager. Other than that first job, every subsequent one, I created, actually. And this is sort of the beauty of Google. What happens is you start working on something. And you see that there's something wrong with the product. And other people see it, too. But you have an idea about how you can fix it. And you convince everybody else that it's a good idea. And then once it actually starts to flourish, it becomes a viable product. And many of Google products, including Gmail-- and I can't even think of all the products that came out of this 20% idea. So I was working on Google Earth. And Google Earth had been a recent acquisition. And the same people who, by the way, are doing the Pokemon Go-- Project Niantic, right? And with that first job working on Google Earth, which was a downloadable client application for regular consumers, prosumers, and professionals-- so enterprise, right? So there were three different versions. And then you had the support for the Mac OS. You had support for PCs, Windows, and then you also had Linux support. So it was quite unwieldy. But in that acquisition, one of the recent mandates we'd had was that Eric Schmidt, who was our CEO at that time, wanted all of our products be internationalized within a month. So basically, in order to do that, imagine all the different UIs and so on. 
And when you're talking about geography, it's not just a matter of translation. In fact, names of cities, or bodies of water, or mountains, or borders are highly contentious. And these are, of course, what many wars are based on. So basically, it was a very hot product, but one nobody really wanted, because it was also very challenging. But I was given this because somebody was going on maternity leave. And basically, at Google, at that time, you just had so much work. It was one of those situations where you could just work all day and never really catch up. So with Google Earth, one of the things that was really annoying for the internationalization of Google Earth to rush out was we were going to do FIGS, which is usually the first set of languages that you launch to, which is French, Italian, German, Spanish. So the left hand navigation bar was hardcoded. It wasn't flexible. It was hardcoded based on the most clicks and queries. So it was really menu items of food items like fast food. Fast food was one category. Pizza was another category. Barbecue restaurants were another category, and Italian was another category. So in theory, that should be fine. Because everybody eats pizza, and fast food, and barbecue, and so on. But what is an Italian restaurant in Italy? How do you-- so coming up with this ontology was something that we really wanted to fix. So I found out that a lot of the data that we were getting was from Google Maps. So I'd worked with the team on working with linguists to figure out what should those categories actually be. What are people looking for in these countries? What are the most popular queries around businesses? And so with linguists, we started building up that ontology. So it was actually flexible for each country we were launching into. And then, as we were building up these ontologies, which, ultimately, ended up to be part of Google Maps, as well, I moved to Google Maps so that it would help the rest of the teams, if you will. 
Everybody else could use this data. So working with Google Maps on building these ontologies, I was traveling around the world. Because you really want to be able to go locally to see how people call-- just to give an example, a motel in the US is a motor hotel. It's something very convenient. It's budget accommodations. A motel in Japan or in Brazil, for instance, is actually a sex hotel. So you get charged by the hour. So unless you know this, unless you could actually get this localized, the information is not going to be very useful. So I would go to Japan often. And one of the annoying things, for me, was that the Japanese maps was all in Japanese. And part of that was because most of the data at that time was based on the license data that other people were producing. And it was what was most rich for that country. So from that, I worked with some other people who thought, oh, let's try to fix this in some sort of algorithmic way. So I worked with a speech team to come up with some sort of pronunciation. So the way you pronounce certain labels or certain words, excuse me, we could actually go ahead and transliterate this into Latin characters. And then based on a set, let's say 20,000-- that's sort of a ballpark. I don't remember if that is exactly what we did. But based on a small set, we can actually train and test whether we can generate these automatically. So in Japanese, for instance, you have many different character sets. You have kanji. You have katakana, hiragana, and romaji. So you wanted to be able to get all of that transliterated into Latin characters so that we can then produce them into other characters like Chinese or whatnot. So linguists would go through, and go and transliterate, which is how something sounds into Latin characters. So doing that, we're able to create new labels for maps. And we did a launch. And that formed a team called Maps Transliteration. They still continue to do this, to this day. 
The next thing was, because I had been working with a speech team on those pronunciations, they actually had just formed, and they were trying to launch-- and they were listening to Eric's mandate of launching into 40 languages. But there's a big difference between translating a UI and actually creating a new language model. I didn't know anything about language models. But I did know that when I was traveling for Google Maps, people would say to me, oh, Linne, you work on maps, right? There's something wrong with my address. It's showing that it's here, but it's really here. Or the navigation to get to my work is really annoying, because it's actually dropping me off at a different entrance. That sort of thing. And I would hear this, and go back to work, and file a bug into the team. And I always thought, wouldn't it be great if the people who are using our products could talk to the people who are making the products? So I found myself to be the conduit. So when the speech team said that they actually needed to launch into many of these languages, and it required that we collect a lot of data, which is basically acoustic data for acoustic modeling. How words sound, for instance, and linguistic data, lexicon, about all the different unique rules for that language. So basically, what we wanted to do and what the industry before the iPhone, before-- this was back when we had our very first Google phone, which was called the G1. I thought about how it had been done previously. This is how old it was, back before mobile phones. The industry was really led by DARPA, which is the military group, where much of our technology comes from. DARPA did a lot of speech and language modeling, and so on. And so we really had one or two vendors who did this work for all the different industry-- are basically our competitors. And so the whole process of getting, basically, voice samples from different people, back in the day, it would be a classified ad in the paper. 
And somebody would call on their landline and answer questions, as a little survey. And then that's how they would get their acoustic data. But the problem is that the sound frequency on our landline is different from the sound frequency on a mobile phone. And we also wanted to profile the mic. We wanted our phone to work really well for our products. So I remembered how so many people that I'd met along the way, traveling with Google Maps, had told me-- and they were total fans-- different ideas of what they would do, what they want, and how they clearly were, one, proud of their language, two, very big fans of Google. And I thought, why not actually go bring our phones to our Google fans and have them give us their voice samples as well as their friends and people within their social network? So that when we did actually launch this app, it would work really well for them. Because they gave us their voice sample. And so that's basically what I did. And that's called crowd-sourcing. Because before that, it was all done in a very discrete way through a company. And these companies did not know exactly what kind of queries, and words, and sound units we were interested in. And by the time we got them all that information, it would take about six months. So the first time I did this as an example, we went to Thailand. Because Thai was the most difficult. And it was also going to be really expensive. Off the top of my head, I think from the vendors, it was going to be like $150,000 to get like-- I don't remember-- 500,000 utterances. And we went and worked with a school. And we had this whole training process of how speech technology works. And we selected about 15 crowd-sourcers, if you will, or Google fans. We gave them phones, and we paid them for every voice sample they collected. And we got everything that we needed within two days. Normally, it takes six months minimum just to get it cleared through legal and so on. 
And then not only did we get this data very quickly, they were completely engaged in the fact that they were part of something. So when the product actually was released, they were super excited. So it was a win-win. So that idea of crowd-sourcing and working with the people you meet, people who are enthusiastic of your product, basically, was a way to really connect the people who are building the products-- the models, basically, in this case-- to the people who are using it. So that's crowd-sourcing. And up until I did it, nobody had done it before in this industry. So there were articles about it and so on. And I was happy, because all the people who were participating were also being acknowledged for their help. So this went through. We collected, in about three years, almost 70 languages. So we scaled very, very quickly. That's almost unheard of. Even our competitors now have not reached where we are with the languages. So if you think about what that is with voice or just speech recognition, that is the ear of the machine. It converts sound into text. And then speech synthesis, which is to speak out what the machine is trying to say-- which is TTS, text to speech-- is a different animal altogether. Because for ASR, speech recognition, you need as many different variety of speakers as possible. You want to be able to catch all the different accents, all the different ways you would say tomato or tomato. You want to capture all of that. For speech synthesis, you actually only want one perfect voice. And that perfect voice has to in the perfect studio with no other sound. Because you're generating, now. So I also went with the acquisition. I had built a team of linguists, and we did the collection for that, as well. So within the speech team, I worked on the speech recognition as well as the speech synthesis. And then if you think about it, what is missing here is now the brain. 
We need to process all this information, the text that's coming in and then the text that's going to go out. So I moved to a new organization, which is the Google AI group, right now. And basically, created a team of linguists, because we knew that we actually needed to get more information about the languages. And before, we'd worked with linguists from a pronunciation-- a different linguistic phenomena that happens. But now, we wanted to actually work with linguists to understand the syntax, semantics, and all the ways the language actually works. So I created a team called Pygmalion, mostly because of-- I don't know if you guys are familiar with the "Pygmalion" story, but I was going for the "My Fair Lady" version, which is teaching a machine or somebody who doesn't know proper English the proper English. So there was a Pygmalion team. And then we also needed to figure out how to generate the text in a way that was fluid and semantically accurate for each language. Because in English, we don't have that many linguistic phenomena compared to French, for instance. How you say whether you're going, whether it's raining in New York or in Paris, we basically have one preposition. In French, you actually have, depending on many different things-- whether it's feminine, masculine, whether the word starts with a vowel or an H-- the preposition changes. So we wanted to be able to do all of that. And so we created another team to do that, exactly, which is syntactic realization, natural language generation. So now, we have the ear of the machine. We have the mouth of the machine. We have the brain. And to do that work, we wanted linguists to work with engineers to come up with those rules. So in doing all of that, I was also asked-- because I had so many people on the ground collecting speech data, I was asked to look at a new area, which is an area that's called Low Resource Languages. 
Where basically, there's not enough data with web pages, so we have to figure out-- there are many languages that are really spoken, but they're not written. Or there's no standard to how they're written. So we wanted to figure out how we can bootstrap our technology to figure out new ways to advance what we were already doing, but not go at it in the same old way. Because the same old way would not work, because there's not enough data to build the language model. Or it's very difficult to find the perfect voice for a particular language. So I created a separate team for the Low Resource Language Project. And the idea here was that we have, excuse me, 90 million people in Bangladesh. There are not enough web pages compared to in other languages or in other countries, like compared to English, for instance. So the question here was we had the speech recognition from the collections, where people were volunteering. But how do we get the speech synthesis? And I had this idea that, basically, I was watching Saturday Night Live. And there was a comedian who was mimicking a politician. And he sounded exactly like the politician. And I was thinking about one of the challenges that we have in creating the perfect TTS voice is that if you create the perfect TTS voice, it sounds exactly like a living, breathing person. If you're a company that has a voice that's supposed to represent your brand, to have it mimic a living person can be a little bit challenging. And there's all kinds of questions around what that may be like. So for instance, you want to have many different kinds of voices. You want a human voice. So what I thought would be interesting is why not actually get, instead of having a professional voice talent-- because we couldn't really find a voice talent-- why not experiment with having many non-professional speakers of that language. And basically, give us a sample. And then we could actually blend it and combine it into how many utterances we need. 
So the old model was using a concatenated model, which means that you needed lots and lots of data at a professional studio. The new way that we wanted to experiment was really blending the voice. We were trying to leverage all the latest neural networks, neural net models that we can leverage. So basically, what we wanted to do is we did a call out to all the Bangladeshi Googlers. Because we knew that they were very big fans of Google products being launched into their country. So I think about 50 Bangladeshi Googlers were available in Mountain View. We had a little anechoic chamber, a little studio there, that we could test this with. The other thing that happened was that a new ventless laptop existed. Because before that, all laptops had this fan which would interrupt the recording. And now, we had this laptop called the Asus laptop, which allowed us to actually use the laptop and have a portable studio, if you will. So the thing is that we were creating voices that could be blasted from a studio. And it would sound great. But in these countries, we were all actually listening to the voices on a small mobile phone. We didn't need that quality. We just wanted what was good enough. So we had 50 Bangladeshi Googlers. 20 of them volunteered. We recorded all of them, where they only recorded for about 30 minutes. Because if you're not a professional, doing this for more than 30 minutes, all kinds of things happen to your mouth. You're too tired, and there's no point. So we did this. And then we also had them rate which voice they thought sounded the best. Because for a non-Bengali speaker, for instance, you can't really tell. You have to be able to know what sounds warm and so on. And they chose one. And it was all done anonymously. And so we chose one voice, one speaker. And then basically, I think we ended up using, I believe, 12 of the speakers' data and built with 1,200 lines. I think this was 1,200 lines. It was a while ago. 
But in any event, that created a voice using the parametric synthesis route. And that was good enough for us to actually launch into the Android phones, as well as onto Google Translate. And that allowed us to, again, do a very similar thing, which was to scale. So we were doing multi speaker, single language voices. And then we decided, you know what? There are many people who speak many languages. So why not leverage those sounds that you can produce into those many languages? So then we went from multi speaker to multilingual. Because languages have similarities, why not bootstrap and learn from other languages? So I know this sounds all complicated and super expert. But just so you know, I have an MFA. I had, I think, two years of computer science as an undergrad, many, many years ago. So by this time, I had reached the sort of level right before you become a director. And I was at that level for about four years. And I didn't really want to be a director, because I actually just wanted to work. And I was afraid that being a director would require me to do all kinds of other things. It turns out it's true. I didn't know. Nobody told me. Turns out it's true. But then when people were telling me how there are not enough women in leadership positions, I didn't consider myself to be a leader. I didn't consider that I would want to actually, quite frankly, be doing this. But the point is that if I didn't, who would? And all the people, as I was saying in the last one, all the people who've helped me along the way-- I don't just represent myself. I represent them. So I felt like I had to take the leap. Avoiding it was becoming a bigger problem than actually trying. So I did. But I went through all sorts of questions of, am I expert enough in this? Do I have enough expertise? And now, do I have to be even more perfect? Because I think one of the things that we talk about-- a couple of weeks ago, I was at one of these leadership summits for women. 
And Sally Helgesen and Marshall Goldsmith just came out with a book called "How Women Rise and 12 Habits That Keep Women From Progressing." And I think Marshall Goldsmith has a book about what got you here will not get you there. And I thought that was really interesting and important. Because as I was looking through the 12 habits, I definitely embodied all of them. I was like, oh, my gosh. The first one is not claiming your achievements-- giving other people, your team, credit. So yes, that's true. And the other thing was about perfection. Because as you become a leader, you're also managing people. It's about relationships. And if you expect perfection from yourself, first of all, that's not going to happen. If you expect perfection from yourself, that critic that you have-- that inner critic, that judge-- is also criticizing and judging other people. And you can not have a team that's healthy. You don't want to be with co-workers who are always criticizing or only picking out and seeing the negative things. You want, actually, the exact opposite. You want a coach. You want somebody who's there whether it's rain or shine. So very quickly, one of the things I learned was I had to give up the whole idea of perfection and precision. Though it is what got me to this point, it is what got me promoted to the next level. Because I was working really hard. And it takes a lot of work. I think most of you guys know this. I was working really hard to do this. But I think that what's really important is to accept who you are. And part of that is your values and being authentic. And that is what will help you work with other people. Because you won't be as critical. You will accept your own imperfections, because those imperfections are also sometimes what helps you get to where you are, whether you like it or not. And part of my work with research is that we don't consider failure to be failures. Because you need to fail in order to learn. 
In actual language, the model building, you need to know what didn't work in order to figure out what does work. So all of that-- if you accept that, oh, well, we need to actually fail here, then you understand that there is no such thing as perfection. That's completely in your head. It's a specter that sort of holds you back. So the perfection part, I think, is really important to think about. I think it's also really important to think about your achievements and what you've actually achieved to be able to move forward. Because that's getting to that next level. And then I think the third thing, which I think is really interesting, is leveraging your network. So one of the hard things that I've learned is not all women help each other. And sometimes, in tech, especially, there are sort of the old guards. And it's not always men. Because for whatever reason-- who knows if it's cultural, or it's what not? But the thing is that it's not a competition. It's very important not to compare yourself with other people. You are only you. And this is all part of accepting your perfection, being authentic, understanding your own values. And so you do need to get to that point of appreciating and understanding who your peers and your community is. So one of the outcomes of that leadership summit was really coming up with cohorts. Coming up with not necessarily just one mentor-- mentors are good, definitely. Because you may need to ask questions and so on. But a group of people who are thinking about similar things, and to be able to bounce ideas off of them, and so on. Because you may be in the position where you need somebody to talk to, as well. I think a cohort is really important. And that's something that we can think about. Because the thing is that I have been an outlier from day one. I'm an immigrant child. I couldn't not work. I had to always work. I come from Guam and Alaska. I'm not really from New York. And I used to be so jealous when I went to NYU. 
And my friends, my classmates, would go home for the weekend to do their laundry. I was like, what? And so I think it's really important to accept and understand that we, as women in tech, are outliers. There is no status quo, really. Nobody has drawn a map or a plan for your future to move forward. It's just you, and what you want, and what motivates you, and what's interesting to you. I've reached this position not because I'm an expert, but because I can see and be creative about how to solve problems that are different from other people. And I, basically, took the risk to take that next step, because I thought it might be exciting. So follow your heart. And then try to solve problems with other people. And I think that's one of the best lessons that I've learned, is that community is really critical to not just us in this room. But it's critical for our culture. And it's critical for the advancement of women in tech. Thank you. [APPLAUSE] I'm not sure what's next. SPEAKER 1: Will you take a few questions? LINNE HA: Yeah, sure. AUDIENCE: Hi, I'm Melissa. You talk a lot about writing being a former passion of yours. Do you still think about it? LINNE HA: I write all the time. AUDIENCE: Oh, awesome. LINNE HA: Yes. SPEAKER 1: Go. AUDIENCE: Oh, me? OK. AUDIENCE: Oh, sorry. Go ahead. AUDIENCE: No, go. [LAUGHTER] AUDIENCE: Hi, I'm Caussie Nebled. So you were mentioning a lot about you have these creative ideas, but you have a very different background. So I was wondering, how do you go about leading a group of people who are experts in the solution that you're trying to facilitate? LINNE HA: Well, I didn't arrive in my position overnight. I learned a lot along the way. And I think being observant is important. The main thing is that if you have a good idea, it doesn't matter who it comes from. And part of being a leader is influencing, and developing the network, and collaborating, and partnerships to get that idea going. 
So and so thinks that this is a problem, as well. Like, let's try this. AUDIENCE: Hi, I see that in a lot of these conferences, there are people that are looking for transition. And a lot of us are maybe just entering. I, for one, am a new person in the world of tech. I feel like I am. And I was wondering, from your point of view, what do you see when you're facing a group of people that are trying to transition? How do you feel? What draws you when someone comes to you for an opportunity? You were talking about cohorts and mentors. What captures your attention? LINNE HA: The number one rule that I have when I hire somebody, whether they are expert or non-expert, is passion and motivation. Because if you're not motivated, it doesn't matter how good your skills are. There's no way I can get you to do the work that you need to do. And so if you're passionate, you're going to already be thinking about these things and motivated to come up with different ideas. So passion is the number one thing. And you say transitioning into tech. And I understand what you mean from a career perspective. But one of the most important things I think everybody should know is that tech is already in your world. You are already in tech. It's all over. So I think we have to start thinking about it a little bit differently and reframe. The difference is what you do from a work perspective to what you are acknowledging in the world. Tech is all around us. We all have mobile phones. So figure out what part of it is interesting to you and what you do you don't mind spending a lot of time doing. And go in that direction. AUDIENCE: Thank you. AUDIENCE: Hi, my name is Adenomar, and I'm a grad student. And my question to you is you mentioned that failures are necessary for us to learn. And I totally agree with that. But what is your advice in the moment? When you're facing failure, what is your advice to take it in the most positive and to learn the most out of our failures? 
LINNE HA: I think if you just start to think about, well, what did you learn, what came out of that experience, and what do you want to do next with what you've learned-- it's just another step. I think failure doesn't match your expectation. But you need to reset your expectations. SPEAKER 1: Good question. AUDIENCE: Hi, is this on? SPEAKER 1: Yeah. AUDIENCE: Hi. So you talked a lot about problem solving. Was there any book that helped you frame how you think about problem solving and also a book that influenced how you make decisions? LINNE HA: I do a lot of meditating. [LAUGHTER] So for me, personally, it's not to be so reactive, to actually think about it a little bit, but not think about it too long that it's creating a problem. Some decisions need to be made right away. I think problem solving-- I can't name one particular book off the top of my head. But the book that I was talking about earlier, Sally Helgesen and Marshall Goldsmith's book about how women rise, I think, is really interesting to look at the habits that we form in getting to a certain level, and what you need to change in order to get to that next level. AUDIENCE: Thank you. [APPLAUSE] [MUSIC PLAYING]
International Women’s Day Celebration - Keynote (IWD2019), posted by 林宜悉 on 2020/03/03