
  • In a fraction of a second, Google Translate can make sense of your surroundings.

  • But this isn't the same Google Translate from the early 2000s.

  • Over the past two decades, the technology has gone through a complete overhaul, shifting from a basic pattern matching tool to a sophisticated neural network that handles more than 130 languages.

  • It works by turning language into something computers can understand:

  • Math.

  • Exciting times for people who like language and math.

  • This is the tech behind Google Translate.

  • There's very little code left today from the early days of phrase-based translation.

  • We have shut down and deleted almost all of it.

  • That Google Translate from two decades ago laid the foundation for what we use today.

  • When it launched in 2006, it worked by playing a matching game.

  • First, the model looked at lots of examples of professional translations scraped from the internet.

  • Then, when users entered sentences for translation, the tool would break them into the longest possible chunks of words it had seen before and combine the chunks.
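
A rough sketch of that matching game in Python. The phrase table here is invented for illustration; the real system learned millions of scored phrase pairs (plus reordering rules) from parallel text.

```python
# Toy phrase-based translation: greedily match the longest known chunk,
# emit its stored translation, and move on. The phrase table is a
# made-up miniature of what the 2006 system learned from the web.
PHRASE_TABLE = {
    ("chiuso", "per", "ferie"): "closed for holidays",
    ("chiuso",): "closed",
    ("per",): "for",
    ("ferie",): "holidays",
}

def translate_phrases(words):
    out, i = [], 0
    while i < len(words):
        # Try the longest chunk starting at position i first.
        for n in range(len(words) - i, 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in PHRASE_TABLE:
                out.append(PHRASE_TABLE[chunk])
                i += n
                break
        else:
            out.append(words[i])  # unknown word: pass it through unchanged
            i += 1
    return " ".join(out)

print(translate_phrases("chiuso per ferie".split()))  # -> closed for holidays
```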

  • It now uses a much more sophisticated machine learning approach, a so-called transformer model, which is the building block of all modern AI.

  • Transformers turn language into math by assigning numbers to words.

  • The key insight is that a series of numbers can represent a meaning.

  • You can then do math with those vectors that shows something about the relationships of the meanings of words to each other.

  • For each language Google Translate supports, every word gets converted into a vector, which is written like a list of numbers.

  • This way, the computer can do math with them.

  • For instance, king minus man plus woman equals queen.
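
A minimal sketch of that arithmetic, using made-up three-dimensional vectors; real embeddings have hundreds of learned dimensions.

```python
import numpy as np

# Toy word vectors, invented for illustration.
vec = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.2, 0.8]),
    "apple": np.array([0.5, 0.5, 0.5]),
}

target = vec["king"] - vec["man"] + vec["woman"]

def cosine(a, b):
    # Similarity of direction, ignoring vector length.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The word whose vector points most nearly the same way as
# king - man + woman should be queen.
best = max((w for w in vec if w not in ("king", "man", "woman")),
           key=lambda w: cosine(vec[w], target))
print(best)  # -> queen
```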

  • The specific numbers assigned to each word don't really matter, and they're different in different languages.

  • But what matters is how each word relates to every other word.

  • It's all based on machine learning from billions of examples.

  • But most of the time you want to translate something, it's not just an individual word.

  • So the computer also has to figure out how words work together.

  • And this is where transformers, a breakthrough in machine learning, come in.

  • The next generation of neural translation was called the transformer architecture.

  • And this added a level, so it moved from representing the meaning of one word by a row of numbers to putting all the meanings of all the words into a table and doing math on that whole table.

  • And that enables you to do math that talks about not only the meaning of each word, but the importance of the relationships of the words to each other.

  • Say you're trying to translate this Italian sign into English.

  • First, Google Translate would turn each word into a vector.

  • And those vectors would be put into one giant table, or matrix.

  • Then the computer tries to figure out how each word interacts with every other word on this side.

  • Mathematically, this is basically a lot of multiplication.

  • The most important kind of magical step is laying them out in a matrix and doing what's called matrix multiplication.

  • And if you do enough of that, you can solve this problem.
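
A minimal sketch of that step, assuming the standard scaled dot-product attention used in transformers; every matrix below is a random stand-in for weights the real model learns from billions of examples.

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, d = 3, 8                  # e.g. the three words on the Italian sign
X = rng.normal(size=(n_words, d))  # the table: one row of numbers per word

# Learned projections (random stand-ins here) turn the table into
# queries, keys, and values.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

# The matrix multiplication: how strongly each word attends to
# every other word.
scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)  # softmax: each row sums to 1

Z = weights @ V         # each word's numbers, mixed with its context
print(weights.round(2))
```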

  • All this creates a new list of numbers.

  • This is what's called a context vector.

  • And it's something pretty special.

  • This list of numbers actually represents what the sentence means.

  • Not just the sum of all of its words.

  • At least, if the model has done its job correctly.

  • If you put that together and are very clever, which the people who invented transformers were, and you train on a lot of data, which we do, you can eventually get to a collection of numbers that meaningfully represents the meaning of the sentence.

  • So that's called the encoder stage.

  • Then you have a decoder, which, roughly speaking, is the encoder in reverse.

  • The computer has to decode this back to human language.

  • The decoder now also goes through lots and lots of operations.

  • And finally, you start getting vectors out which can be mapped back to individual words.

  • So we hopefully get closed, for, then holiday.
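
A sketch of that final decoding step under toy assumptions: each output vector is scored against every word in a made-up vocabulary, and the closest match is emitted.

```python
import numpy as np

vocab = ["closed", "for", "holiday", "open", "sale"]
E = np.eye(len(vocab), 8)  # one toy embedding per vocabulary word

# Pretend decoder outputs: near the right embeddings, but not exact.
rng = np.random.default_rng(1)
decoder_outputs = [E[i] + 0.05 * rng.normal(size=8) for i in range(3)]

for h in decoder_outputs:
    scores = E @ h                        # similarity to every vocabulary word
    print(vocab[int(np.argmax(scores))])  # -> closed, for, holiday
```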

  • So this is how language becomes math.

  • Getting this math to work requires a lot of training.

  • Lots of the numbers in this math problem are chosen randomly and then refined as the computer learns from billions of examples.
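
That "chosen randomly, then refined" loop in miniature, on a deliberately trivial problem; the real objective, architecture, and scale are vastly larger.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))    # toy "inputs"
W_true = rng.normal(size=(4, 4))
Y = X @ W_true                   # toy "correct answers"

W = rng.normal(size=(4, 4))      # the weights start out random
for step in range(200):
    err = X @ W - Y                 # how wrong the current guess is
    W -= 0.1 * X.T @ err / len(X)   # nudge the weights downhill
print(np.abs(W - W_true).max())     # near zero after refinement
```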

  • Before deploying an update with a set of values and weights, engineers run numerous tests with their AI evaluator and then professional human translators who check accuracy.

  • But since every possible combination of words leads to a unique equation, it's impossible to test everything.

  • Since the model has mostly been trained on translations going to or from English, it often requires more steps to go between two non-English languages.

  • For example, if you want to translate something in Japanese to Zulu, it will go from Japanese to English and then English to Zulu.
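
A sketch of that pivoting logic; translate() is a hypothetical stand-in for a model call, not Google's actual API.

```python
# Language pairs with direct models, invented for illustration.
DIRECT_PAIRS = {("ja", "en"), ("en", "ja"), ("zu", "en"), ("en", "zu")}

def translate(text, src, dst):
    # Stand-in for a trained model; just records the hop.
    return f"{dst}({text})"

def translate_with_pivot(text, src, dst, pivot="en"):
    if src == dst or (src, dst) in DIRECT_PAIRS:
        return translate(text, src, dst)
    # No direct model: go through English in two hops.
    return translate(translate(text, src, pivot), pivot, dst)

print(translate_with_pivot("konnichiwa", "ja", "zu"))  # -> zu(en(konnichiwa))
```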

  • The first thing that happens when you use Google AR Translate is that we have to actually extract the text from the image.

  • And so as you can see here, it detects that now this is Chinese and it translates it to English.

  • It makes information a lot more accessible because for many people typing script in a foreign language is not an option.

  • The key component is a technology called Optical Character Recognition, or OCR.

  • Google has been using that since 2002, when it started digitizing libraries for Google Books.

  • Initially, it would do something very simple like pattern matching.

  • So you can think of it as, is this the same as this?

  • Yes, so it's an A or B or whatnot.
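
A toy version of that "is this the same as this?" idea: compare a glyph's bitmap against stored templates and pick the closest. The 3x3 bitmaps are invented; real templates were scanned glyph images.

```python
import numpy as np

TEMPLATES = {
    "A": np.array([[0, 1, 0],
                   [1, 1, 1],
                   [1, 0, 1]]),
    "B": np.array([[1, 1, 0],
                   [1, 1, 1],
                   [1, 1, 0]]),
}

def recognize(glyph):
    # The template with the fewest mismatching pixels wins.
    return min(TEMPLATES, key=lambda c: int(np.sum(TEMPLATES[c] != glyph)))

noisy_a = np.array([[0, 1, 0],
                    [1, 1, 1],
                    [1, 1, 1]])  # an "A" with one flipped pixel
print(recognize(noisy_a))        # -> A
```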

  • But now, Optical Character Recognition also uses transformers.

  • First, Google Lens identifies lines of text and text direction.

  • Then it determines specific characters and words.

  • Instead of dividing the sentence into words and assigning numbers to each word, though, it divides an image into patches of pixels.

  • These are called tokens.

  • The encoder of the transformer is going to process all of these tokens simultaneously to predict the best character and the best word eventually.
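
A minimal sketch of that tokenization, in the style of a vision transformer; the image and patch sizes below are illustrative.

```python
import numpy as np

image = np.arange(8 * 8).reshape(8, 8)  # stand-in for a grayscale text crop
P = 4                                   # patch size (real models use e.g. 16)

# Cut the image into PxP patches and flatten each into one vector.
patches = (image.reshape(8 // P, P, 8 // P, P)
                .transpose(0, 2, 1, 3)
                .reshape(-1, P * P))

print(patches.shape)  # (4, 16): four tokens, 16 numbers each
# The transformer encoder then processes all of these tokens at once.
```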

  • This means that Google Lens, the company's visual search tool, can often read things even when it can't make out every single letter.

  • With transformers, they're able to pick up on grammar.

  • If there is a spelling mistake, the transformer will also be able to use the context to disambiguate and still extract the right word.

  • After it completes Optical Character Recognition, Google Lens analyzes the layout of all the text.

  • That's how a computer would know to translate this sign as "you matter, don't give up," rather than "you don't matter, give up."

  • When you look at the newspaper, humans are excellent at just glancing at it and understanding what is the reading order, what should you read first.

  • This is a concept that isn't actually easy to solve technically.

  • It's very hard.

  • The key is for Optical Character Recognition to understand something about the meaning of what it's reading.

  • This is also done through extensive training.
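
To see why geometry alone isn't enough, here is the obvious heuristic, sorting detected text boxes top to bottom and left to right, applied to made-up boxes laid out like that sign. It produces the wrong reading, which is exactly what the trained model has to overcome.

```python
# Each box is (x, y, text); coordinates are invented for illustration.
boxes = [(50, 10, "YOU"), (50, 40, "DON'T"), (50, 70, "MATTER"),
         (50, 100, "GIVE"), (50, 130, "UP")]

def naive_reading_order(boxes):
    # Sort top-to-bottom, then left-to-right: geometry only, no meaning.
    return [text for _, _, text in sorted(boxes, key=lambda b: (b[1], b[0]))]

print(" ".join(naive_reading_order(boxes)))
# -> YOU DON'T MATTER GIVE UP: geometrically plausible, semantically wrong.
```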

  • After the chunks of text are sent to the translator,

  • Google Lens uses inpainting models to erase the text off different signs or backgrounds.

  • That way, translated text can be placed on top of clean surfaces.

  • Using generative models, it tries to predict and create pixels that match the surrounding pixels so that when we overlay the translated text, it looks very natural and seamless.
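
Google uses generative models for this step; as a rough classical stand-in, OpenCV's built-in inpainting fills masked pixels from their surroundings. The file name and box coordinates below are hypothetical.

```python
import cv2
import numpy as np

sign = cv2.imread("sign.jpg")                   # hypothetical input photo
mask = np.zeros(sign.shape[:2], dtype=np.uint8)
x, y, w, h = 40, 60, 200, 30                    # the OCR'd text box
mask[y:y + h, x:x + w] = 255                    # mark the text pixels to erase

clean = cv2.inpaint(sign, mask, 3, cv2.INPAINT_TELEA)  # fill from surroundings

# Overlay the translated text on the now-clean surface.
cv2.putText(clean, "closed for holidays", (x, y + h - 8),
            cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 0), 2)
cv2.imwrite("translated_sign.jpg", clean)
```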

  • This doesn't always work seamlessly.

  • This one is not picking up the first line. I'm not sure why.

  • Some translations don't fully account for context, which is why "alto" on this Mexican stop sign, where it means "stop," might be mistranslated as "high."

  • And while Optical Character Recognition can frequently identify text in bad lighting or with complicated perspective, it has its limits.

  • One of them is with deformable objects.

  • Whenever there is text on, say, a sweater or a cookie wrapper, depending on the pose and the angle, it might be more challenging to get the OCR right.

  • Well-formed, grammatically correct, fluent text we're quite good at.

  • Where we have challenges is people using slang, using casual speech in chat and social media.

  • We don't necessarily see as much of that because we don't have access to as much data.

  • Google is working to add some more features, like letting users refine their translations if they want to, similar to how you can ask Google Gemini or ChatGPT to make translation more or less formal,

  • or in Chilean Spanish rather than European Spanish.

  • And it's also working to add more languages.

  • There are an estimated 6,000 to 7,000 languages in the world.

  • Our goal is to support all of them.
