Google I/O '24 in under 10 minutes - VoiceTube: Learn English through videos!

Subtitles section Play video

At Google, we are fully in our Gemini era.
Today, all of our two billion user products use Gemini.
Gemini 1.5 Pro is available today in Workspace Labs.
Let's see how this comes to life with Google Workspace.
People are always searching their emails in Gmail.
We are working to make it much more powerful with Gemini.
Now we can ask Gemini to summarize all recent emails from the school.
Maybe you were traveling this week and you couldn't make the PTA meeting.
The recording of the meeting is an hour long.
If it's from Google Meet, you can ask Gemini to give you the highlights.
People love using photos to search across their life.
With Gemini, you're making that a whole lot easier.
And Ask Photos can also help you search your memories in a deeper way.
For example, you might be reminiscing about your daughter Lucia's early milestones.
You can ask Photos, show me how Lucia's swimming has progressed.
Here, Gemini goes beyond a simple search, recognizing different contexts and photos, packages it up all together in a summary.
Unlocking knowledge across formats is why we built Gemini to be multimodal from the ground up.
It's one model with all the modalities built in.
We've been rolling out Gemini 1.5 Pro with long context in preview over the last few months.
So today, we are expanding the context window to 2 million tokens.
So far, we've talked about two technical advances-- multimodality and long context.
Each is powerful on its own, but together they unlock deeper capabilities and more intelligence.
But what if it could go even further?
That's one of the opportunities we see with AI Agents.
Think about them as intelligence systems that show reasoning, planning, and memory, are able to think multiple steps ahead,
work across software and systems all to get something done on your behalf, and most importantly, under your supervision.
Today we have some exciting new progress to share about the future of AI assistants that we're calling Project Astra.
For a long time, we've wanted to build a universal AI agent that can be truly helpful in everyday life.
Here's a video of our prototype, which you'll see has two parts.
Each part was captured in a single take in real time.
What does that part of the code do?
This code defines encryption and decryption functions.
It seems to use AES-CBC encryption to encode and decode data based on a key and an initialization vector, IV.
Do you remember where you saw my glasses?
Yes, I do.
Your glasses were on the desk near a red apple.
Give me a band name for this duo.
Golden Stripes.
Nice.
Thanks, Gemini.
Today, we're introducing Gemini 1.5 Flash.
Flash is a lighter weight model compared to Pro.
It's designed to be fast and cost efficient to serve at scale, while still featuring multimodal reasoning capabilities and breakthrough long context.
There's one more area I'm really excited to share with you.
Our teams have made some incredible progress in generative video.
Today, I'm excited to announce our newest, most capable generative video model called Veo.
Veo creates high quality 1080p videos from text, image, and video prompts.
It can capture the details of your instructions in different visual and cinematic styles.
For 25 years, we've invested in world class technical infrastructure.
Today, we are excited to announce the sixth generation of CPUs called Trillium.
Trillium delivers a 4.7x improvement in compute performance per chip over the previous generation.
Google search is generative AI at the scale of human curiosity, and it's our most exciting chapter of search yet.
All the advancements you'll see today are made possible by a new Gemini model customized for Google Search.
What really sets this apart is our three unique strengths.
This is search in the Gemini era.
By the end of the year, AI Overviews will come to over a billion people.
We're making AI Overviews even more helpful for your most complex questions, the types that are really more 10 questions in one.
You can ask your entire question with all its sub questions and get an overview in seconds.
I'm really excited to share that soon you'll be able to ask questions with video.
Why will this not stay in place?
And in a near instant, Google gives me an AI Overview.
I get some reasons this might be happening, and steps I can take to troubleshoot.
Since last may, we've been hard at work making Gemini for Workspace even more helpful for businesses and consumers across the world.
Now, I can simply type out my question right here in the mobile card and say something like, compare my roof repair bids by price and availability.
This new Q&A feature makes it so easy to get quick answers on anything in my inbox.
Today, we'll show you how Gemini is delivering our most intelligent AI experience.
We're rolling out a new feature that lets you customize it for your own needs and create personal experts on any topic you want.
We're calling these Gems.
They're really simple to set up.
Just tap to create a Gem, write your instructions once, and come back whenever you need it.
Starting today, Gemini Advanced subscribers get access to Gemini 1.5 Pro with one million tokens.
That is the longest context window of any chatbot in the world.
You can upload a PDF up to 1,500 pages long or multiple files to get insights across a project.
Now, we all know that chatbots can give you ideas for your next vacation.
But there's a lot more that goes into planning a great trip.
It requires reasoning that considers space-time logistics, and the intelligence to prioritize and make decisions.
That reasoning and intelligence all come together in the new trip planning experience in Gemini Advanced.
We've embarked on a multi-year journey to reimagine Android with AI at the core.
Now we're making Gemini context aware so it can anticipate what you're trying to do and provide more helpful suggestions in the moment.
Let me show you how this works.
So my friend Pete is asking if I want to play pickleball this weekend.
But I'm new to this pickleball thing, and I can bring up Gemini to help with that.
Gemini knows I'm looking at a video, so it proactively shows me an ask this video chip, so let me tap on that.
And now I can ask specific questions about the video.
So for example, what is the two bounce rule?
So give it a moment-- and there.
I get a nice, distinct answer.
Starting with Pixel later this year, we'll be expanding what's possible with our latest model, Gemini Nano with multimodality.
This means your phone can understand the world the way you understand it.
So not just through text input, but also through sights, sounds, and spoken language.
Now let's shift gears and talk about Gemma, our family of open models, which are crucial for driving AI innovation and responsibility.
Today's newest member, PaliGemma, our first vision language open model, and it's available right now.
I'm also excited to announce that we have Gemma 2 coming.
It's the next generation of Gemma, and it will be available in June.
So in a few weeks, we'll be adding a new 27 billion parameter model to Gemma 2.
To us, building AI responsibly means both addressing the risks and maximizing the benefits for people and society.
We're improving our models with an industry standard practice called Red Teaming, in which we test our own models and try to break them to identify weaknesses.
I'm excited to introduce LearnLM, our new family of models based on Gemini and fine tuned for learning.
Another example is a new feature in YouTube that uses LearnLM to make educational videos more interactive, allowing you to ask a clarifying question, get a helpful explanation, or take a quiz.
All of this shows the important progress we have made as we take a bold and responsible approach to making AI helpful for everyone.
To everyone here in Shoreline and the millions more watching around the world, here's to the possibilities ahead and creating them together.
Thank you.