  • Google has recently introduced several useful features to their Gemini API, targeting developers specifically.

  • Some of these features are not offered by other API providers.

  • One such feature is code execution, which enables the model to generate and run Python code and learn iteratively from the results until it arrives at its final output.

  • This powerful capability allows you to build applications that benefit from code-based reasoning, such as solving equations or processing text.

  • Code execution is an added feature, similar in spirit to function calling, but very different in how it works.

  • To show you why this can be an extremely powerful feature for developers, let's look at an example.

  • So here we have Claude 3.5 Sonnet, which is one of the best LLMs available today.

  • And I asked it, can you count how many times the letter R is present in the word strawberry?

  • Just like all the other LLMs that I have tested, it gets this wrong.

  • Then I asked it, can you write code to count how many times the letter R is present in the word strawberry?

  • Now this is trivial code, and the code that it came up with is correct.

  • However, since it doesn't really have the ability to execute the code, it will not be able to figure out whether the answer that it generates is right or wrong.

  • So the output that it comes up with is the letter R appears two times in the word strawberry.

  • Again, the code is correct, but it's just guessing the output.

  • It has no ability to actually execute the code.
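For reference, the counting code in question is as trivial as it sounds. Here is a sketch of what such a snippet looks like (the model's exact code isn't shown in the video):

```python
# Count how many times a letter appears in a word -- trivial to write,
# but an LLM without code execution can only guess the output.
word = "strawberry"
letter = "r"

count = word.count(letter)  # non-overlapping occurrences
print(f"The letter '{letter}' appears {count} times in '{word}'.")
```

Run locally, this counts 3 occurrences, which is exactly the answer the model guesses wrong without execution.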

  • Next, let's look at Gemini 1.5 Flash, which is a relatively weaker model when you compare it to Claude 3.5 Sonnet.

  • Now, when I asked the same question, it came up with the wrong answer, which is expected.

  • Then I went to the advanced settings and enabled code execution.

  • So now it will be able to not only generate code, but actually execute that code.

  • So when I asked it to write code to do the same thing, it came up with code that will look for a certain letter, R in this case, and give us a count.

  • And the response is the letter R appears three times in the word strawberry.

  • So since it has this ability to execute the code, it can give you correct answers for problems involving reasoning and code execution.

  • So this can be an extremely powerful feature, especially if you're building agents with Gemini 1.5 Flash or Pro.

  • Google has provided a number of different code examples.

  • We are going to look at some of them in this video.

  • But the question is, what is the difference between code execution and normal function calling?

  • So to answer this, I asked Claude to create this quick comparison.

  • Now, when it comes to function calling, that lets you interact with the real world using external APIs or tools.

  • In that case, you will need to provide a list of tools that the model is going to have access to.

  • But once the model picks a tool, you will actually need to run that function yourself.

  • That means that you will need to set up your own development environment for that.

  • So that requires a lot of setup.

  • And sometimes for function calling, you will need to make multiple API calls in order to achieve a task.

  • And as I said, you need to provide the function as well.

  • But the good thing is that you have a lot of flexibility in terms of this can be any language, any framework.
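To make the contrast concrete, here is a minimal sketch of the loop you own with function calling; the tool name and the simulated model decision below are hypothetical, not from any real API:

```python
# Minimal sketch of the function-calling loop you run yourself:
# you declare the tools, the model picks one, and you execute it
# in your own environment before sending the result back.

def get_weather(city: str) -> str:
    # In a real app this would hit an external weather API.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}  # the tool list you declare to the model

# Pretend the model responded with this tool call; executing it and
# returning the result usually means a second API request.
model_tool_call = {"name": "get_weather", "args": {"city": "Paris"}}

result = TOOLS[model_tool_call["name"]](**model_tool_call["args"])
print(result)
```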

  • When it comes to code execution, the LLM itself decides whether it needs to write code to perform a certain operation or not.

  • And it is able to run the code in the API backend.

  • So you don't really have access to the environment in which it's running.

  • It's a completely isolated environment, but it's fixed.

  • Now, it's much simpler to use.

  • And actually, Google recommends using code execution if you can perform a task with this feature.

  • Now, the great thing is it's a single API request.

  • So irrespective of how complex your request is, the code execution is going to happen in the backend, and it can actually iterate on it.

  • And I'll show you some examples of how that iteration takes place.

  • So the API writes and runs the code.

  • The problem is that it's limited to Python at the moment, and to a very specific set of libraries.

  • So at the moment, the environment in which it executes the code only has access to NumPy and SymPy.

  • But I think for most of the cases, this is good enough.

  • Now, there are some limitations which I want to highlight before we look at some code examples.

  • So the model can only generate and execute code.

  • It cannot return artifacts like media files.

  • The other thing you need to keep in mind is that the feature doesn't support file I/O or use cases that involve non-text output.

  • So for example, data plots.

  • However, I have figured out a way in which you can generate plots using the code it generates.

  • Basically, you run them locally, but I'll show you an example of that.

  • And the code execution can run for a maximum of 30 seconds before timing out.

  • So whatever task that you ask it to do, it needs to finish within 30 seconds.

  • That is a limitation, but I think in most of the cases for agentic workflows, this should be good enough.

  • Now, in some cases, enabling code execution can lead to regression in other areas of the model output.

  • So for example, if you're asking it to write a story, that can have an impact because you're enabling a completely irrelevant feature, which is code execution in that case.

  • Now, different models or different variations of the models have different abilities.

  • So for example, Gemini 1.5 Pro is the best-performing model based on their internal testing, which is expected because this is a bigger model compared to Gemini 1.5 Flash.

  • Okay, so let's look at a few interesting code examples of how you can use this in your own workflows.

  • Now, the first part of this notebook is based on the code provided by Google.

  • And in the later part, I added some examples just to test different abilities of this feature.

  • So we will first need to install the Google Generative AI client for Python.

  • And currently the one that we are using is version 0.7.1.

  • Then you need to provide your API key.

  • You can set this as environment variable.

  • You can get that from Google AI Studio.

  • In my case, I just set this as a secret in my notebook, okay?

  • Then there's some code that will just give us a nice-looking output in well-formatted markdown.

  • Now, before showing you the power of code execution, let's look at a few prompts without code execution.

  • So here we are creating the model, and we are using the smaller 1.5 Flash, right?

  • So we create the model, then we call generate content function on it.

  • The first prompt is, what's the sum of the first 200 prime numbers?

  • Make sure to get all 200, okay?

  • Now, since it doesn't really have the ability to execute code, it came up with correct pseudocode, basically an algorithm you could use to do this yourself.

  • But we specifically asked it to generate those numbers and give us all of them.

  • But since it cannot execute that code, the answer is not what we are looking for.

  • Similarly, when I asked it to count the number of times the letter R appears in the word strawberry, again, it got this wrong.

  • But you can enable code execution and it's extremely simple or easy to do.

  • All you need to do is in the list of tools, just provide code execution.

  • If you do that, the model will have access to this code execution feature.

  • So if we run the same prompt, which is looking for the sum of the first 200 prime numbers, then it can actually write code for it.

  • So looking internally at what is happening, it generated some text, then wrote some code, executed that code, and got the results.

  • Then it generated some more text, wrote more code, executed it, got the results, and showed us the final text.

  • So let me show you what the final output looks like and that will kind of explain what actually happened.

  • So if we look at the results, it says I can calculate the sum of the first 200 prime numbers and the way it did was that it first wrote code to generate all the first 200 prime numbers.

  • So here is the code that it used to generate the prime numbers that we requested, right?

  • So that is the first step in which it generated some text, then it generated some code and executed that code.

  • Then it generated some further text, which says, now we have the first 200 prime numbers.

  • Let's calculate the sum.

  • So it calculated the sum here and here's the final output of the sum.
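A local re-creation of the two steps described above (not the model's verbatim output) looks like this:

```python
# Step 1: generate the first 200 prime numbers by trial division.
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

primes = []
candidate = 2
while len(primes) < 200:
    if is_prime(candidate):
        primes.append(candidate)
    candidate += 1

# Step 2: now that we have the first 200 primes, calculate the sum.
total = sum(primes)
print(f"The sum of the first 200 prime numbers is {total}.")
```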

  • So as you can see, it's actually doing things sequentially.

  • There are multiple steps in which it needs to go through and do different operations.

  • Everything is happening on the backend.

  • All we get is the code that it came up with, what were the results and the final conclusion, which is pretty impressive.

  • I think it's a very neat implementation and can be extremely helpful.

  • Now, when you look at the strawberry example, again, it came up with the code and it's able to run that code, and this time it got the answer right: the letter R appears three times in the word strawberry.

  • Now you can use the same functionality as a part of a chat session as well, which I think is going to be extremely helpful if you're creating agentic workflows.

  • So now we're going to create a chat client.

  • You can provide a prompt.

  • So for example, in this case, the user is asking to do bogosort on this list of numbers.

  • It's a sorting technique, so it comes up with the code that it will need to run and based on the code, it comes up with the final sorted list and then you get some output text as a part of it.

  • But then since it's a chat session, you can actually ask it to do subsequent things.

  • So now modify the code to count the number of iterations.

  • How many iterations does it take?

  • So not only does it have access to the previous code that it generated, because that is part of the chat history, but it can iterate on that code as well.

  • So if you are working on an agent and that is supposed to update its existing dataset or code base, I think this can be a very powerful tool to do that.

  • It iterated on the previous code, basically updated that code with this new addition that we are requesting.

  • And again, it gave us the sorted list, but it now also included the number of iterations it took.
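A local sketch of bogosort with the requested iteration counter; the list of numbers below is illustrative, since the video's exact list isn't shown here:

```python
import random

# Bogosort: shuffle until sorted -- only sensible for tiny lists.
# An iteration counter is added, mirroring the follow-up chat request.
def bogo_sort(items):
    items = list(items)
    iterations = 0
    while items != sorted(items):
        random.shuffle(items)
        iterations += 1
    return items, iterations

random.seed(0)  # fixed seed so the run is reproducible
sorted_list, iterations = bogo_sort([3, 1, 2])
print(sorted_list, iterations)
```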

  • Okay, let's look at a more interesting example where we're sending an image as a part of the prompt and asking it to write some code for us.

  • So this is an example of the classic Monty Hall problem.

  • The prompt is, run a simulation of Monty Hall problem with 1000 trials.

  • Now here's how it works as a reminder.

  • So we're just telling it what Monty Hall is and how the problem is structured.

  • So you are on a game show with three doors.

  • Behind one is a car and behind the others are goats.

  • You pick a door.

  • The host who knows what's behind the doors opens a different door to reveal a goat.

  • Should you switch to the remaining unopened door?

  • That's kind of the classical problem that the system has to solve.

  • Okay, so we basically give it the problem statement and then say the answer has always been a little difficult for me to understand when people solve it with math.

  • So please run a simulation with Python to show me what the best strategy is.

  • And they also added a thank you, which is a nice way of interacting with LLMs.

  • Now it came up with a code which basically runs that simulation for us.

  • And it says that win percentages with switching is going to be 65.8%.

  • If you don't switch, the win percentage is just 31%.

  • And it also comes up with the explanation of how it works and why you should switch.

  • So pretty powerful, because now you can provide these other artifacts; it will not be able to use them directly, but it can still create code based on those artifacts and run it.
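The simulation the model wrote can be re-created locally along these lines; exact win percentages vary from run to run around the theoretical 2/3 versus 1/3:

```python
import random

# Monty Hall simulation: compare switching vs. staying over many trials.
def simulate(trials: int, switch: bool) -> float:
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)   # door hiding the car
        pick = random.randrange(3)  # contestant's first pick
        # Host opens a goat door that is neither the pick nor the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # Switch to the one remaining unopened door.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

random.seed(42)  # reproducible run
print(f"win rate switching: {simulate(1000, switch=True):.1%}")
print(f"win rate staying:   {simulate(1000, switch=False):.1%}")
```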

  • Now it works with streaming as well.

  • So I'm not going to go over on this a lot, but you can provide the same image plus the prompt and ask it to stream the responses.

  • Okay, next to actually test some of its capabilities, I came up with a number of different examples.

  • So the first one is simple mathematics.

  • So that's basically calculating the sum of different numbers.

  • It should be able to do that.

  • Then there is this one on string manipulation, and we're going to look at them one by one.

  • So we're given the string "Hello World, welcome to Gemini API".

  • It has to perform multiple tasks.

  • So convert the string to uppercase, then count the number of vowels, then reverse the string.
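A local sketch of those three string tasks; the input string is my reconstruction from the transcript:

```python
# String manipulation tasks from the prompt; the exact input string
# is reconstructed from the transcript.
s = "Hello World, welcome to Gemini API"

upper = s.upper()                                # 1. uppercase
vowels = sum(ch in "aeiou" for ch in s.lower())  # 2. vowel count
reversed_s = s[::-1]                             # 3. reversed string

print(upper)
print(vowels)
print(reversed_s)
```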

  • Next, I wanted it to do data analysis: generate numbers between 100 and 1,000, and calculate the mean, median, and mode.

  • What's the minimum and maximum value?

  • And one thing I wanted to see was whether it's going to be able to generate plots for us or not.

  • So I asked it to create a histogram as well.
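Locally, the statistics part of this prompt boils down to something like the following; the sample size is an assumption, and the histogram step is left as a comment because it needs matplotlib, which the sandbox doesn't provide:

```python
import random
import statistics

# Data analysis task: random integers between 100 and 1,000,
# then mean, median, mode, min, and max.
random.seed(7)  # reproducible
data = [random.randint(100, 1000) for _ in range(50)]

print("mean:   ", statistics.mean(data))
print("median: ", statistics.median(data))
print("mode:   ", statistics.mode(data))
print("min/max:", min(data), max(data))

# The histogram needs matplotlib; run something like this locally:
# import matplotlib.pyplot as plt
# plt.hist(data, bins=20); plt.show()
```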

  • Here's a web scraping example.

  • So basically, it's supposed to do web scraping for us.

  • And the last one is to come up with a machine learning model.

  • So it has to generate synthetic data, and the features are square footage, number of bedrooms, and the year built.

  • And the problem is to predict house prices based on that data.

  • So it has to split the data into training and test set, create and train a linear regression model, evaluate the model's performance on the test set, and then use that model to predict the housing prices.

  • So these are different problems that I want to see whether the Gemini Flash API can help me solve with this code execution or not.

  • Now, for the simple mathematics, this was a straightforward task.

  • We have seen a couple of examples before.

  • So it was able to do that.

  • For string manipulation, again, even though it's a multi-step process, it was able to do it without any issues whatsoever.

  • Same is for data analysis.

  • It had absolutely no problems in actually doing the data analysis.

  • Now, it did write the code to generate the plots.

  • So here's the code to generate the plots, but it's not able to actually run that and give us the output in terms of the plot.

  • So what I had to do was take that code and run it myself, because you get the code as an output, and then you can generate the plot here.

  • So this is pretty neat.

  • Next time, I actually might ask it to also give us the actual data that it generated.

  • I think that's going to be an interesting test to do.

  • Now, for the web scraping one, it actually came up with some HTML code, and based on that code, it wrote some Python code to do the extraction.

  • Now, it's not able to install the Beautiful Soup package.

  • So even though the code is correct, we will not be able to see the output.

  • So here I took the code that it generated with that example HTML page, right, ran it locally, and the outputs are correct.

  • So it seems to be working.
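Since the sandbox can't install Beautiful Soup, here's a stdlib-only sketch of the same extraction idea using html.parser; the HTML snippet is illustrative, not the page from the video:

```python
from html.parser import HTMLParser

# Stdlib-only scraping sketch: pull headings out of an HTML snippet
# without Beautiful Soup, which the sandbox cannot install.
html = """
<html><body>
  <h1>Example Page</h1>
  <p>Some text.</p>
  <h2>Section One</h2>
</body></html>
"""

class HeadingExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3"):
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading and data.strip():
            self.headings.append(data.strip())

parser = HeadingExtractor()
parser.feed(html)
print(parser.headings)
```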

  • And when it comes to the training and machine learning model, this was actually the surprising one because it uses scikit-learn, which is a machine learning package or library for training and evaluating traditional machine learning models.

  • For the people who are not familiar with it, this is a package that predates deep learning and LLMs.

  • So it's a very famous package, which is still being used a lot in the industry.

  • So using that, it seems to be able to generate data and train a linear regression model, which is pretty impressive.

  • So I think it does have some packages in there, apart from NumPy and SymPy, and it seems to be working.

  • It not only trained the model, but it also generated the results.

  • And what I did was I basically verified that the code works based on the generated output, and this seems to be working.
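The same end-to-end exercise can also be sketched without scikit-learn at all, using NumPy's least squares; the coefficients and the synthetic dataset below are made up for illustration:

```python
import numpy as np

# Synthetic house-price data and a linear regression fit using only
# NumPy (which the sandbox definitely has), via least squares.
rng = np.random.default_rng(0)
n = 200

sqft = rng.uniform(500, 3500, n)
bedrooms = rng.integers(1, 6, n)
year = rng.integers(1950, 2024, n)

# Ground-truth coefficients for the synthetic prices, plus noise.
price = (150 * sqft + 10_000 * bedrooms
         + 500 * (year - 1950) + rng.normal(0, 5_000, n))

# Train/test split: first 160 rows train, last 40 test.
X = np.column_stack([sqft, bedrooms, year - 1950, np.ones(n)])
X_train, X_test = X[:160], X[160:]
y_train, y_test = price[:160], price[160:]

# Fit the linear model and predict held-out prices.
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
pred = X_test @ coef

# R^2 on the held-out test set.
ss_res = np.sum((y_test - pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"test R^2: {r2:.3f}")
```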

  • Okay, before we wrap it up, let's just quickly look at the pricing.

  • Now there is a free tier, and that's the one that I was using.

  • So you can run this for free.

  • There are some limitations in terms of how many requests that you can send per minute, but if you're looking for a paid version, you also have the ability to do that.

  • So I think it's definitely worth checking out, especially with these new features that Google is adding.

  • One of the features that I'm going to be covering in my next video is going to be context caching, which I think is another great feature that developers definitely need to check out if you are concerned about the cost of running these LLM API calls.

  • Anyways, do check out code execution from Gemini APIs.

  • I think it's a really good feature, and I hope the other API providers will be able to implement something like this.

  • If you found this content useful and you're new here, make sure to subscribe to the channel.

  • I create a lot of technical content and cover LLM training, fine tuning, putting them into production, doing inference, and everything in between.

  • I hope you found this video useful.

  • Thanks for watching, and as always, see you in the next one.
