So yes, today I'm going to talk about eBPF. As Sven said, I work at Aqua Security, where we build tools to help enterprises secure their cloud-native deployments, and eBPF is one of the technologies that we're starting to leverage, in particular with a project called Tracee.
Now, I have done a talk about eBPF before, but today I'm going to do something new: I'm writing my code, or at least my user space code, in Go. It's the first time I've done the talk using Go, so that's the new twist we're going to tackle today.
Before I get to the point where I start using Go and writing some code, I should probably start by talking a little bit about what eBPF is.
The key thing is that it lets you run custom programs of your choice in the kernel. It's a Linux kernel technology that lets you, on the fly, add and change programs that run in response to events, and you can change them dynamically without having to reboot the machine, so it's much, much more powerful than writing a kernel module.
We've seen eBPF become a really hot technology over the last few years. Because it's so powerful, and because you can hook it into so many different events, we're seeing it used in lots of observability tools, and we're starting to see it used for security as well.
Today I'm going to talk more about how it works and hopefully give you enough grounding that you can go away and start writing your own eBPF code.
Now, we're going to run some code in the kernel, but as developers we're usually writing applications in user space, so how do we communicate between user space and the kernel?
The answer is system calls.
System calls provide the interface between user space and the kernel.
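To make the idea of the system call interface a little more concrete, here is a minimal Go sketch that asks the kernel for the process ID twice: once through a raw system call and once through the standard library wrapper that most application code would use. It assumes a Unix-like system such as Linux, and the names rawGetpid and pidsMatch are made up for this illustration.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// rawGetpid asks the kernel for the process ID with a direct system call,
// bypassing the usual library wrapper.
func rawGetpid() int {
	pid, _, _ := syscall.Syscall(syscall.SYS_GETPID, 0, 0, 0)
	return int(pid)
}

// pidsMatch reports whether the raw system call and the os package agree.
func pidsMatch() bool {
	return rawGetpid() == os.Getpid()
}

func main() {
	fmt.Println("pid via raw system call:", rawGetpid())
	fmt.Println("pid via os.Getpid:      ", os.Getpid())
	fmt.Println("same answer:", pidsMatch())
}
```

Both paths end up at the same kernel entry point; the wrapper simply hides the system call number and argument passing.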
So I can make a pretty good guess that there would be a system call related to eBPF, and there certainly is.
If we look up the man page for bpf, we'll find lots of really helpful information about what BPF is and how we use it.
First of all, BPF stands for Berkeley Packet Filter.
I'm going to use the words eBPF and BPF pretty much interchangeably.
Historically, it was about filtering network packets: running custom code when a network packet arrived.
That's been extended.
You can now run your BPF programs in response to lots and lots of different types of events, not just the arrival of a network packet.
So whether we call it classic BPF or eBPF doesn't really matter these days.
But in both cases you're running code in the kernel, and you really, really don't want the kernel to crash or hang.
So when we want to run some eBPF code, it goes through a verification step to make sure that it's safe to run.
That's something we'll talk a little bit more about later.
So we have some eBPF code that's going to run in the kernel.
We have user space application code, which is where we as developers are normally used to writing code.
And there's a system call interface between the two.
I think looking at the system calls can actually be quite helpful for understanding what's really happening when we talk about inserting code into the kernel.
So I'm going to start with an example.
I'm going to use bpftrace.
This is quite a widely used tool for running BPF scripts on a system.
I'm going to show this partly as an example of the kind of things you can do with BPF, and partly so that we can examine the system calls that happen when we run it.
So let's explore bpftrace and the system calls it uses.
So I have an example that I'm going to use.
It doesn't really matter that much what my example is.
In this case, I'm setting up a script that's going to run on a tracepoint.
It runs when we hit sys_enter.
sys_enter actually gets triggered for every single system call.
So I'm going to run a script every time any process on my virtual machine makes a system call.
What that script does is set up a counter for the number of times each different command makes a system call.
If I try to run that, well, you have to be a privileged user to do it.
That kind of makes sense.
You don't want every unprivileged user running code in your kernel.
So I can use sudo.
I'll just run that for a few seconds, and then I'll interrupt it.
And it shows us, for each different command that's running on my machine, the number of system calls.
It's quite interesting, and quite powerful, that we can get to such fine granularity as counting every single system call that's happening on my machine.
So I'm going to run the same thing again.
But this time, let's have a look at the bpf system calls that are being made.
I do that with strace.
I'm just going to look for bpf system calls and quit after a few seconds.
The thing I want to show you is these bpf system calls.
We see a few BPF_MAP_CREATE, we see BPF_MAP_UPDATE_ELEM, and we see BPF_PROG_LOAD.
That tells us something about a couple of concepts we need to know about.
Those are BPF programs and BPF maps.
We use the same call, but with a different parameter, to manipulate programs and maps.
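As a small illustration of "the same call with a different parameter", the sketch below lists the first few command numbers that the bpf system call accepts as its first argument, in the order they appear in the kernel's enum bpf_cmd. It does not invoke the system call; cmdName is a hypothetical helper that only maps the numbers back to names, much as strace does when it prints them.

```go
package main

import "fmt"

// The first commands of enum bpf_cmd from the kernel's linux/bpf.h header.
// The bpf system call takes one of these as its first argument.
const (
	BPF_MAP_CREATE       = iota // 0: create a map
	BPF_MAP_LOOKUP_ELEM         // 1: read an element from a map
	BPF_MAP_UPDATE_ELEM         // 2: write an element to a map
	BPF_MAP_DELETE_ELEM         // 3: delete an element from a map
	BPF_MAP_GET_NEXT_KEY        // 4: iterate over map keys
	BPF_PROG_LOAD               // 5: verify and load a program
)

// cmdName is an illustrative helper that names a command number.
func cmdName(cmd int) string {
	switch cmd {
	case BPF_MAP_CREATE:
		return "BPF_MAP_CREATE"
	case BPF_MAP_LOOKUP_ELEM:
		return "BPF_MAP_LOOKUP_ELEM"
	case BPF_MAP_UPDATE_ELEM:
		return "BPF_MAP_UPDATE_ELEM"
	case BPF_MAP_DELETE_ELEM:
		return "BPF_MAP_DELETE_ELEM"
	case BPF_MAP_GET_NEXT_KEY:
		return "BPF_MAP_GET_NEXT_KEY"
	case BPF_PROG_LOAD:
		return "BPF_PROG_LOAD"
	}
	return "unknown"
}

func main() {
	// The three commands seen in the strace output.
	for _, cmd := range []int{BPF_MAP_CREATE, BPF_MAP_UPDATE_ELEM, BPF_PROG_LOAD} {
		fmt.Printf("bpf(%d, ...) is %s\n", cmd, cmdName(cmd))
	}
}
```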
So, starting with the programs: what are those programs?
Well, I mean, they're programs.
They run on the CPU.
They're essentially machine code instructions.
But for BPF, we're restricted in what we can run because of that requirement for BPF code to be safe.
It mustn't crash, it mustn't loop.
So we typically write our BPF programs in C, a restricted subset of C in which we don't use any loops.
We always have to check that a pointer is not null before we dereference it.
And then we use the Clang compiler to convert it into an eBPF object, a set of bytecode instructions that are going to run inside a BPF virtual machine inside the kernel.
So we're going to write the kernel code in C.
We get some helper functions that give us useful contextual information.
For example, we can print debugging messages with a helper function.
We can get information about the currently running command.
That's how bpftrace knows which command is running, as we saw in the previous example.
There are lots, I guess a few dozen, of these helper functions that can give us contextual information.
The other thing I talked about, or rather the other thing we saw in our system calls, was maps.
Maps are really how we get information between our eBPF program running in the kernel and user space.
We'll come back to maps in a bit more detail shortly.
And then the last conceptual thing we really need to know about is the fact that these programs are triggered by an event happening.
We saw an example where we ran a program in response to system calls, by triggering a hook at the entry to sys_enter.
There are tons of these hooks already predefined: essentially every function entry and exit, every system call, every tracepoint in the kernel, every time a network packet arrives.
All of these are possible points where you can trigger an eBPF program.
We also have the terms kprobe and uprobe.
A kprobe is the entry to a kernel function.
A kretprobe is the exit from a kernel function, and correspondingly the same for user space with uprobes.
The combination of all these different types of events means we can run eBPF code in response to pretty much anything that's happening on your Linux machine.
So how do we attach the program to an event?
Again, I think it's helpful to look at the system calls that are happening.
We're not just going to use the bpf call, but also a couple more system calls: perf_event_open, which sets up a tracepoint, and ioctl.
So BPF_PROG_LOAD gives us a file descriptor that I've called x here.
The tracepoint comes back as y, and then there's an ioctl call that associates the tracepoint with the program that should be triggered.
We can take a look at that in our bpftrace example.
Again, I'll just trace those additional system calls as well, perf_event, oops, perf_event_open, and let's see.
Again, it doesn't matter too much exactly what's happening.
I just want to show you this BPF_PROG_LOAD bpf call that comes back with a file descriptor of nine.
There should be a perf_event_open.
Yeah, this perf_event_open here, which comes back with a file descriptor of eight.
And then here is the ioctl that associates eight and nine and says: this is the BPF program that I want you to run when we hit that tracepoint.
So loading the program and associating the program with the tracepoint is something we're going to have to do from user space.
Okay, so if we want to write hello world in eBPF, what do we need to do?
What do we need to have in place?
We know we're going to have to write some C code that's going to run in the kernel and that's going to be compiled by Clang.
And we're going to have to write some user space code that gets the tracing, the hello world message, from the kernel and displays it.
At least in theory, we can write that in any language of our choice.
Most of us don't interact with system calls directly very often when we're writing user space applications.
There's usually some level of abstraction.
In fact, many of us don't even know that system calls exist.
We don't have to deal with them on a day-to-day basis.
For BPF, we would want a library that gives us a higher level of abstraction over those bpf system calls and things like the perf_event_open that we just saw.
The library that I'm going to use today is called libbpfgo.
We actually wrote this as part of a tool called Tracee, an eBPF security event detection tool we're working on.
And we've isolated the libbpf wrapper.
So there's a C library called libbpf, which is a wrapper for the system calls.
And libbpfgo is a pretty thin Go wrapper, giving us Go bindings around the libbpf interface.
So we're going to write some Go code that uses libbpfgo.
And we're also going to write some C code, which we're going to compile into an eBPF object using the Clang compiler.
Then the Go code is going to read that object file, get the contents out, and insert it into the kernel.
So we have an object file that has the eBPF code and the definition of any maps.
We'll talk about maps a bit more later.
We have our user space code that's driving our system calls and has the logic around what programs we want to run and what we want to attach them to.
When the user space code makes that BPF_PROG_LOAD call, it sends the program to the kernel.
The kernel will verify it and make sure that it's safe to run.
And if it is, it can then run inside this BPF virtual machine.
So we're going to build two objects.
We've got two different compilation steps.
We've got to use the Go compiler, go build, to create a Go executable.
And we're going to use Clang to build the BPF object file.
All right.
I think we have enough to actually start writing some code.
So let's go to my editor.
This is my Makefile.
It's pretty much exactly what I just showed you on the slide.
We have a go build step and a Clang step for building the eBPF object.
Let's start with the C code.
I'm going to write a function called hello.
It takes a context pointer; they all do.
I'm just going to do some hello world tracing.
So let's say: hello, GOTOpia.
And we'll return a zero exit code.
The other thing I have to do is define an object code section.
This essentially tells the object loader what kind of BPF program this is going to be.
This is a level of detail we don't need to worry about too much today.
But you can use different helper functions and do different things depending on the type of program you're running and the type of event you're attaching it to.
I'm going to attach to a kprobe, the entry point to a function in the kernel.
And I'm actually going to run this whenever the execve system call gets triggered.
So that's my C code.
And let's compile that.
I'm just going to run make on my BPF object file target to start with.
That should give me an object file that I can look at.
There are a couple of interesting things to look at in this object file.
First of all, it's for a little-endian machine.
I will need that in a moment.
And this object file is compiled to run in a Linux BPF virtual machine.
We can see here the section declaration for the fact that it's running as a kprobe on sys_execve.
And here is the function name.
I might say eBPF program, but a program is really a function.
That information from the ELF file is what is going to help the Go code know how to insert it into the kernel.
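The same details can be read programmatically. The sketch below uses Go's standard debug/elf package to print the machine type, byte order, section names, and function and object symbols of an object file, roughly the facts picked out of the readelf output above. It is only an inspection aid, not what libbpfgo does internally; describeObject is a hypothetical helper, and the path is whatever file you pass on the command line.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// describeObject opens an ELF object file and summarises the parts a BPF
// loader cares about: target machine, byte order, sections, and symbols.
func describeObject(path string) ([]string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	lines := []string{
		fmt.Sprintf("machine: %v", f.Machine), // EM_BPF for a BPF object
		fmt.Sprintf("byte order: %v", f.ByteOrder),
	}
	for _, s := range f.Sections {
		if s.Name != "" {
			lines = append(lines, "section: "+s.Name)
		}
	}

	// A file without a symbol table is not an error for this summary.
	syms, err := f.Symbols()
	if err != nil {
		return lines, nil
	}
	for _, sym := range syms {
		switch elf.ST_TYPE(sym.Info) {
		case elf.STT_FUNC:
			lines = append(lines, "function: "+sym.Name)
		case elf.STT_OBJECT:
			lines = append(lines, "object: "+sym.Name)
		}
	}
	return lines, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: describe <object-file>")
		return
	}
	lines, err := describeObject(os.Args[1])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```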
So let's write some Go code.
Right.
I already have a reference to libbpfgo here.
And I have a convenience function called must that I'm going to use to trap any errors and panic if we see any.
Hopefully we won't hit that.
Don't do that in production.
Really bad idea.
But it will be fine for demo purposes.
The first thing I'm going to do is open this file.
I'm going to call NewModuleFromFile, reading from that object file that we just built.
We want to catch any errors.
And I'm going to use a defer.
I don't know if we have any Go programmers here.
If you're not familiar with Go, this defer keyword may be new to you.
It basically says: on exit from whatever function we're in, run this code.
In this case, I want to make sure that we tidy up and close our file on exit from this function.
And just for fun, I'm going to print "cleaning up" here, so that we know when we're exiting because that statement has been printed.
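For readers new to Go, here is a minimal standalone sketch of how defer behaves. It involves no eBPF at all; the run function and its log are invented for this illustration. The deferred function is registered early but only executes when run returns.

```go
package main

import "fmt"

// run records the order in which its steps happen. The deferred function is
// registered before the work, but it executes only as run returns.
func run(log *[]string) {
	*log = append(*log, "open")
	defer func() {
		*log = append(*log, "cleaning up")
	}()
	*log = append(*log, "work")
}

func main() {
	var log []string
	run(&log)
	for _, step := range log {
		fmt.Println(step)
	}
}
```

The important caveat is that deferred calls run only when the function returns or panics. If the process is killed by a signal first, they are skipped.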
Okay.
So I've opened my object file.
I now need to load that into the kernel.
And that has to succeed.
Okay.
Now I want to get the hello program and attach it to a kprobe.
So first of all, I need to get the program.
There's a nice function to get that program.
And we know it's called hello.
And we need to attach that to a kprobe.
So I'm attaching it; yes, p is my program.
And I'm attaching it to the function that relates to execve.
Now, on this particular kernel, the function name is this.
This could return an error, so I need to catch that error.
Okay.
So I've got my object opened.
I've got the program from inside that object.
And I have associated it with the execve system call.
The C code is writing some tracing information whenever it sees that system call.
And I need to do something in user space to print it out.
There is a convenient function for that.
Trace.
There we go.
TracePrint.
Now, this will basically block and print out whatever it receives from the debug tracing.
Okay.
So I think we should be able to make this and run it.
I have to run it as a privileged user.
And hooray!
We have the equivalent of hello world.
Every time execve runs on this machine, we're getting the trace written out.
Now, something didn't happen.
And that something is: we never saw the "cleaning up" line that I put in.
Remember, I've got this here.
That's because when I interrupted the program, well, it just got interrupted and stopped.
In fact, it was blocked somewhere in here, in TracePrint, and got interrupted.
If I want to clean up properly, I'm going to have to catch that interrupt.
Which I can do quite conveniently in Go.
And this is also going to illustrate Go channels, which we're going to use a bit more in a moment.
So I'm going to make a channel.
Channels are a really nice feature of Go.
This channel holds one item at a time, and that item is a signal from the operating system.
And I want to be notified whenever there is an interrupt signal.
That says: if someone triggers an interrupt, send a message on this signal channel.
It actually sends the interrupt into the signal channel.
And I'm going to block on that here.
This essentially waits until an event arrives on that signal channel and then throws it away.
The last thing I need to do is send this blocking function off into its own goroutine.
This is how Go handles concurrency.
This basically means it's doing its own thing in another thread.
So this won't block anymore.
I haven't done anything very different in terms of the program.
But it should now run a bit more cleanly, in that if I hit Ctrl-C, we now see our "cleaning up" message.
And we know that things like my deferred function will be executed, because nothing interrupts the function before it gets to complete.
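The pattern described here can be sketched with the standard library alone. In the example below, the blocking work is a stand-in for the trace-printing call, and the program sends the interrupt to itself in place of a real Ctrl-C so that it terminates on its own. It assumes a Unix-like system; waitForInterrupt and selfInterruptDemo are names invented for this sketch.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
)

// waitForInterrupt starts the blocking work in its own goroutine and then
// blocks until a signal arrives. Because the function returns normally,
// the deferred cleanup runs.
func waitForInterrupt(sig <-chan os.Signal, work func(), cleanup func()) {
	defer cleanup()
	go work()
	<-sig // wait for the signal, then discard it
}

// selfInterruptDemo registers for interrupts, sends one to this process as a
// stand-in for Ctrl-C, and reports whether the cleanup ran.
func selfInterruptDemo() bool {
	sig := make(chan os.Signal, 1) // buffered: holds one pending signal
	signal.Notify(sig, os.Interrupt)
	defer signal.Stop(sig)

	p, err := os.FindProcess(os.Getpid())
	if err != nil {
		return false
	}
	if err := p.Signal(os.Interrupt); err != nil {
		return false
	}

	cleaned := false
	block := make(chan struct{}) // never closed, so the work blocks forever
	waitForInterrupt(sig, func() { <-block }, func() { cleaned = true })
	return cleaned
}

func main() {
	fmt.Println("cleanup ran:", selfInterruptDemo())
}
```

Registering the channel before the signal is sent matters: without signal.Notify, the default action for an interrupt is to end the process immediately, which is why the deferred message never appeared in the first version.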
All right.
So that's hello world.
But it's not terribly useful.
In particular, this bpf_trace_printk function is writing data to one well-known pipe location on the machine.
If I ran any number of eBPF programs and they all called bpf_trace_printk, they'd all be writing to the same pipe.
Which is not very useful in the real world.
So we're going to have to go back and think a bit more about maps.
As I mentioned before, maps are the way we can share data between the kernel code and whatever's happening in user space.
There are lots of different types of map.
I'm going to use one called the perf event array.
This is nice partly because we can write an arbitrary blob of data.
So any kind of data we want to write, we can write it into this perf event buffer.
And on the user space side, there's a perf buffer implementation that can receive these data blobs on a Go channel.
So it's a very idiomatic way of receiving data from the eBPF code.
Right.
So let's use bpf_perf_event_output.
We're going to do that here.
bpf_perf_event_output.
Get rid of my tracing call.
And what does this require?
This requires the context.
We need a map.
I'll define it in a second, but we'll call it GoTopia.
I have to pass a flag that indicates it's the current CPU.
And I'm going to pass some data.
I have to say how big the data is.
So let's make things easy.
Let's pass some data.
We'll just make up a value and pass it.
So whenever execve gets called, we're going to pass this value into the perf buffer.
It just remains for me to define the perf buffer here.
BPF_PERF_OUTPUT.
And that is called GoTopia.
Okay.
Let me make that object and have a quick look at it again.
Oh, readelf.
Hello.
This time we can see that, in addition to the function name, we've also got an object defined called GoTopia.
That's the definition of the map, which has to exist in the object file.
Okay.
So the kernel side is writing this data into my perf buffer.
And I need to read it from the Go side.
So I'm not going to be using TracePrint anymore.
But I am going to be using a perf buffer.
We'll call it pb.
We initialise a perf buffer with InitPerfBuf.
And it's called GoTopia.
I need to pass in a channel for the events that we're going to receive.
I'll define that in a second.
I'm going to ignore any lost events.
And a page size that I know works.
Okay.
That has to succeed.
And we have to start, oops, pb, not ps.
pb.
The perf buffer.
And when we get to cleaning up, we're going to stop it.
I need to define this events channel.
So we'll make a channel.
The type of data that we get from here is a slice of bytes.
So that's what we need to receive.
And we'll just give it some arbitrary length.
So that's set up the perf buffer.
We now need to receive these events.
I'm going to do it in a goroutine again, which I'll write inline.
And we're going to loop, reading information out of this channel.
So every time data arrives on that channel, it gets assigned to data.
And let's print it out.
Got something.
Now, we know that it's an unsigned 64-bit integer.
So I can convert my slice of bytes.
It's little-endian.
We saw that before.
Uint64 of data.
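The receive-and-decode step can be tried in isolation using only the standard library. In this sketch an ordinary channel stands in for the one the perf buffer would feed, and the test value is made up; decodeValues is a hypothetical helper, not part of libbpfgo.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeValues reads raw records from the events channel until it is closed
// and decodes each one as a little-endian unsigned 64-bit integer.
func decodeValues(events <-chan []byte) []uint64 {
	var values []uint64
	for data := range events {
		if len(data) < 8 {
			continue // too short to hold a uint64; skip it
		}
		values = append(values, binary.LittleEndian.Uint64(data))
	}
	return values
}

func main() {
	// An ordinary channel stands in for the one fed by the perf buffer.
	events := make(chan []byte, 4)
	done := make(chan []uint64)

	// Consume in a separate goroutine, as the talk does.
	go func() { done <- decodeValues(events) }()

	// 42 encoded as 8 little-endian bytes: least significant byte first.
	events <- []byte{42, 0, 0, 0, 0, 0, 0, 0}
	close(events)

	fmt.Println("received:", <-done)
}
```

The byte order has to match the machine that produced the data, which is why the little-endian note from the object file matters here.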
Okay.
So let's see if that builds.
It does.
And hopefully this time, every time execve gets called by any process on the machine, it sends us that 64-bit integer.
So we're using this perf buffer, but we're not doing anything very useful with it.
It's just some number that I've decided to pass.
How about we get some information about the current context, like the name of the command?
That is, the current command that triggered this execve syscall.
So instead of passing numeric data, let's make this into some characters.
Again, a bit of an arbitrary length.
Size of that data.
Okay.
And we write it into the perf buffer in exactly the same way.
I just need to change this so that it converts my series of bytes into a string.
So now I should be able to run that.
This time, every time execve gets called, we should see the name of the command.
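One detail worth sketching is the conversion from a fixed-size character buffer to a Go string. A C character array is usually padded with zero bytes after the name, so a plain string conversion would carry that padding along. The helper below, commFromBytes, is a hypothetical example that cuts the slice at the first zero byte; the talk's own code may handle this differently.

```go
package main

import (
	"bytes"
	"fmt"
)

// commFromBytes converts a fixed-size, NUL-padded C character buffer into a
// Go string by cutting it at the first zero byte, if there is one.
func commFromBytes(data []byte) string {
	if i := bytes.IndexByte(data, 0); i >= 0 {
		data = data[:i]
	}
	return string(data)
}

func main() {
	// A made-up record: the command name followed by zero padding.
	raw := []byte{'b', 'a', 's', 'h', 0, 0, 0, 0}
	fmt.Printf("%q\n", commFromBytes(raw))
}
```

Trimming also matters if the names are later used as map keys, since padded and unpadded versions of the same name would otherwise count as different keys.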
And yeah, we can see that I happen to have Kubernetes and Docker running on this virtual machine.
So they're spawning quite a few new processes.
This is starting to get pretty close to the example we saw with bpftrace.
So let's do a little bit more.
If you remember that example, it was keeping a counter for each different command.
I think that would be pretty easy to implement in Go.
So we're going to have a map from the command name, which is a string, to the counter.
Make that some arbitrary size.
And instead of printing out the name of the command, we can just increment the counter for that name.
Okay.
And when we finish, we will loop over that counter and print out the results.
So for each key and value from my counter, we'll print out the name and the value.
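The counting step is plain Go and can be sketched without any eBPF involved. Here a slice of made-up command names stands in for the stream of events, and countCommands is a hypothetical helper name. The report is sorted only so that the output is stable, since Go map iteration order is not defined.

```go
package main

import (
	"fmt"
	"sort"
)

// countCommands tallies how many times each command name appears.
func countCommands(names []string) map[string]int {
	counter := make(map[string]int)
	for _, name := range names {
		counter[name]++ // a missing key starts at zero
	}
	return counter
}

func main() {
	// Made-up command names standing in for events read from the kernel.
	counter := countCommands([]string{"bash", "dockerd", "bash", "ls", "bash"})

	// Sort the keys so the report order is deterministic.
	names := make([]string, 0, len(counter))
	for name := range counter {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		fmt.Printf("%-10s %d\n", name, counter[name])
	}
}
```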
Excuse me.
Right.
So have I missed anything there?
I think that's okay.
This is still associated with execve.
Let's see if it works.
We'll just run that for a couple of seconds.
When we interrupt it, it should print out the counters.
Okay.
The last thing we need to do is change the attachment point.
Currently we're attached to a kprobe at the entry of execve.
I'm going to change it so that it's associated with a tracepoint for sys_enter.
And as I mentioned before, sys_enter gets hit every time any system call is invoked.
So I need to change that in two places.
I need to change the section declaration here.
So this becomes a raw tracepoint on sys_enter.
And I need to change where we attach it here.
A raw tracepoint called sys_enter.
And with a bit of luck, this is going to recreate that bpftrace script.
We'll just run it for a few seconds.
And then when we interrupt it, we should see the counters.
Those counters should tell us how many system calls have been invoked by each of those different commands.
So we've recreated that bpftrace command.
We've done it in just under 60 lines of Go code and a handful of lines of C.
I hope that's given an illustration of the kind of things you can do.
Now, obviously, you can attach to many different tracepoints.
I've only scratched the surface of the things that you can do with BPF helper functions and the whole range of contextual information you can then observe, manipulate, and pass up to user space.
I've had to gloss over lots and lots of details in the interest of time.
But the code that I've written is available on GitHub.
And I'm really hoping that you have some questions for me.
Thank you.