Building eBPF Applications with Go

Digest, 2024-07-15 10:07, Shanghai

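This article is based on my talk at Go Konf Istanbul '24[1].

Most of the time, when we develop or use software, we operate within the safe boundaries of the operating system. We don't even know how an IP packet is received from the network interface, or how the file system handles inodes when we save a file.

This boundary is called user space; it is where we write applications, libraries, and tools. But there is another world: kernel space. The operating system kernel resides there and manages system resources such as memory, CPU, and I/O devices.

We usually don't need to go below sockets or file descriptors, but sometimes we do. Say you want to profile an application to see how many resources it consumes.

If you profile the application from user space, you not only miss many useful details but also spend considerable resources on the profiling itself, because every layer adds some CPU or memory overhead.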

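The need to go deeper into the kernel

Suppose you want to go down the kernel stack and somehow insert custom code into the kernel to profile an application, trace system calls, or monitor network packets. How would you do it?

Traditionally, you have two options.

Option 1: Edit the kernel source code

If you want to modify the Linux kernel source and then ship that kernel to customers' machines, you need to convince the Linux kernel community that the change is required. Then you wait several years for the new kernel version to be adopted by Linux distributions.

For most cases this is not a practical approach, and it is overkill for merely profiling an application or monitoring network packets.

Option 2: Write a kernel module

You can write a kernel module, a piece of code that can be loaded into the kernel and executed. This is more practical, but it has its own risks and drawbacks.

First, you need to write the kernel module, which is not easy. Then you need to maintain it regularly, because the kernel keeps changing. If you don't maintain it, the module becomes outdated and stops working with newer kernel versions.

Second, you risk breaking the Linux kernel, because kernel modules have no safety boundary. A bug in your kernel module can crash the entire system.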

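Enter eBPF

eBPF (Extended Berkeley Packet Filter) is a revolutionary technology that lets you reprogram the Linux kernel within minutes, without even rebooting the system.

eBPF lets you trace system calls, user-space functions, library functions, network packets, and more. It is a powerful tool for system performance, monitoring, security, and many other areas.

But how?

eBPF is a system made up of several components:

eBPF programs
eBPF hooks
BPF maps
eBPF verifier
eBPF virtual machine

Note that I use both "BPF" and "eBPF" in this article. eBPF stands for "Extended Berkeley Packet Filter". BPF was originally introduced to Linux to filter network packets, and eBPF extends the original BPF so that it can be used for other purposes. Today it has nothing to do with Berkeley, and it is not used only for packet filtering.

The diagram below illustrates how eBPF works across user space and kernel space. eBPF programs are written in a high-level language (such as C) and compiled to eBPF bytecode. The bytecode is then loaded into the kernel and executed by the eBPF virtual machine.

An eBPF program is attached to a specific code path in the kernel, such as a system call. These code paths are called "hooks". When a hook is triggered, the attached eBPF program runs and executes the custom logic you wrote. This is how we run custom code in kernel space.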

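eBPF Hello World example

Before going into details, let's write a simple eBPF program that traces the execve system call. We will write the eBPF program in C and the user-space program in Go. The user-space program loads the eBPF program into the kernel and polls for the custom events that the eBPF program emits just before the execve system call actually executes.

Writing the eBPF program

Let's start with the eBPF program. I will write it in parts to explain the details, but you can find the whole program in my GitHub repository: ozansz/intro-ebpf-with-go[2].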

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct event {
	u32 pid;
	u8  comm[100];
};

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 1000);
} events SEC(".maps");

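Here we include the vmlinux.h header, which contains the kernel's data structures and function prototypes. Then we include bpf_helpers.h, which contains the helper functions for eBPF programs.

Next we define a struct to hold the event data, and a BPF map[3] to store the events. We will use this map to communicate events between the eBPF program (which runs in kernel space) and the user-space program.

We will look at BPF maps in more detail later, so don't worry if you don't yet understand why we use BPF_MAP_TYPE_RINGBUF or what SEC(".maps") is.

We are now ready to write our first program and define the hook it attaches to: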

SEC("kprobe/sys_execve")
int hello_execve(struct pt_regs *ctx) {
	// Upper 32 bits: process ID (tgid); lower 32 bits: thread ID.
	u64 id = bpf_get_current_pid_tgid();
	pid_t pid = id >> 32;
	pid_t tid = (u32)id;

	// Only report calls made from the main thread of a process.
	if (pid != tid)
		return 0;

	struct event *e;

	// Reserve space in the ring buffer; this fails when the buffer is full.
	e = bpf_ringbuf_reserve(&events, sizeof(struct event), 0);
	if (!e) {
		return 0;
	}

	e->pid = pid;
	bpf_get_current_comm(&e->comm, 100);

	// Make the event visible to the user-space reader.
	bpf_ringbuf_submit(e, 0);

	return 0;
}

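Here we define a function, hello_execve, and attach it to the sys_execve system call with a kprobe hook. kprobe is one of the many hooks eBPF provides and is used to trace kernel functions. This hook triggers hello_execve right before sys_execve executes.

Inside hello_execve, we first get the process ID and thread ID and check whether they are equal. If they differ, we are in a thread rather than in the main thread of the process. We don't want to trace threads, so we exit the eBPF program by returning zero.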
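The bit arithmetic on the value returned by bpf_get_current_pid_tgid can be tried out in user space. The following is only a sketch in Go; the function name splitPidTgid and the sample value are made up for illustration and are not part of the original program.

```go
package main

import "fmt"

// splitPidTgid mirrors the shifts in hello_execve: the upper 32 bits hold the
// process ID (the kernel's tgid), the lower 32 bits hold the thread ID.
func splitPidTgid(id uint64) (pid, tid uint32) {
	return uint32(id >> 32), uint32(id)
}

func main() {
	// Hypothetical value: process 3360 running on its main thread.
	id := uint64(3360)<<32 | 3360
	pid, tid := splitPidTgid(id)
	fmt.Println(pid, tid, pid == tid)
}
```

When pid and tid are equal, the call comes from the main thread, which is the only case the eBPF program reports.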

Then we reserve space in the events map for the event data, fill it with the process ID and the command name of the process, and submit the event to the events map.
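The reserve-then-submit pattern has one consequence worth keeping in mind: when the buffer is full, the reservation fails and the event is dropped instead of blocking the kernel. As a loose user-space analogy only (this is not how the kernel ring buffer is implemented), a non-blocking send on a buffered Go channel behaves similarly. The names event and trySubmit are hypothetical.

```go
package main

import "fmt"

// event is a simplified stand-in for the C struct event.
type event struct {
	Pid  uint32
	Comm string
}

// trySubmit attempts a non-blocking send. It reports false when the buffer is
// full, which is the analogue of bpf_ringbuf_reserve returning NULL.
func trySubmit(buf chan event, e event) bool {
	select {
	case buf <- e:
		return true
	default:
		return false // buffer full: the event is dropped
	}
}

func main() {
	buf := make(chan event, 2) // room for two pending events
	for pid := uint32(1); pid <= 3; pid++ {
		ok := trySubmit(buf, event{Pid: pid, Comm: "bash"})
		fmt.Printf("pid %d submitted: %v\n", pid, ok)
	}
}
```

With nobody reading from the buffer, the third submission is rejected. This is why the reader loop in user space should keep up with the event rate.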

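Pretty simple so far, right?

Writing the user-space program

Before writing the user-space program, let me briefly explain what it needs to do. We need a user-space program that loads the eBPF program into the kernel, creates the BPF map, attaches the program to its hook, and then reads events from the BPF map.

To do this we need a specific system call, bpf(), which performs several eBPF-related operations such as reading the contents of a BPF map.

We could call this system call from user space ourselves, but that would mean too many low-level operations. Thankfully, there are libraries that provide a high-level interface to the bpf() system call. One of them is the ebpf-go[5] package from Cilium[4], which we will use in this example.

Let's dive into some Go code.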

//go:generate go run github.com/cilium/ebpf/cmd/bpf2go -type event ebpf hello_ebpf.c

func main() {
	stopper := make(chan os.Signal, 1)
	signal.Notify(stopper, os.Interrupt, syscall.SIGTERM)

	// Allow the current process to lock memory for eBPF resources.
	if err := rlimit.RemoveMemlock(); err != nil {
		log.Fatal(err)
	}

	// Load the compiled eBPF program and maps into the kernel.
	objs := ebpfObjects{}
	if err := loadEbpfObjects(&objs, nil); err != nil {
		log.Fatalf("loading objects: %v", err)
	}
	defer objs.Close()

	// kprobeFunc names the kernel symbol to probe; it is defined elsewhere
	// in the full program.
	kp, err := link.Kprobe(kprobeFunc, objs.HelloExecve, nil)
	if err != nil {
		log.Fatalf("opening kprobe: %s", err)
	}
	defer kp.Close()

	// Open a reader on the ring buffer map that the eBPF program writes to.
	rd, err := ringbuf.NewReader(objs.Events)
	if err != nil {
		log.Fatalf("opening ringbuf reader: %s", err)
	}
	defer rd.Close()

	...

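The first line is a go:generate directive. It is not read by the compiler; it tells the go generate command to run the bpf2go tool from the github.com/cilium/ebpf/cmd/bpf2go package and generate Go files from hello_ebpf.c.

The generated Go files contain a Go representation of the eBPF program, the types and structs we defined in it, and so on. We then use these representations in our Go code to load the eBPF program into the kernel and interact with the BPF map.

Next we load the eBPF program (loadEbpfObjects), attach it to the kprobe hook (link.Kprobe), and open a reader for events from the BPF map (ringbuf.NewReader). All of these calls work with the generated types.

Time to interact with the kernel side: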

	...

	// Close the reader on interrupt so that the blocking Read below returns.
	go func() {
		<-stopper

		if err := rd.Close(); err != nil {
			log.Fatalf("closing ringbuf reader: %s", err)
		}
	}()

	log.Println("Waiting for events..")

	var event ebpfEvent
	for {
		record, err := rd.Read()
		if err != nil {
			if errors.Is(err, ringbuf.ErrClosed) {
				log.Println("Received signal, exiting..")
				return
			}
			log.Printf("reading from reader: %s", err)
			continue
		}

		// Decode the raw bytes into the generated struct (little-endian host assumed).
		if err := binary.Read(bytes.NewBuffer(record.RawSample), binary.LittleEndian, &event); err != nil {
			log.Printf("parsing ringbuf event: %s", err)
			continue
		}

		procName := unix.ByteSliceToString(event.Comm[:])
		log.Printf("pid: %d\tcomm: %s\n", event.Pid, procName)
	}
}

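We start a goroutine that listens on the stopper channel defined in the previous Go snippet. When an interrupt signal arrives, it closes the ring buffer reader so that the program can stop gracefully.

Then we start a loop that reads events from the BPF map. We read each record with the ringbuf.Reader type and use binary.Read to parse the raw event data into ebpfEvent, a type generated from the eBPF program.

Finally, we print the process ID and the command name to standard output. The loop runs until the user interrupts the program.

That's it! We wrote a simple eBPF program that traces the execve system call, and a user-space program that loads it and reads events from the BPF map.

You can find the complete code example in my GitHub repository. In a later section, we will take a closer look at BPF maps and how they pass data between the kernel and user space.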

Running the program

Now we are ready to run the program. First we compile the eBPF program, then we build and run the user-space program.

$ go generate
Compiled /Users/sazak/workspace/gocode/src/github.com/ozansz/intro-ebpf-with-go/0x01-helloworld/ebpf_bpfel.o
Stripped /Users/sazak/workspace/gocode/src/github.com/ozansz/intro-ebpf-with-go/0x01-helloworld/ebpf_bpfel.o
Wrote /Users/sazak/workspace/gocode/src/github.com/ozansz/intro-ebpf-with-go/0x01-helloworld/ebpf_bpfel.go
Compiled /Users/sazak/workspace/gocode/src/github.com/ozansz/intro-ebpf-with-go/0x01-helloworld/ebpf_bpfeb.o
Stripped /Users/sazak/workspace/gocode/src/github.com/ozansz/intro-ebpf-with-go/0x01-helloworld/ebpf_bpfeb.o
Wrote /Users/sazak/workspace/gocode/src/github.com/ozansz/intro-ebpf-with-go/0x01-helloworld/ebpf_bpfeb.go

$ go build -o hello_ebpf

We first run go generate to compile the eBPF program, then go build to compile the user-space program.

Then we run the user-space program:

$ sudo ./hello_ebpf
hello_ebpf: 01:20:54 Waiting for events..

I am running this program in a virtual machine in Lima[6], so why not open another shell and see what happens?

$ limactl shell intro-ebpf

$

Meanwhile, in the first shell:

hello_ebpf: 01:22:22 pid: 3360	comm: sshd
hello_ebpf: 01:22:22 pid: 3360	comm: bash
hello_ebpf: 01:22:22 pid: 3361	comm: bash
hello_ebpf: 01:22:22 pid: 3362	comm: bash
hello_ebpf: 01:22:22 pid: 3363	comm: bash
hello_ebpf: 01:22:22 pid: 3366	comm: bash
hello_ebpf: 01:22:22 pid: 3367	comm: lesspipe
hello_ebpf: 01:22:22 pid: 3369	comm: lesspipe
hello_ebpf: 01:22:22 pid: 3370	comm: bash

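As expected, we see the sshd process starting, then bash processes, then lesspipe, and so on.

This is a simple example of using eBPF to trace the execve system call and reading the events from a BPF map in user space. We wrote a fairly simple yet powerful program, and we intercepted the execve system call without modifying the kernel source or rebooting the system.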

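eBPF hooks and maps

So, what actually happened in the previous example? We attached the eBPF program to the sys_execve system call with a kprobe hook, so that hello_execve runs every time sys_execve is called, before the original system call code executes.

eBPF is event-driven: it expects us to attach eBPF programs to specific code paths in the kernel. These code paths are called "hooks", and eBPF provides several kinds. The most common are:

kprobe, kretprobe: trace kernel functions
uprobe, uretprobe: trace user-space functions
tracepoint: trace predefined trace points in the kernel
xdp: eXpress Data Path, used to filter and redirect network packets
usdt: user statically defined tracing, a more efficient way to trace user-space functions

The kprobe and uprobe hooks invoke the attached eBPF program before the function or system call executes, while kretprobe and uretprobe invoke it after the function or system call has executed.

We also used a BPF map to store events. BPF maps are data structures for storing and passing different kinds of data, and we also use them for state management. Many types of BPF maps are supported, and we use different types for different purposes. Some of the most common are:

BPF_MAP_TYPE_HASH: hash map
BPF_MAP_TYPE_ARRAY: array
BPF_MAP_TYPE_RINGBUF: ring buffer
BPF_MAP_TYPE_STACK: stack
BPF_MAP_TYPE_QUEUE: queue
BPF_MAP_TYPE_LRU_HASH: least-recently-used hash map

Some of these map types also have per-CPU variants, such as BPF_MAP_TYPE_PERCPU_HASH, a hash map that keeps a separate hash table for each CPU core.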
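With a per-CPU map, each key has one value slot per CPU, so user space typically combines the slots itself. The sketch below shows only that aggregation step on plain Go data. The function name sumPerCPU and the sample numbers are assumptions for illustration.

```go
package main

import "fmt"

// sumPerCPU adds up the per-CPU slots of a single map entry.
func sumPerCPU(perCPU []uint64) uint64 {
	var total uint64
	for _, v := range perCPU {
		total += v
	}
	return total
}

func main() {
	// Hypothetical counts for one key on a 4-CPU machine.
	fmt.Println(sumPerCPU([]uint64{10, 0, 7, 3}))
}
```

The benefit of the per-CPU layout is that the kernel side can update its own slot without contending with other CPUs; the cost is this extra summing step when reading.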

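Going further: tracing incoming IP packets

Let's go one step further and write a more complex eBPF program. This time we will use the XDP hook, which invokes the eBPF program right after the network interface hands a packet to the kernel, before the kernel processes the packet at all.

Writing the eBPF program

We will write an eBPF program that counts incoming IP packets by source IP address and port numbers, and then read the counts from the BPF map in user space. We will parse the Ethernet, IP, and TCP/UDP headers of each packet and store the per-tuple packet counts in a BPF map.

First, the eBPF program: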

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define MAX_MAP_ENTRIES 100

/* Define an LRU hash map for storing packet count by source IP and port */
struct {
	__uint(type, BPF_MAP_TYPE_LRU_HASH);
	__uint(max_entries, MAX_MAP_ENTRIES);
	__type(key, u64); // source IPv4 addresses and port tuple
	__type(value, u32); // packet count
} xdp_stats_map SEC(".maps");

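As in the first example, we include vmlinux.h and the BPF helper headers. We also define a map, xdp_stats_map, that stores IP:ports and packet count information. We will populate this map in the hook function and read its contents from the user-space program.

By IP:ports I mean a u64 value packed with the source IP, the source port, and the destination port. An IP address (IPv4, specifically) is 32 bits long and each port number is 16 bits long, so we need exactly 64 bits to store all three, which is why we use u64 here. We only handle ingress (incoming) packets, so there is no need to store the destination IP address.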
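To make the 64-bit layout concrete, here is a small Go sketch of the packing: source IP in the upper 32 bits, then the source port, then the destination port. packKey is a hypothetical name. The example assumes a little-endian CPU, which loads the network-order address bytes c0 a8 05 02 as the integer 0x0205a8c0.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// packKey builds the IP:ports key: 32 bits of source address, followed by
// 16 bits of source port and 16 bits of destination port.
func packKey(saddr uint32, srcPort, dstPort uint16) uint64 {
	return uint64(saddr)<<32 | uint64(srcPort)<<16 | uint64(dstPort)
}

func main() {
	// 192.168.5.2 as a little-endian CPU loads it from the packet bytes.
	saddr := binary.LittleEndian.Uint32([]byte{192, 168, 5, 2})
	key := packKey(saddr, 58597, 22)
	fmt.Printf("%016x\n", key) // 0205a8c0e4e50016
}
```

Reading the hex from left to right: 0205a8c0 is the address, e4e5 is port 58597, and 0016 is port 22.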

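Unlike the previous example, we now use BPF_MAP_TYPE_LRU_HASH as the map type. This type stores (key, value) pairs in a hash map with least-recently-used eviction: when the map is full, older entries are evicted to make room for new ones.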
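The eviction behaviour can be illustrated with a tiny user-space LRU counter. This is only a conceptual sketch built on container/list, assuming a capacity of at least 1; the kernel's implementation differs and is not guaranteed to evict in strict LRU order. The names lruCounts, Inc, and Get are hypothetical.

```go
package main

import (
	"container/list"
	"fmt"
)

type entry struct {
	key   uint64
	count uint32
}

// lruCounts is a fixed-capacity counter map that evicts the least recently
// used key when it is full.
type lruCounts struct {
	cap   int
	order *list.List               // front = most recently used
	items map[uint64]*list.Element // key -> element in order
}

func newLRUCounts(capacity int) *lruCounts {
	return &lruCounts{cap: capacity, order: list.New(), items: make(map[uint64]*list.Element)}
}

// Inc bumps the counter for key. A missing key is inserted with count 1,
// evicting the least recently used key first if the map is full.
func (l *lruCounts) Inc(key uint64) {
	if el, ok := l.items[key]; ok {
		el.Value.(*entry).count++
		l.order.MoveToFront(el)
		return
	}
	if l.order.Len() >= l.cap {
		oldest := l.order.Back()
		delete(l.items, oldest.Value.(*entry).key)
		l.order.Remove(oldest)
	}
	l.items[key] = l.order.PushFront(&entry{key: key, count: 1})
}

// Get returns the counter for key and whether the key is present.
func (l *lruCounts) Get(key uint64) (uint32, bool) {
	el, ok := l.items[key]
	if !ok {
		return 0, false
	}
	return el.Value.(*entry).count, true
}

func main() {
	m := newLRUCounts(2)
	m.Inc(1)
	m.Inc(2)
	m.Inc(1) // key 1 is now the most recently used
	m.Inc(3) // the map is full, so key 2 is evicted
	_, has2 := m.Get(2)
	c1, _ := m.Get(1)
	fmt.Println(has2, c1)
}
```

For a packet counter this means rarely seen tuples may disappear once the map fills up, while busy connections keep their counts.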

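Look at how we defined the map: we explicitly set the maximum number of entries and the types of the map key and value. The key is a 64-bit unsigned integer and the value is a 32-bit unsigned integer.

The maximum value of a u32 is 2^32 - 1, which is more than enough packets for this example.

To learn the IP address and port numbers, we first need to parse the packet and read the Ethernet header, the IP header, and then the TCP/UDP header.

Since XDP sits right behind the network interface card, we get the raw packet data as bytes, so we have to walk the byte array manually and unmarshal the Ethernet, IP, and TCP/UDP headers.

Fortunately, all the header definitions (struct ethhdr, struct iphdr, struct tcphdr, and struct udphdr) are available in vmlinux.h. We will use these structs to extract the IP address and port numbers in a separate function, parse_ip_packet: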

#define ETH_P_IP		0x0800	/* Internet Protocol packet	*/

#define PARSE_SKIP 			0
#define PARSED_TCP_PACKET	1
#define PARSED_UDP_PACKET	2

static __always_inline int parse_ip_packet(struct xdp_md *ctx, u64 *ip_metadata) {
	void *data_end = (void *)(long)ctx->data_end;
	void *data     = (void *)(long)ctx->data;

	// First, parse the ethernet header.
	struct ethhdr *eth = data;
	if ((void *)(eth + 1) > data_end) {
		return PARSE_SKIP;
	}

	if (eth->h_proto != bpf_htons(ETH_P_IP)) {
		// The protocol is not IPv4, so we can't parse an IPv4 source address.
		return PARSE_SKIP;
	}

	// Then parse the IP header.
	struct iphdr *ip = (void *)(eth + 1);
	if ((void *)(ip + 1) > data_end) {
		return PARSE_SKIP;
	}

	u16 src_port, dest_port;
	int retval;

	// Note: the transport header is located by assuming a fixed-size
	// IP header (sizeof(*ip)), i.e. an IP header without options.
	if (ip->protocol == IPPROTO_TCP) {
		struct tcphdr *tcp = (void*)ip + sizeof(*ip);
		if ((void*)(tcp+1) > data_end) {
			return PARSE_SKIP;
		}
		src_port = bpf_ntohs(tcp->source);
		dest_port = bpf_ntohs(tcp->dest);
		retval = PARSED_TCP_PACKET;
	} else if (ip->protocol == IPPROTO_UDP) {
		struct udphdr *udp = (void*)ip + sizeof(*ip);
		if ((void*)(udp+1) > data_end) {
			return PARSE_SKIP;
		}
		src_port = bpf_ntohs(udp->source);
		dest_port = bpf_ntohs(udp->dest);
		retval = PARSED_UDP_PACKET;
	} else {
		// The protocol is not TCP or UDP, so we can't parse a source port.
		return PARSE_SKIP;
	}

	// Pack the (source IP, source port, destination port) tuple into one u64.
	// The IP address stays in network byte order; the ports were converted
	// to host byte order above.
	// |<-- Source IP: 32 bits -->|<-- Source Port: 16 bits --><-- Dest Port: 16 bits -->|
	*ip_metadata = ((u64)(ip->saddr) << 32) | ((u64)src_port << 16) | (u64)dest_port;
	return retval;
}

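This function:

Checks that the packet has a valid Ethernet header, IP header, and TCP or UDP header. The protocol checks use h_proto from struct ethhdr and protocol from struct iphdr; each header stores the protocol of the inner packet it wraps.
Extracts the IP address from the IP header and the port numbers from the TCP/UDP header, and forms the IP:ports tuple in a 64-bit unsigned integer (u64)
Returns a code that tells the caller whether the packet is a TCP packet, a UDP packet, or something else (PARSE_SKIP)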
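The same checks can be written against a plain byte slice, which may help when reasoning about the header offsets. The sketch below is a user-space approximation in Go, not the author's code. parseFrame and buildTCPFrame are hypothetical names, and like the C version it assumes an IPv4 header without options.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
)

const (
	ethHdrLen  = 14     // destination MAC + source MAC + EtherType
	ipv4HdrLen = 20     // IPv4 header without options
	tcpHdrLen  = 20     // TCP header without options
	udpHdrLen  = 8      // UDP header
	etherIPv4  = 0x0800 // same value as ETH_P_IP
	protoTCP   = 6
	protoUDP   = 17
)

// parseFrame extracts the source IP and the ports from a raw Ethernet frame.
// ok is false for anything that is not a complete IPv4 TCP or UDP packet.
func parseFrame(frame []byte) (src netip.Addr, srcPort, dstPort uint16, ok bool) {
	// Bounds check before touching the Ethernet and IP headers.
	if len(frame) < ethHdrLen+ipv4HdrLen {
		return
	}
	if binary.BigEndian.Uint16(frame[12:14]) != etherIPv4 {
		return
	}
	ip := frame[ethHdrLen:]
	var need int
	switch ip[9] { // protocol field of the IPv4 header
	case protoTCP:
		need = tcpHdrLen
	case protoUDP:
		need = udpHdrLen
	default:
		return
	}
	l4 := ip[ipv4HdrLen:]
	if len(l4) < need {
		return
	}
	var a [4]byte
	copy(a[:], ip[12:16]) // source address field
	// TCP and UDP both start with the source port, then the destination port.
	return netip.AddrFrom4(a), binary.BigEndian.Uint16(l4[0:2]), binary.BigEndian.Uint16(l4[2:4]), true
}

// buildTCPFrame creates a minimal frame for the demo; unused fields stay zero.
func buildTCPFrame(src [4]byte, srcPort, dstPort uint16) []byte {
	frame := make([]byte, ethHdrLen+ipv4HdrLen+tcpHdrLen)
	binary.BigEndian.PutUint16(frame[12:14], etherIPv4)
	ip := frame[ethHdrLen:]
	ip[0] = 0x45 // version 4, header length of 5 32-bit words
	ip[9] = protoTCP
	copy(ip[12:16], src[:])
	tcp := ip[ipv4HdrLen:]
	binary.BigEndian.PutUint16(tcp[0:2], srcPort)
	binary.BigEndian.PutUint16(tcp[2:4], dstPort)
	return frame
}

func main() {
	frame := buildTCPFrame([4]byte{192, 168, 5, 2}, 58597, 22)
	src, sp, dp, ok := parseFrame(frame)
	fmt.Println(src, sp, dp, ok)
}
```

In Go the slice length plays the role of data_end. In the eBPF program the explicit comparisons against data_end are mandatory, because the verifier rejects any packet access that is not provably within bounds.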

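Note the __always_inline at the beginning of the function signature. It tells the compiler to always inline this function, which saves us the overhead of a function call.

Now it is time to write the hook function and use parse_ip_packet: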

SEC("xdp")
int xdp_prog_func(struct xdp_md *ctx) {
	u64 ip_meta;
	int retval = parse_ip_packet(ctx, &ip_meta);

	// As written, only TCP packets are counted; everything else
	// (including parsed UDP packets) is passed on untouched.
	if (retval != PARSED_TCP_PACKET) {
		return XDP_PASS;
	}

	u32 *pkt_count = bpf_map_lookup_elem(&xdp_stats_map, &ip_meta);
	if (!pkt_count) {
		// No entry in the map for this IP tuple yet, so set the initial value to 1.
		u32 init_pkt_count = 1;
		bpf_map_update_elem(&xdp_stats_map, &ip_meta, &init_pkt_count, BPF_ANY);
	} else {
		// Entry already exists for this IP tuple,
		// so increment it atomically.
		__sync_fetch_and_add(pkt_count, 1);
	}

	return XDP_PASS;
}

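xdp_prog_func is fairly simple, since most of the program logic already lives in parse_ip_packet. Here is what it does:

Parses the packet with parse_ip_packet
Skips counting by returning XDP_PASS unless the packet is TCP. Although parse_ip_packet also recognizes UDP, the check as written (retval != PARSED_TCP_PACKET) passes UDP packets through uncounted; comparing against PARSE_SKIP instead would count both
Looks up the IP:ports tuple as a key in the BPF map with the bpf_map_lookup_elem helper function
Sets the value to 1 if the IP:ports tuple is seen for the first time, and increments it by 1 otherwise. __sync_fetch_and_add is an LLVM builtin that performs the increment atomically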

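Finally, the SEC("xdp") macro marks this function as an XDP program. The actual attachment to a network interface happens in the user-space program.

Writing the user-space program

Now it is time to dive into the Go code.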

//go:generate go run github.com/cilium/ebpf/cmd/bpf2go ebpf xdp.c

var (
	ifaceName = flag.String("iface", "", "network interface to attach XDP program to")
)

func main() {
	log.SetPrefix("packet_count: ")
	log.SetFlags(log.Ltime | log.Lshortfile)
	flag.Parse()

	// Subscribe to signals for terminating the program.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, os.Interrupt, syscall.SIGTERM)

	iface, err := net.InterfaceByName(*ifaceName)
	if err != nil {
		log.Fatalf("network iface lookup for %q: %s", *ifaceName, err)
	}

	// Load pre-compiled programs and maps into the kernel.
	objs := ebpfObjects{}
	if err := loadEbpfObjects(&objs, nil); err != nil {
		log.Fatalf("loading objects: %v", err)
	}
	defer objs.Close()

	// Attach the program.
	l, err := link.AttachXDP(link.XDPOptions{
		Program:   objs.XdpProgFunc,
		Interface: iface.Index,
	})
	if err != nil {
		log.Fatalf("could not attach XDP program: %s", err)
	}
	defer l.Close()

	log.Printf("Attached XDP program to iface %q (index %d)", iface.Name, iface.Index)

	...

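Here we first load the generated eBPF program and maps with loadEbpfObjects. Then we attach the program to the given network interface with link.AttachXDP. As in the previous example, we use a channel to listen for interrupt signals and shut the program down gracefully.

Next, we read the map contents once per second and print the packet counts to standard output: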

	...

	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			if err := objs.XdpStatsMap.Close(); err != nil {
				log.Fatalf("closing map reader: %s", err)
			}
			return
		case <-ticker.C:
			// excludeIPs and srv are defined elsewhere in the full program
			// and are not shown in this article.
			m, err := parsePacketCounts(objs.XdpStatsMap, excludeIPs)
			if err != nil {
				log.Printf("Error reading map: %s", err)
				continue
			}
			log.Printf("Map contents:\n%s", m)
			srv.Submit(m)
		}
	}
}

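We use a utility function, parsePacketCounts, to read the map contents and parse the packet counts. It is called on every tick of the loop and iterates over the map entries.

Since we get raw bytes from the map, we need to parse them and convert them into a human-readable format. We define a new type, PacketCounts, to hold the parsed map contents.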

type IPMetadata struct {
	SrcIP   netip.Addr
	SrcPort uint16
	DstPort uint16
}

// UnmarshalBinary decodes the 8-byte map key. The byte offsets assume the
// u64 key was written on a little-endian machine.
func (t *IPMetadata) UnmarshalBinary(data []byte) (err error) {
	if len(data) != 8 {
		return fmt.Errorf("invalid data length: %d", len(data))
	}
	if err = t.SrcIP.UnmarshalBinary(data[4:8]); err != nil {
		return
	}
	t.SrcPort = uint16(data[3])<<8 | uint16(data[2])
	t.DstPort = uint16(data[1])<<8 | uint16(data[0])
	return nil
}

func (t IPMetadata) String() string {
	return fmt.Sprintf("%s:%d => :%d", t.SrcIP, t.SrcPort, t.DstPort)
}

type PacketCounts map[string]int

// String renders the counts sorted by key, one entry per line.
func (i PacketCounts) String() string {
	var keys []string
	for k := range i {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var sb strings.Builder
	for _, k := range keys {
		sb.WriteString(fmt.Sprintf("%s\t| %d\n", k, i[k]))
	}

	return sb.String()
}

We define a new type, IPMetadata, to hold the IP:ports tuple. Its UnmarshalBinary method parses the raw key bytes into the struct fields, and its String method prints the IP:ports tuple in a human-readable format.
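To see why the byte offsets in UnmarshalBinary are what they are, it helps to decode a hand-built key. The sketch below is an equivalent standalone decoder that uses encoding/binary, assuming the key was written by a little-endian machine. decodeKey is a hypothetical name, and the sample bytes correspond to 192.168.5.2:58597 => :22.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
)

// decodeKey interprets the 8 key bytes as a little-endian machine stores the
// u64: destination port, source port, then the 4 source IP bytes.
func decodeKey(data []byte) (src netip.Addr, srcPort, dstPort uint16, err error) {
	if len(data) != 8 {
		return netip.Addr{}, 0, 0, fmt.Errorf("invalid data length: %d", len(data))
	}
	var ip [4]byte
	copy(ip[:], data[4:8])
	srcPort = binary.LittleEndian.Uint16(data[2:4])
	dstPort = binary.LittleEndian.Uint16(data[0:2])
	return netip.AddrFrom4(ip), srcPort, dstPort, nil
}

func main() {
	// In-memory bytes of the u64 key 0x0205a8c0e4e50016 on a little-endian host.
	key := []byte{0x16, 0x00, 0xe5, 0xe4, 192, 168, 5, 2}
	src, sp, dp, err := decodeKey(key)
	fmt.Println(src, sp, dp, err)
}
```

The lowest-order part of the u64 (the destination port) comes first in memory, and the address bytes end up in their original network order at offsets 4 to 7.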

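Then we define a new type, PacketCounts, to hold the parsed map contents, with a String method that prints them in a human-readable format.

Finally, we use the PacketCounts type to parse the map contents and print the packet counts: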

func parsePacketCounts(m *ebpf.Map, excludeIPs map[string]bool) (PacketCounts, error) {
	var (
		key    IPMetadata
		val    uint32
		counts = make(PacketCounts)
	)
	// Walk all entries; each key is decoded via IPMetadata.UnmarshalBinary.
	iter := m.Iterate()
	for iter.Next(&key, &val) {
		if _, ok := excludeIPs[key.SrcIP.String()]; ok {
			continue
		}
		counts[key.String()] = int(val)
	}
	return counts, iter.Err()
}

Running the program

We first need to compile the eBPF program, then build the user-space program.

$ go generate
Compiled /Users/sazak/workspace/gocode/src/github.com/ozansz/intro-ebpf-with-go/0x03-packet-count/ebpf_bpfel.o
Stripped /Users/sazak/workspace/gocode/src/github.com/ozansz/intro-ebpf-with-go/0x03-packet-count/ebpf_bpfel.o
Wrote /Users/sazak/workspace/gocode/src/github.com/ozansz/intro-ebpf-with-go/0x03-packet-count/ebpf_bpfel.go
Compiled /Users/sazak/workspace/gocode/src/github.com/ozansz/intro-ebpf-with-go/0x03-packet-count/ebpf_bpfeb.o
Stripped /Users/sazak/workspace/gocode/src/github.com/ozansz/intro-ebpf-with-go/0x03-packet-count/ebpf_bpfeb.o
Wrote /Users/sazak/workspace/gocode/src/github.com/ozansz/intro-ebpf-with-go/0x03-packet-count/ebpf_bpfeb.go

$ go build -o packet_count

Now we can run it:

$ sudo ./packet_count --iface eth0
packet_count: 22:11:10 main.go:107: Attached XDP program to iface "eth0" (index 2)
packet_count: 22:11:10 main.go:132: Map contents:
192.168.5.2:58597 => :22	| 51
packet_count: 22:11:11 main.go:132: Map contents:
192.168.5.2:58597 => :22	| 52
packet_count: 22:11:11 main.go:132: Map contents:
192.168.5.2:58597 => :22	| 53

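The packets from IP address 192.168.5.2 to port 22 are SSH packets: I am running this program inside a virtual machine and I am connected to it over SSH.

Let's run curl inside the virtual machine from another terminal and see what happens: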

$ curl https://www.google.com/

Meanwhile, in the first terminal:

packet_count: 22:14:07 main.go:132: Map contents:
172.217.22.36:443 => :38324	| 12
192.168.5.2:58597 => :22	| 551
packet_count: 22:14:08 main.go:132: Map contents:
172.217.22.36:443 => :38324	| 12
192.168.5.2:58597 => :22	| 552
packet_count: 22:14:08 main.go:132: Map contents:
172.217.22.36:443 => :38324	| 30
192.168.5.2:58597 => :22	| 570
packet_count: 22:14:09 main.go:132: Map contents:
172.217.22.36:443 => :38324	| 30
192.168.5.2:58597 => :22	| 571

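The packets from IP address 172.217.22.36 (source port 443) to port 38324 are the responses to the curl command.

Conclusion

eBPF is powerful in many ways, and I think it is well worth investing time in learning it if you work on systems programming, observability, or security. In this article we have seen what eBPF is, how it works, and how to get started with it using Go.

I hope you enjoyed this article and learned something new. If you have any questions, feel free to ping[7] me.

Resources

Systems Performance, Brendan Gregg
Learning eBPF, Liz Rice
docs.kernel.org
ebpf.io
cilium.io
iovisor.org
brendangregg.com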

Reference links

1. Go Konf Istanbul '24: https://sazak.io/talks/an-applied-introduction-to-ebpf-with-go-2024-02-17
2. ozansz/intro-ebpf-with-go: https://github.com/ozansz/intro-ebpf-with-go/tree/main/0x01-helloworld
3. BPF maps: https://docs.kernel.org/bpf/maps.html
4. Cilium: https://cilium.io/
5. ebpf-go: https://github.com/cilium/ebpf
6. Lima: https://github.com/lima-vm/lima
7. ping: https://twitter.com/oznszk
