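Tracing kernel skbs is one fashionable thing to do; capturing packets with eBPF is another. This post describes how vista caught up with the latter.
After learning that eBPF can capture packets, I was itching to try it. vista could already trace XDP/tc-bpf progs, so I added the ability to capture packets from XDP/tc-bpf progs as well.
Capture results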
$ sudo ./vista --filter-trace-xdp --filter-trace-tc --output-meta --output-tuple --output-limit-lines 4 --pcap-file vista.pcapng icmp
2024/05/25 13:08:37 Tracing tc progs..
2024/05/25 13:08:37 Tracing xdp progs..
2024/05/25 13:08:37 Listening for events..
SKB/SK CPU PROCESS FUNC
0xffff990282314000 2 [<empty>(0)] dummy(xdp) netns=4026531840 mark=0x0 iface=2(ens33) proto=0x0000 mtu=1500 len=98 pkt_type=HOST 192.168.241.1->192.168.241.133(icmp request id=23089 seq=0)
Saving this packet to vista.pcapng..
0xffff990282314000 2 [<empty>(0)] dummy(xdp) netns=4026531840 mark=0x0 iface=2(ens33) proto=0x0000 mtu=1500 len=98 pkt_type=HOST 192.168.241.1->192.168.241.133(icmp request id=23089 seq=0)
Saving this packet to vista.pcapng..
0xffff990282314000 2 [<empty>(0)] dummy(tc) netns=4026531840 mark=0x0 iface=2(ens33) proto=0x0800 mtu=1500 len=98 pkt_type=HOST 192.168.241.1->192.168.241.133(icmp request id=23089 seq=0)
Saving this packet to vista.pcapng..
0xffff990282314000 2 [<empty>(0)] dummy(tc) netns=4026531840 mark=0x0 iface=2(ens33) proto=0x0800 mtu=1500 len=98 pkt_type=HOST 192.168.241.1->192.168.241.133(icmp request id=23089 seq=0)
Saving this packet to vista.pcapng..
2024/05/25 13:08:39 Printed 4 events, exiting program..
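The contents of vista.pcapng are as follows:
As the figure shows, vista captured four ICMP request packets; the second packet shown carries the return value of the XDP prog, XDP_PASS.
Compared with a pcapng file written by tcpdump, the pcapng file written by vista carries more metadata:
Packet comments: the prog's return value (fexit traces only), whether the bpf prog is XDP or tc, the interface name, and whether the trace was taken at fentry or fexit.
Interface queue: the ID of the queue that received the packet.
Verdict: the return value of the XDP/tc prog.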
Capturing packets with eBPF
Capturing packets with eBPF essentially relies on two helper functions: bpf_xdp_output(), used when tracing XDP progs, and bpf_skb_output(), used when tracing tc progs.
bpf_xdp_output()[1] is available since kernel 5.7, and bpf_skb_output()[2] since kernel 5.5.
bpf_xdp_output() is used as follows:
// https://github.com/Asphaltt/vista/blob/a913db0d623202a0d5e94522c7d604d0c3197250/bpf/vista.c#L733
static __always_inline void
output_xdp_pcap_event(struct xdp_buff *xdp, struct event_t *event, u32 len, int action, bool is_fexit) {
    set_xdp_pcap_meta(xdp, &event->pcap, len, action, is_fexit);

    // The upper 32 bits of flags carry the number of packet bytes to append
    // after the event; the lower 32 bits select the current CPU's buffer.
    u64 flags = (((u64) event->pcap.cap_len) << 32) | BPF_F_CURRENT_CPU;
    bpf_xdp_output(xdp, &pcap_events, flags, event, __sizeof_pcap_event);
}
bpf_skb_output() is used as follows:
// https://github.com/Asphaltt/vista/blob/a913db0d623202a0d5e94522c7d604d0c3197250/bpf/vista.c#L573
static __always_inline void
output_skb_pcap_event(struct sk_buff *skb, struct event_t *event, int action, bool is_fexit) {
    u64 flags;

    set_skb_pcap_meta(skb, &event->pcap, action, is_fexit);

    // Same flags layout as in output_xdp_pcap_event(): packet length in the
    // upper 32 bits, BPF_F_CURRENT_CPU in the lower 32 bits.
    flags = (((u64) event->pcap.cap_len) << 32) | BPF_F_CURRENT_CPU;
    bpf_skb_output(skb, &pcap_events, flags, event, __sizeof_pcap_event);
}
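Both functions above build the same flags value, so it may help to see the arithmetic on its own. The following is only a sketch in Go of how that 64-bit value is laid out; composeFlags is a hypothetical name, and the two constants are my transcription of BPF_F_CURRENT_CPU and BPF_F_CTXLEN_MASK from linux/bpf.h rather than anything taken from vista.

```go
package main

import "fmt"

const (
	// bpfFCurrentCPU mirrors BPF_F_CURRENT_CPU from linux/bpf.h (assumed value).
	bpfFCurrentCPU uint64 = 0xffffffff
	// bpfFCtxLenMask mirrors BPF_F_CTXLEN_MASK: 20 bits starting at bit 32.
	bpfFCtxLenMask uint64 = 0xfffff << 32
)

// composeFlags (hypothetical name) packs the number of packet bytes to append
// into the upper half of flags and the "current CPU" index into the lower half.
// It rejects lengths that do not fit into the context-length mask.
func composeFlags(capLen uint32) (uint64, error) {
	ctxLen := uint64(capLen) << 32
	if ctxLen&^bpfFCtxLenMask != 0 {
		return 0, fmt.Errorf("cap_len %d does not fit in BPF_F_CTXLEN_MASK", capLen)
	}
	return ctxLen | bpfFCurrentCPU, nil
}

func main() {
	// 98 is the packet length seen in the ICMP capture output.
	flags, err := composeFlags(98)
	if err != nil {
		panic(err)
	}
	fmt.Printf("flags=%#x\n", flags) // prints flags=0x62ffffffff
}
```

The length check is there because the mask leaves only 20 bits for the length, so a cap_len above 1048575 cannot be expressed in flags at all.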
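In the user-space program, the packet data sits in the second half of each perf event: the event data comes first, and the packet data follows immediately after it.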
+-----------------+-----------------+
| event data | packet data |
+-----------------+-----------------+
So once the event data has been parsed, the remaining bytes are the packet data, which can then be written to the pcap file.
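The layout above can be sketched as a small Go helper. This is a minimal illustration rather than vista's actual reader: splitSample is a hypothetical name, eventSize stands in for __sizeof_pcap_event, and capLen is assumed to have been decoded from the event already. Trimming to capLen is a precaution, on the assumption that a raw perf sample may carry trailing padding after the packet bytes.

```go
package main

import (
	"errors"
	"fmt"
)

// splitSample (hypothetical helper) separates one raw perf sample into the
// fixed-size event data and the packet bytes that follow it.
func splitSample(raw []byte, eventSize int, capLen uint32) ([]byte, []byte, error) {
	if eventSize < 0 || len(raw) < eventSize {
		return nil, nil, errors.New("sample is shorter than the event data")
	}
	event, rest := raw[:eventSize], raw[eventSize:]
	if uint64(capLen) > uint64(len(rest)) {
		return nil, nil, fmt.Errorf("cap_len %d exceeds the %d bytes after the event", capLen, len(rest))
	}
	// Keep exactly capLen bytes and drop any trailing padding.
	return event, rest[:capLen], nil
}

func main() {
	// 4 bytes of fake event data, 3 bytes of fake packet data, 2 bytes of padding.
	raw := []byte{1, 2, 3, 4, 0xaa, 0xbb, 0xcc, 0, 0}
	event, packet, err := splitSample(raw, 4, 3)
	if err != nil {
		panic(err)
	}
	fmt.Printf("event=% x packet=% x\n", event, packet)
}
```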
Saving packets with gopacket
As far as I know, gopacket[3] does not support writing additional pcapng metadata in WritePacket(), and neither does google/gopacket[4].
So writing the pcapng metadata has to be implemented by hand.
-// WritePacket writes out packet with the given data and capture info. The given InterfaceIndex must already be added to the file. InterfaceIndex 0 is automatically added by the NewWriter* methods.
-func (w *NgWriter) WritePacket(ci gopacket.CaptureInfo, data []byte) error {
+// WritePacket writes out packet with the given data, capture info and some
+// options. The given InterfaceIndex must already be added to the file.
+// InterfaceIndex 0 is automatically added by the NewWriter* methods. The
+// additional options are written with the packet. But the additional option can
+// not be empty, or it will panic.
+func (w *NgWriter) WritePacket(ci gopacket.CaptureInfo, data []byte, options ...NgOption) error {
 	if ci.InterfaceIndex >= int(w.intf) || ci.InterfaceIndex < 0 {
 		return fmt.Errorf("Can't send statistics for non existent interface %d; have only %d interfaces", ci.InterfaceIndex, w.intf)
 	}
@@ -367,6 +371,7 @@ func (w *NgWriter) WritePacket(ci gopacket.CaptureInfo, data []byte) error {
 	length := uint32(len(data)) + 32
 	padding := (4 - length&3) & 3
 	length += padding
+	length += prepareNgOptions(options)
 	ts := ci.Timestamp.UnixNano()
@@ -387,7 +392,15 @@ func (w *NgWriter) WritePacket(ci gopacket.CaptureInfo, data []byte) error {
 	}
 	binary.LittleEndian.PutUint32(w.buf[:4], 0)
-	_, err := w.w.Write(w.buf[4-padding : 8]) // padding + length
+	if _, err := w.w.Write(w.buf[4-padding : 4]); err != nil { // padding
+		return err
+	}
+
+	if err := w.writeOptions(options); err != nil {
+		return err
+	}
+
+	_, err := w.w.Write(w.buf[4:8]) // length
 	return err
 }
This diff adds an options parameter to the WritePacket() method of NgWriter, used for writing pcapng metadata.
Alongside it, several NewOptionXXX() functions were added to build the pcapng metadata.
For more details, see the commit Write packet with options for NgWriter[5].
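To make "writing options" a bit more concrete, here is a sketch of how a pcapng option is laid out on the wire according to the pcapng draft: a 16-bit code, a 16-bit value length, then the value padded to a 32-bit boundary, with a zero option closing the list. This is not the code from the commit above; appendOption and the constant names are hypothetical, the option codes are my reading of the draft, and little-endian is assumed because the diff writes with binary.LittleEndian.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Option codes as I read them from the pcapng draft (names are hypothetical).
const (
	optEndOfOpt   uint16 = 0
	optComment    uint16 = 1
	optEPBQueue   uint16 = 6
	optEPBVerdict uint16 = 7
)

// appendOption (hypothetical helper) appends one option to buf:
// code, value length, value, then zero padding up to a multiple of 4 bytes.
func appendOption(buf []byte, code uint16, value []byte) []byte {
	buf = binary.LittleEndian.AppendUint16(buf, code)
	buf = binary.LittleEndian.AppendUint16(buf, uint16(len(value)))
	buf = append(buf, value...)
	pad := (4 - len(value)&3) & 3
	return append(buf, make([]byte, pad)...)
}

func main() {
	var buf []byte
	buf = appendOption(buf, optComment, []byte(`{"bpf":"xdp"}`))

	// The queue ID is a 32-bit value, so it needs no padding.
	queue := binary.LittleEndian.AppendUint32(nil, 0)
	buf = appendOption(buf, optEPBQueue, queue)

	// A zero-length option with code 0 terminates the option list.
	buf = appendOption(buf, optEndOfOpt, nil)
	fmt.Printf("%d bytes: % x\n", len(buf), buf)
}
```

Because every option is padded to 4 bytes, the total option size can be added to the block length up front, which is the role prepareNgOptions() appears to play in the diff.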
Finally, vista builds the options and writes them to the pcapng file:
// https://github.com/Asphaltt/vista/blob/main/internal/vista/output_pcap.go
func (p *pcapWriter) meta2options(ev *Event, meta *PcapMeta, iface string) []pcapgo.NgOption {
	var opts []pcapgo.NgOption

	info := map[string]string{}

	isFexit := meta.IsFexit == 1
	if isFexit {
		info["tracing"] = "fexit"
	} else {
		info["tracing"] = "fentry"
	}

	isXdp := ev.Type == eventTypeTracingXdp
	if isXdp {
		info["bpf"] = "xdp"
	} else {
		info["bpf"] = "tc"
	}

	if isFexit {
		action := meta.Action
		if isXdp {
			info["action"] = xdpAction(action).Action()
		} else {
			info["action"] = tcAction(action).Action()
		}
	}

	if iface != "" {
		info["iface"] = iface
	}

	data, _ := json.Marshal(info)
	comment := string(data)
	opts = append(opts, pcapgo.NewOptionComment(comment))

	if isFexit {
		// Ref:
		// https://www.ietf.org/archive/id/draft-tuexen-opsawg-pcapng-03.html#section-4.3-19.2.1
		//
		// The verdict type can be: Hardware (type octet = 0, size = variable),
		// Linux_eBPF_TC (type octet = 1, size = 8 (64-bit unsigned integer),
		// value = TC_ACT_* as defined in the Linux pkt_cls.h include),
		// Linux_eBPF_XDP (type octet = 2, size = 8 (64-bit unsigned integer),
		// value = xdp_action as defined in the Linux bpf.h include).
		verdict := [9]byte{}
		if isXdp {
			verdict[0] = 2
		} else {
			verdict[0] = 1
		}
		binary.NativeEndian.PutUint64(verdict[1:], uint64(meta.Action))
		opts = append(opts, pcapgo.NewOptionEnhancedPacketVerdict(verdict[:]))
	}

	opts = append(opts, pcapgo.NewOptionEnhancedPacketQueueID(meta.RxQueue))

	return opts
}

func (p *pcapWriter) writePacket(ev OutputEvent, iface string) error {
	meta := ev.Event.Pcap()
	info := gopacket.CaptureInfo{
		Timestamp:     time.Now(),
		CaptureLength: int(meta.CapLen),
		Length:        int(ev.Event.Meta.Len),
	}

	var err error
	if p.ngw != nil {
		opts := p.meta2options(ev.Event, meta, iface)
		err = p.ngw.WritePacket(info, ev.Packet, opts...)
	}
	// ...
	return nil
}
In fact, once writing comment metadata is supported, even more metadata can be added, since the content of a comment is free-form.
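As an illustration of that flexibility, the JSON comment can take any extra key without touching the pcapng writer. This is only a sketch: buildComment is a hypothetical helper and the "cpu" key is an invented example of additional metadata, not something vista writes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildComment (hypothetical helper) merges the base keys with any extra
// key/value pairs and renders them as one JSON comment string.
// Extra keys override base keys with the same name.
func buildComment(base, extra map[string]string) (string, error) {
	info := make(map[string]string, len(base)+len(extra))
	for k, v := range base {
		info[k] = v
	}
	for k, v := range extra {
		info[k] = v
	}
	data, err := json.Marshal(info) // map keys are emitted in sorted order
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	base := map[string]string{"tracing": "fexit", "bpf": "xdp"}
	extra := map[string]string{"cpu": "2"} // invented extra field
	comment, err := buildComment(base, extra)
	if err != nil {
		panic(err)
	}
	fmt.Println(comment) // prints {"bpf":"xdp","cpu":"2","tracing":"fexit"}
}
```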
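Summary
vista supports capturing packets from XDP/tc-bpf progs, which is a fashionable thing to do. Capturing packets with eBPF essentially relies on two helper functions, bpf_xdp_output() and bpf_skb_output(). And to save the packets with gopacket, its ability to write pcapng metadata had to be extended.
For more XDP material, join the "eBPF Talk" Knowledge Planet (知识星球) community and study the "XDP Advanced Handbook" (《XDP 进阶手册》).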
[1]bpf_xdp_output(): https://github.com/torvalds/linux/commit/d831ee84bfc9173eecf30dbbc2553ae81b996c60
[2]bpf_skb_output(): https://github.com/torvalds/linux/commit/a7658e1a4164ce2b9eb4a11aadbba38586e93bd6
[3]gopacket: https://github.com/gopacket/gopacket
[4]google/gopacket: https://github.com/google/gopacket
[5]Write packet with options for NgWriter: https://github.com/Asphaltt/gopacket/commit/7b4421d025150a7fd41a93c15a3db5f9686f4c05