lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALz3k9jM0eqgw1=RKQPFpn8nk4MZRadEC3ge0kRutfvN2WVbwg@mail.gmail.com>
Date: Fri, 15 Mar 2024 16:17:12 +0800
From: 梦龙董 <dongmenglong.8@...edance.com>
To: Jiri Olsa <olsajiri@...il.com>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>, Andrii Nakryiko <andrii@...nel.org>, 
	Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	Martin KaFai Lau <martin.lau@...ux.dev>, Eddy Z <eddyz87@...il.com>, Song Liu <song@...nel.org>, 
	Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>, 
	KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...gle.com>, Hao Luo <haoluo@...gle.com>, 
	Alexander Gordeev <agordeev@...ux.ibm.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>, 
	Sven Schnelle <svens@...ux.ibm.com>, "David S. Miller" <davem@...emloft.net>, 
	David Ahern <dsahern@...nel.org>, Dave Hansen <dave.hansen@...ux.intel.com>, 
	X86 ML <x86@...nel.org>, Steven Rostedt <rostedt@...dmis.org>, 
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Quentin Monnet <quentin@...valent.com>, 
	bpf <bpf@...r.kernel.org>, 
	linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>, LKML <linux-kernel@...r.kernel.org>, 
	linux-riscv <linux-riscv@...ts.infradead.org>, linux-s390 <linux-s390@...r.kernel.org>, 
	Network Development <netdev@...r.kernel.org>, linux-trace-kernel@...r.kernel.org, 
	"open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest@...r.kernel.org>, linux-stm32@...md-mailman.stormreply.com
Subject: Re: [External] Re: [PATCH bpf-next v2 1/9] bpf: tracing: add support
 to record and check the accessed args

On Thu, Mar 14, 2024 at 2:29 PM Jiri Olsa <olsajiri@...il.com> wrote:
>
> On Wed, Mar 13, 2024 at 05:25:35PM -0700, Alexei Starovoitov wrote:
> > On Tue, Mar 12, 2024 at 6:53 PM 梦龙董 <dongmenglong.8@...edance.com> wrote:
> > >
> > > On Wed, Mar 13, 2024 at 12:42 AM Alexei Starovoitov
> > > <alexei.starovoitov@...il.com> wrote:
> > > >
> > > > On Mon, Mar 11, 2024 at 7:42 PM 梦龙董 <dongmenglong.8@...edance.com> wrote:
> > > > >
> > > [......]
> > > >
> > > > I see.
> > > > I thought you're sharing the trampoline across attachments.
> > > > (since bpf prog is the same).
> > >
> > > That seems to be a good idea, which I hadn't thought before.
> > >
> > > > But above approach cannot possibly work with a shared trampoline.
> > > > You need to create individual trampoline for all attachment
> > > > and point them to single bpf prog.
> > > >
> > > > tbh I'm less excited about this feature now, since sharing
> > > > the prog across different attachments is nice, but it won't scale
> > > > to thousands of attachments.
> > > > I assumed that there will be a single trampoline with max(argno)
> > > > across attachments and attach/detach will scale to thousands.
> > > >
> > > > With individual trampoline this will work for up to a hundred
> > > > attachments max.
> > >
> > > What does "a hundred attachments max" means? Can't I
> > > trace thousands of kernel functions with a bpf program of
> > > tracing multi-link?
> >
> > I mean what time does it take to attach one program
> > to 100 fentry-s ?
> > What is the time for 1k and for 10k ?
> >
> > The kprobe multi test attaches to pretty much all funcs in
> > /sys/kernel/tracing/available_filter_functions
> > and it's fast enough to run in test_progs on every commit in bpf CI.
> > See get_syms() in prog_tests/kprobe_multi_test.c
> >
> > Can this new multi fentry do that?
> > and at what speed?
> > The answer will decide how applicable this api is going to be.
> > Generating different trampolines for every attach point
> > is an approach as well. Pls benchmark it too.
> >
> > > >
> > > > Let's step back.
> > > > What is the exact use case you're trying to solve?
> > > > Not an artificial one as selftest in patch 9, but the real use case?
> > >
> > > I have a tool, which is used to diagnose network problems,
> > > and its name is "nettrace". It will trace many kernel functions, whose
> > > function args contain "skb", like this:
> > >
> > > ./nettrace -p icmp
> > > begin trace...
> > > ***************** ffff889be8fbd500,ffff889be8fbcd00 ***************
> > > [1272349.614564] [dev_gro_receive     ] ICMP: 169.254.128.15 ->
> > > 172.27.0.6 ping request, seq: 48220
> > > [1272349.614579] [__netif_receive_skb_core] ICMP: 169.254.128.15 ->
> > > 172.27.0.6 ping request, seq: 48220
> > > [1272349.614585] [ip_rcv              ] ICMP: 169.254.128.15 ->
> > > 172.27.0.6 ping request, seq: 48220
> > > [1272349.614592] [ip_rcv_core         ] ICMP: 169.254.128.15 ->
> > > 172.27.0.6 ping request, seq: 48220
> > > [1272349.614599] [skb_clone           ] ICMP: 169.254.128.15 ->
> > > 172.27.0.6 ping request, seq: 48220
> > > [1272349.614616] [nf_hook_slow        ] ICMP: 169.254.128.15 ->
> > > 172.27.0.6 ping request, seq: 48220
> > > [1272349.614629] [nft_do_chain        ] ICMP: 169.254.128.15 ->
> > > 172.27.0.6 ping request, seq: 48220
> > > [1272349.614635] [ip_rcv_finish       ] ICMP: 169.254.128.15 ->
> > > 172.27.0.6 ping request, seq: 48220
> > > [1272349.614643] [ip_route_input_slow ] ICMP: 169.254.128.15 ->
> > > 172.27.0.6 ping request, seq: 48220
> > > [1272349.614647] [fib_validate_source ] ICMP: 169.254.128.15 ->
> > > 172.27.0.6 ping request, seq: 48220
> > > [1272349.614652] [ip_local_deliver    ] ICMP: 169.254.128.15 ->
> > > 172.27.0.6 ping request, seq: 48220
> > > [1272349.614658] [nf_hook_slow        ] ICMP: 169.254.128.15 ->
> > > 172.27.0.6 ping request, seq: 48220
> > > [1272349.614663] [ip_local_deliver_finish] ICMP: 169.254.128.15 ->
> > > 172.27.0.6 ping request, seq: 48220
> > > [1272349.614666] [icmp_rcv            ] ICMP: 169.254.128.15 ->
> > > 172.27.0.6 ping request, seq: 48220
> > > [1272349.614671] [icmp_echo           ] ICMP: 169.254.128.15 ->
> > > 172.27.0.6 ping request, seq: 48220
> > > [1272349.614675] [icmp_reply          ] ICMP: 169.254.128.15 ->
> > > 172.27.0.6 ping request, seq: 48220
> > > [1272349.614715] [consume_skb         ] ICMP: 169.254.128.15 ->
> > > 172.27.0.6 ping request, seq: 48220
> > > [1272349.614722] [packet_rcv          ] ICMP: 169.254.128.15 ->
> > > 172.27.0.6 ping request, seq: 48220
> > > [1272349.614725] [consume_skb         ] ICMP: 169.254.128.15 ->
> > > 172.27.0.6 ping request, seq: 48220
> > >
> > > For now, I have to create a bpf program for every kernel
> > > function that I want to trace, which is up to 200.
> > >
> > > With this multi-link, I only need to create 5 bpf program,
> > > like this:
> > >
> > > int BPF_PROG(trace_skb_1, struct *skb);
> > > int BPF_PROG(trace_skb_2, u64 arg0, struct *skb);
> > > int BPF_PROG(trace_skb_3, u64 arg0, u64 arg1, struct *skb);
> > > int BPF_PROG(trace_skb_4, u64 arg0, u64 arg1, u64 arg2, struct *skb);
> > > int BPF_PROG(trace_skb_5, u64 arg0, u64 arg1, u64 arg2, u64 arg3, struct *skb);
> > >
> > > Then, I can attach trace_skb_1 to all the kernel functions that
> > > I want to trace and whose first arg is skb; attach trace_skb_2 to kernel
> > > functions whose 2nd arg is skb, etc.
> > >
> > > Or, I can create only one bpf program and store the index
> > > of skb to the attachment cookie, and attach this program to all
> > > the kernel functions that I want to trace.
> > >
> > > This is my use case. With the multi-link, now I only have
> > > 1 bpf program, 1 bpf link, 200 trampolines, instead of 200
> > > bpf programs, 200 bpf link and 200 trampolines.
> >
> > I see. The use case makes sense to me.
> > Andrii's retsnoop is used to do similar thing before kprobe multi was
> > introduced.
> >
> > > The shared trampoline you mentioned seems to be a
> > > wonderful idea, which can make the 200 trampolines
> > > to one. Let me have a look, we create a trampoline and
> > > record the max args count of all the target functions, let's
> > > mark it as arg_count.
> > >
> > > During generating the trampoline, we assume that the
> > > function args count is arg_count. During attaching, we
> > > check the consistency of all the target functions, just like
> > > what we do now.
> >
> > For one trampoline to handle all attach points we might
> > need some arch support, but we can start simple.
> > Make btf_func_model with MAX_BPF_FUNC_REG_ARGS
> > by calling btf_distill_func_proto() with func==NULL.
> > And use that to build a trampoline.
> >
> > The challenge is how to use minimal number of trampolines
> > when bpf_progA is attached for func1, func2, func3
> > and bpf_progB is attached to func3, func4, func5.
> > We'd still need 3 trampolines:
> > for func[12] to call bpf_progA,
> > for func3 to call bpf_progA and bpf_progB,
> > for func[45] to call bpf_progB.
> >
> > Jiri was trying to solve it in the past. His slides from LPC:
> > https://lpc.events/event/16/contributions/1350/attachments/1033/1983/plumbers.pdf
> >
> > Pls study them and his prior patchsets to avoid stepping on the same rakes.
>
> yep, I refrained from commenting not to take you down the same path
> I did, but if you insist.. ;-)
>
> I managed to forgot almost all of it, but the IIRC the main pain point
> was that at some point I had to split existing trampoline which caused
> the whole trampolines management and error paths to become a mess
>
> I tried to explain things in [1] changelog and the latest patchset is in [0]
>
> feel free to use/take anything, but I advice strongly against it ;-)
> please let me know if I can help

I have to say that I have not gone far enough to encounter
this problem, and I didn't dig enough to be aware of the
complexity.

I suspect that I can't overcome this challenge. The only thing that
I thought when I hear about the "shared trampoline" is to fallback
and not use the shared trampoline for the kernel functions who
already have a trampoline.

Anyway, let's have a try on it, based on your research.

Thanks!
Menglong Dong

>
> jirka
>
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/log/?h=bpf/batch
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git/commit/?h=bpf/batch&id=52a1d4acdf55df41e99ca2cea51865e6821036ce

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ