lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3577705.QJadu78ljV@7950hx>
Date: Mon, 03 Nov 2025 19:24:24 +0800
From: Menglong Dong <menglong.dong@...ux.dev>
To: Menglong Dong <menglong8.dong@...il.com>,
 Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Alexei Starovoitov <ast@...nel.org>, Jiri Olsa <jolsa@...nel.org>,
 Daniel Borkmann <daniel@...earbox.net>,
 John Fastabend <john.fastabend@...il.com>,
 Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>,
 Eduard <eddyz87@...il.com>, Song Liu <song@...nel.org>,
 Yonghong Song <yonghong.song@...ux.dev>, KP Singh <kpsingh@...nel.org>,
 Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>,
 Matt Bobrowski <mattbobrowski@...gle.com>,
 Steven Rostedt <rostedt@...dmis.org>, Masami Hiramatsu <mhiramat@...nel.org>,
 Leon Hwang <leon.hwang@...ux.dev>, jiang.biao@...ux.dev,
 bpf <bpf@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
 linux-trace-kernel <linux-trace-kernel@...r.kernel.org>
Subject:
 Re: [PATCH bpf-next v3 4/7] bpf,x86: add tracing session supporting for
 x86_64

On 2025/11/1 01:57, Alexei Starovoitov wrote:
> On Thu, Oct 30, 2025 at 8:36 PM Menglong Dong <menglong8.dong@...il.com> wrote:
> >
> > On Fri, Oct 31, 2025 at 9:42 AM Alexei Starovoitov
> > <alexei.starovoitov@...il.com> wrote:
> > >
> > > On Sat, Oct 25, 2025 at 8:02 PM Menglong Dong <menglong8.dong@...il.com> wrote:
> > > >
> > > > Add BPF_TRACE_SESSION supporting to x86_64. invoke_bpf_session_entry and
> > > > invoke_bpf_session_exit is introduced for this purpose.
> > > >
> > > > In invoke_bpf_session_entry(), we will check if the return value of the
> > > > fentry is 0, and set the corresponding session flag if not. And in
> > > > invoke_bpf_session_exit(), we will check if the corresponding flag is
> > > > set. If set, the fexit will be skipped.
> > > >
> > > > As designed, the session flags and session cookie address is stored after
> > > > the return value, and the stack look like this:
> > > >
> > > >   cookie ptr    -> 8 bytes
> > > >   session flags -> 8 bytes
> > > >   return value  -> 8 bytes
> > > >   argN          -> 8 bytes
> > > >   ...
> > > >   arg1          -> 8 bytes
> > > >   nr_args       -> 8 bytes
> > > >   ...
> > > >   cookieN       -> 8 bytes
> > > >   cookie1       -> 8 bytes
> > > >
> > > > In the entry of the session, we will clear the return value, so the fentry
> > > > will always get 0 with ctx[nr_args] or bpf_get_func_ret().
> > > >
> > > > Before the execution of the BPF prog, the "cookie ptr" will be filled with
> > > > the corresponding cookie address, which is done in
> > > > invoke_bpf_session_entry() and invoke_bpf_session_exit().
> > >
> > > ...
> > >
> > > > +       if (session->nr_links) {
> > > > +               for (i = 0; i < session->nr_links; i++) {
> > > > +                       if (session->links[i]->link.prog->call_session_cookie)
> > > > +                               stack_size += 8;
> > > > +               }
> > > > +       }
> > > > +       cookies_off = stack_size;
> > >
> > > This is not great. It's all root and such,
> > > but if somebody attaches 64 progs that use session cookies
> > > then the trampoline will consume 64*8 of stack space just for
> > > these cookies. Plus more for args, cookie, ptr, session_flag, etc.
> >
> > The session cookie stuff does take a lot of stack memory.
> > For fprobe, it will store the cookie into its shadow stack, which
> > can free the stack.
> >
> > How about we remove the session cookie stuff? Therefore,
> > only 8-bytes(session flags) are used on the stack. And if we reuse
> > the nr_regs slot, no stack will be consumed. However, it will make
> > thing complex, which I don't think we should do.
> >
> > > Sigh.
> > > I understand that cookie from one session shouldn't interfere
> > > with another, but it's all getting quite complex
> > > especially when everything is in assembly.
> > > And this is just x86 JIT. Other JITs would need to copy
> > > this complex logic :(
> >
> > Without the session cookie, it will be much easier to implement
> > in another arch. And with the hepler of AI(such as cursor), it can
> > be done easily ;)
> 
> The reality is the opposite. We see plenty of AI generated garbage.
> Please stay human.

It's not wised to make the AI generate all the things for us, and
I find the AI is not good at planing and designing, but good at
implement a single thing, such as generating a instruction or
machine code from the C code. Of course I can generate it by
myself with clang, but it still save a lot efforts.

> 
> >
> > > At this point I'm not sure that "symmetry with kprobe_multi_session"
> > > is justified as a reason to add all that.
> > > We don't have a kprobe_session for individual kprobes after all.
> >
> > As for my case, the tracing session can make my code much
> > simpler, as I always use the fentry+fexit to hook a function. And
> > the fexit skipping according to the return value of fentry can also
> > achieve better performance.
> 
> I don't buy the argument that 'if (cond) goto skip_fexit_prog'
> in the generated trampoline is measurably faster than
> 'if (cond) return' inside the fexit program.

For now, there maybe no performance improvement. And I
were playing to implement such logic in the next step:

If there are no fexit and modify_return progs in the trampoline,
I'll check if "session_flags == ALL_SKIP" after the entry of the
fsession, and skip the origin call in such case and return directly
in such case. Therefore, the performance of fsession is almost
the same as fentry. According to our testing, the performance
of fexit is half of fentry. So the performance in this case has a
100% increasing.

This is a just rough thought, not sure if it works.

In fact, the performance improvement can be achieved more in the
bpf prog. For example, I want to trace the return value of skb_clone()
with the TCP port being 8080, I have to write following code:

SEC("fentry/skb_clone")
int clone_entry(struct sk_buff *skb)
{
    /* parse the skb and do some filter
     *    |
     * return 0 if not TCP + 80 port
     *    |
     * save the address of "skb" to the hash table "m_context"
     *    |
     * output the skb + timestamp
     */
    return 0;
}

SEC("fexit/skb_clone")
int clone_exit(struct sk_buff *skb, u64 ret)
{
    /* lookup if skb in the "m_context", return 0 if not
     *    |
     * output the skb + return value + timestamp
     *    |
     * delete the "skb" from the m_context
     */
    return 0;
}

I have to maintain a hash table "m_context" to check if
the exit of skb_clone() is what I want. It works, but it has
extra overhead in the hash table lookup and deleting. What's
more, it's not stable on some complex case.

The problem is that we don't has a way to pair the
fentry/fexit on the stack(do we?).

> 
> > AFAIT, the mast usage of session cookie in kprobe is passing the
> > function arguments to the exit. For tracing, we can get the args
> > in the fexit. So the session cookie in tracing is not as important as
> > in kprobe.
> 
> Since kprobe_multi was introduced, retsnoop and tetragon adopted
> it to do mass attach, and both use bpf_get_attach_cookie().
> While both don't use bpf_session_cookie().
> Searching things around I also didn't find a single real user
> of bpf_session_cookie() other than selftests/bpf and Jiri's slides :)
> 
> So, doing all this work in trampoline for bpf_session_cookie()
> doesn't seem warranted, but with that doing session in trampoline
> also doesn't look useful, since the only benefit vs a pair
> of fentry/fexit is skip of fexit, which can be done already.
> Plus complexity in all JITs... so, I say, let's shelve the whole thing for now.

Yeah, the session cookie is not useful in my case too, I'm OK to
skip it.

The pair of fentry/fexit have some use cases(my nettrace and
Leon's bpfsnoop, at least). Not sure if the reason above is sufficient,
please ignore the message if it is not :)

The way we implement the trampoline makes it arch related and
complex. I tried to rewrite it with C code, but failed. It's too difficult
and I suspect it's impossible :/

Thanks!
Menglong Dong

> 
> 





Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ