netdev - Re: [PATCH bpf-next v9 07/11] bpf,x86: add fsession support for x86

Message-ID: <CAEf4BzZZSUkMbv=7DcBubGjnABHNnAZjT3-A5XKB-UW58a=6jg@mail.gmail.com>
Date: Wed, 14 Jan 2026 11:05:56 -0800
From: Andrii Nakryiko <andrii.nakryiko@...il.com>
To: Menglong Dong <menglong.dong@...ux.dev>
Cc: Menglong Dong <menglong8.dong@...il.com>, ast@...nel.org, andrii@...nel.org, 
	daniel@...earbox.net, martin.lau@...ux.dev, eddyz87@...il.com, 
	song@...nel.org, yonghong.song@...ux.dev, john.fastabend@...il.com, 
	kpsingh@...nel.org, sdf@...ichev.me, haoluo@...gle.com, jolsa@...nel.org, 
	davem@...emloft.net, dsahern@...nel.org, tglx@...utronix.de, mingo@...hat.com, 
	jiang.biao@...ux.dev, bp@...en8.de, dave.hansen@...ux.intel.com, 
	x86@...nel.org, hpa@...or.com, bpf@...r.kernel.org, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH bpf-next v9 07/11] bpf,x86: add fsession support for x86_64

On Tue, Jan 13, 2026 at 7:27 PM Menglong Dong <menglong.dong@...ux.dev> wrote:
>
> On 2026/1/14 09:25 Andrii Nakryiko <andrii.nakryiko@...il.com> wrote:
> > On Sat, Jan 10, 2026 at 6:12 AM Menglong Dong <menglong8.dong@...il.com> wrote:
> > >
> > > Add BPF_TRACE_FSESSION support to x86_64, including:
> [...]
> > >
> > > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > > index d94f7038c441..0671a434c00d 100644
> > > --- a/arch/x86/net/bpf_jit_comp.c
> > > +++ b/arch/x86/net/bpf_jit_comp.c
> > > @@ -3094,12 +3094,17 @@ static int emit_cond_near_jump(u8 **pprog, void *func, void *ip, u8 jmp_cond)
> > >  static int invoke_bpf(const struct btf_func_model *m, u8 **pprog,
> > >                       struct bpf_tramp_links *tl, int stack_size,
> > >                       int run_ctx_off, bool save_ret,
> > > -                     void *image, void *rw_image)
> > > +                     void *image, void *rw_image, u64 func_meta)
> > >  {
> > >         int i;
> > >         u8 *prog = *pprog;
> > >
> > >         for (i = 0; i < tl->nr_links; i++) {
> > > +               if (tl->links[i]->link.prog->call_session_cookie) {
> > > +                       /* 'stack_size + 8' is the offset of func_md in stack */
> >
> > not func_md, don't invent new names, "func_meta" (but it's also so
>
>
> Ah, it should be func_meta here, it's a typo.
>
>
> > backwards that you have stack offsets as positive... and it's not even
> > in verifier's stack slots, just bytes... very confusing to me)
>
>
> Do you mean the offset passed to emit_store_stack_imm64()? I'll convert it
> to negative after modifying emit_store_stack_imm64() as you suggested.
>

yes

>
> >
> > > +                       emit_store_stack_imm64(&prog, stack_size + 8, func_meta);
> > > +                       func_meta -= (1 << BPF_TRAMP_M_COOKIE);
> >
> > was this supposed to be BPF_TRAMP_M_IS_RETURN?... and why didn't AI catch this?
>
>
> It should be BPF_TRAMP_M_COOKIE here. I'm decrementing the cookie
> field to compute the offset of the session cookie for the next bpf
> program.
>
>
> This part corresponds to the 5th patch. It would be clearer if you
> combined it with the 5th patch. It does seem a little confusing
> here :/
>

It is confusing. invoke_bpf is handed func_meta as if it were opaque,
yet it also knows part of its layout and makes extra adjustments to it;
I don't like that. I think it would be simpler to just pass nr_args and
cookies_offset and let invoke_bpf construct func_meta for each program
invocation, IMO.
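
To make the suggestion concrete, here is a rough userspace sketch of
what "let invoke_bpf construct func_meta" could look like. It is not
the patch's code. Everything prefixed sketch_/SKETCH_ is hypothetical,
and the shift values are placeholders, not the real BPF_TRAMP_M_*
definitions. The only layout fact taken from this thread is that
nr_args sits in the least significant byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder layout, NOT taken from the patch series. */
#define SKETCH_NR_ARGS_MASK  0xffULL /* nr_args in the low byte (per thread) */
#define SKETCH_COOKIE_SHIFT  8       /* assumed position of the cookie index */
#define SKETCH_COOKIE_MASK   0xffULL
#define SKETCH_IS_RETURN_BIT 63      /* assumed position of the flag */

/* Build one func_meta value from its parts using bit operations only. */
static uint64_t sketch_func_meta(unsigned int nr_args, unsigned int cookie_idx,
                                 bool is_return)
{
        assert(nr_args <= SKETCH_NR_ARGS_MASK);
        assert(cookie_idx <= SKETCH_COOKIE_MASK);
        return (uint64_t)nr_args |
               ((uint64_t)cookie_idx << SKETCH_COOKIE_SHIFT) |
               ((uint64_t)is_return << SKETCH_IS_RETURN_BIT);
}

/*
 * Stand-in for the loop in invoke_bpf(): for every program that uses a
 * session cookie, record the func_meta that would be stored on the stack
 * and move on to the next (lower) cookie slot. Returns the number of
 * values written to meta_out.
 */
static int sketch_emit_metas(const bool *uses_cookie, int nr_links,
                             unsigned int nr_args, unsigned int first_cookie_idx,
                             bool is_return, uint64_t *meta_out)
{
        unsigned int cookie_idx = first_cookie_idx;
        int n = 0;

        for (int i = 0; i < nr_links; i++) {
                if (!uses_cookie[i])
                        continue;
                meta_out[n++] = sketch_func_meta(nr_args, cookie_idx, is_return);
                cookie_idx--;
        }
        return n;
}
```

With this shape the caller never pre-builds or adjusts func_meta; it
passes the plain inputs and the layout knowledge stays in one place.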

>
> Maybe some comment is needed here.
>
>
> >
> > > +               }
> > >                 if (invoke_bpf_prog(m, &prog, tl->links[i], stack_size,
> > >                                     run_ctx_off, save_ret, image, rw_image))
> > >                         return -EINVAL;
> > > @@ -3222,7 +3227,9 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> > >         struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
> > >         struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
> > >         void *orig_call = func_addr;
> > > +       int cookie_off, cookie_cnt;
> > >         u8 **branches = NULL;
> > > +       u64 func_meta;
> > >         u8 *prog;
> > >         bool save_ret;
> > >
> > > @@ -3290,6 +3297,11 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> > >
> > >         ip_off = stack_size;
> > >
> > > +       cookie_cnt = bpf_fsession_cookie_cnt(tlinks);
> > > +       /* room for session cookies */
> > > +       stack_size += cookie_cnt * 8;
> > > +       cookie_off = stack_size;
> > > +
> > >         stack_size += 8;
> > >         rbx_off = stack_size;
> > >
> > > @@ -3383,9 +3395,19 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> > >                 }
> > >         }
> > >
> > > +       if (bpf_fsession_cnt(tlinks)) {
> > > +               /* clear all the session cookies' value */
> > > +               for (int i = 0; i < cookie_cnt; i++)
> > > +                       emit_store_stack_imm64(&prog, cookie_off - 8 * i, 0);
> > > +               /* clear the return value to make sure fentry always get 0 */
> > > +               emit_store_stack_imm64(&prog, 8, 0);
> > > +       }
> > > +       func_meta = nr_regs + (((cookie_off - regs_off) / 8) << BPF_TRAMP_M_COOKIE);
> >
> > func_meta conceptually is a collection of bit fields, so using +/-
> > feels weird, use | and &, more in line with working with bits?
>
>
> It's not only bit fields. nr_args and the cookie offset are byte
> fields, and arithmetic is performed on the cookie offset in particular.
> So I think it makes sense here, right?
>
>
> >
> > (also you defined that BPF_TRAMP_M_NR_ARGS but you are not using it
> > consistently...)
>
>
> I'm not sure we should define it. Since nr_args uses the least significant
> byte, its shift is always 0. If we used the macro in the inline code, an
> unnecessary bit shift instruction would be generated.
>
>
> I defined it here for readability. Maybe we can add a comment to
> the inline implementation of bpf_get_func_arg() instead of defining
> such an unused macro?

Then I just wouldn't define the NR_ARGS macro at all, given that the
inline implementation implicitly encodes that knowledge anyway.
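
As a small illustration of why a shift-by-zero macro adds nothing, here
is a hedged userspace sketch. The demo_/DEMO_ names and the cookie
shift are made up for the example and are not the kernel's
definitions. Reading nr_args is a mask with no shift, and for
non-overlapping fields "+" and "|" produce the same value.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_COOKIE_SHIFT 8 /* placeholder, not the real BPF_TRAMP_M_COOKIE */

/* Pack nr_args (low byte) and a cookie index into one u64. */
static uint64_t demo_pack(uint64_t nr_args, uint64_t cookie_idx)
{
        assert(nr_args <= 0xff && cookie_idx <= 0xff);
        return nr_args | (cookie_idx << DEMO_COOKIE_SHIFT);
}

/* nr_args lives in the low byte, so a mask alone recovers it. */
static uint64_t demo_nr_args(uint64_t func_meta)
{
        return func_meta & 0xff;
}

/* Any other field needs both a shift and a mask. */
static uint64_t demo_cookie_idx(uint64_t func_meta)
{
        return (func_meta >> DEMO_COOKIE_SHIFT) & 0xff;
}
```

Subtracting (1 << DEMO_COOKIE_SHIFT) from a packed value lowers the
cookie index by one and leaves nr_args alone, as long as the index is
nonzero.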

>
>
> Thanks!
> Menglong Dong
>
>
> >
> >
> >
> >
> > > +
> > >         if (fentry->nr_links) {
> > >                 if (invoke_bpf(m, &prog, fentry, regs_off, run_ctx_off,
> > > -                              flags & BPF_TRAMP_F_RET_FENTRY_RET, image, rw_image))
> > > +                              flags & BPF_TRAMP_F_RET_FENTRY_RET, image, rw_image,
> > > +                              func_meta))
> > >                         return -EINVAL;
> > >         }
> > >
> > > @@ -3445,9 +3467,14 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im
> > >                 }
> > >         }
> > >
> > > +       /* set the "is_return" flag for fsession */
> > > +       func_meta += (1 << BPF_TRAMP_M_IS_RETURN);
> > > +       if (bpf_fsession_cnt(tlinks))
> > > +               emit_store_stack_imm64(&prog, nregs_off, func_meta);
> > > +
> > >         if (fexit->nr_links) {
> > >                 if (invoke_bpf(m, &prog, fexit, regs_off, run_ctx_off,
> > > -                              false, image, rw_image)) {
> > > +                              false, image, rw_image, func_meta)) {
> > >                         ret = -EINVAL;
> > >                         goto cleanup;
> > >                 }
> > > --
> > > 2.52.0
> > >
> >
>
>
>
>
>