linux-kernel - Re: [PATCH bpf-next] bpf,x86: do RSB balance for trampoline

Message-ID: <5053516.31r3eYUQgx@7950hx>
Date: Thu, 06 Nov 2025 09:40:15 +0800
From: Menglong Dong <menglong.dong@...ux.dev>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
 Menglong Dong <menglong8.dong@...il.com>,
 Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
 Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>,
 Eduard <eddyz87@...il.com>, Song Liu <song@...nel.org>,
 Yonghong Song <yonghong.song@...ux.dev>,
 John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>,
 Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>,
 Jiri Olsa <jolsa@...nel.org>, "David S. Miller" <davem@...emloft.net>,
 David Ahern <dsahern@...nel.org>, Thomas Gleixner <tglx@...utronix.de>,
 Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
 Dave Hansen <dave.hansen@...ux.intel.com>, X86 ML <x86@...nel.org>,
 "H. Peter Anvin" <hpa@...or.com>, jiang.biao@...ux.dev,
 bpf <bpf@...r.kernel.org>, Network Development <netdev@...r.kernel.org>,
 LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH bpf-next] bpf,x86: do RSB balance for trampoline

On 2025/11/6 07:31, Alexei Starovoitov wrote:
> On Tue, Nov 4, 2025 at 11:47 PM Menglong Dong <menglong.dong@...ux.dev> wrote:
> >
> > On 2025/11/5 15:13, Menglong Dong wrote:
> > > On 2025/11/5 10:12, Alexei Starovoitov wrote:
> > > > On Tue, Nov 4, 2025 at 5:30 PM Menglong Dong <menglong.dong@...ux.dev> wrote:
> > > > >
> > > > > On 2025/11/5 02:56, Alexei Starovoitov wrote:
> > > > > > On Tue, Nov 4, 2025 at 2:49 AM Menglong Dong <menglong8.dong@...il.com> wrote:
> > > > > > >
> > > > > > > > In the origin call case, we skip the "rip" directly before we return, which
> > > > > > > > breaks the RSB, as we have two "call"s but only one "ret".
> > > > > >
> > > > > > RSB meaning return stack buffer?
> > > > > >
> > > > > > and by "breaks RSB" you mean it makes the cpu less efficient?
> > > > >
> > > > > > Yeah, I mean it makes the cpu less efficient. The RSB is used
> > > > > > for branch prediction: it pushes the "rip" onto its hardware
> > > > > > stack on "call", and pops it from the stack on "ret". In the origin
> > > > > > call case, there are two "call"s but only one "ret", which breaks
> > > > > > the balance.
> > > >
> > > > Yes. I'm aware, but your "mov [rbp + 8], rax" screws it up as well,
> > > > since RSB has to be updated/invalidated by this store.
> > > > The behavior depends on the microarchitecture, of course.
> > > > I think:
> > > > add rsp, 8
> > > > ret
> > > > will only screw up the return prediction, but won't invalidate RSB.
> > > >
> > > > > Similar things happen in "return_to_handler" in ftrace_64.S,
> > > > > which has one "call" but two "ret"s. It fakes a "call"
> > > > > to keep things balanced.
> > > >
> > > > This makes more sense to me. Let's try that approach instead
> > > > of messing with the return address on stack?
> > >
> > > The approach here is similar to "return_to_handler". For ftrace,
> > > the original stack before the "ret" of the traced function is:
> > >
> > >     POS:
> > >     rip   ---> return_to_handler
> > >
> > > And the exit of the traced function will jump to return_to_handler.
> > > In return_to_handler, it will query the real "rip" of the traced function
> > > and then it calls an internal function:
> > >
> > >     call .Ldo_rop
> > >
> > > And the stack now is:
> > >
> > >     POS:
> > >     rip   ----> the address after "call .Ldo_rop", which is an "int3"
> > >
> > > In .Ldo_rop, it overwrites that rip with the real rip, so the stack
> > > looks like this:
> > >
> > >     POS:
> > >     rip   ---> real rip
> > >
> > > Then it returns. Taking the target function "foo" as an example, the
> > > flow is:
> > >
> > >     call foo -> call ftrace_caller -> return ftrace_caller ->
> > >     return return_to_handler -> call .Ldo_rop -> return foo
> > >
> > > As you can see, the call and return addresses for ".Ldo_rop" are
> > > also mismatched. So I think it works here too. Compared with
> > > a messed-up "return address", maybe a missed return has a
> > > better effect?
> > >
> > > And the whole logic for us is:
> > >
> > >     call foo -> call trampoline -> call origin ->
> > >     return origin -> return POS -> return foo
> >
> > The "return POS" will miss the RSB, but the later return
> > will hit it.
> >
> > The original logic is:
> >
> >      call foo -> call trampoline -> call origin ->
> >      return origin -> return foo
> >
> > The "return foo" and all the later returns will miss the RSB.
> >
> > Hmm......Not sure if I understand it correctly.
> 
> Here's another idea...
> hack tr->func.ftrace_managed = false temporarily
> and use BPF_MOD_JUMP in bpf_arch_text_poke()
> when installing trampoline with fexit progs.
> and also do:
> @@ -3437,10 +3437,6 @@ static int __arch_prepare_bpf_trampoline(struct
> bpf_tramp_image *im, void *rw_im
> 
>         emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, -rbx_off);
>         EMIT1(0xC9); /* leave */
> -       if (flags & BPF_TRAMP_F_SKIP_FRAME) {
> -               /* skip our return address and return to parent */
> -               EMIT4(0x48, 0x83, 0xC4, 8); /* add rsp, 8 */
> -       }
>         emit_return(&prog, image + (prog - (u8 *)rw_image));
> 
> Then RSB is perfectly matched without messing up the stack
> and/or extra calls.
> If it works and performance is good the next step is to
> teach ftrace to emit jmp or call in *_ftrace_direct()

Good idea. I saw that "return_to_handler" used "JMP_NOSPEC", and that
the jmp was converted to a "fake call" to play nicely with IBT in this commit:

e52fc2cf3f66 ("x86/ibt,ftrace: Make function-graph play nice")

It's not an indirect branch in our case, but let me do more testing to
see whether using "jmp" here has any unexpected effects.
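
To make the call/ret accounting in this thread concrete, here is a toy
model in C. It is not kernel code. It assumes the RSB is a plain stack
that pushes on every "call" and pops on every "ret", and it ignores the
microarchitecture-specific effects mentioned above (for example, a store
to the return slot invalidating RSB entries). All names in it (struct sim,
do_call, do_ret, run_skip_frame, run_extra_ret, run_jmp) and all addresses
are made up for illustration, and the "extra ret" flow is only one reading
of "return POS -> return foo".

```c
#include <assert.h>

#define DEPTH 16
#define OUTER 3		/* callers above foo; arbitrary, for illustration */

/* Hypothetical toy model: the real stack vs. the predictor's private stack. */
struct sim {
	unsigned long stack[DEPTH];	/* return addresses on the real stack */
	unsigned long rsb[DEPTH];	/* return addresses the RSB predicts */
	int sp, rp;			/* number of live entries in each */
	int misses;			/* "ret"s whose prediction was wrong */
};

/* "call": both the real stack and the RSB record the return address. */
static void do_call(struct sim *s, unsigned long ret_addr)
{
	assert(s->sp < DEPTH && s->rp < DEPTH);
	s->stack[s->sp++] = ret_addr;
	s->rsb[s->rp++] = ret_addr;
}

/* "ret": pop both and compare; an empty RSB counts as a miss. */
static void do_ret(struct sim *s)
{
	unsigned long actual, predicted;

	assert(s->sp > 0);
	actual = s->stack[--s->sp];
	predicted = s->rp > 0 ? s->rsb[--s->rp] : 0;
	if (actual != predicted)
		s->misses++;
}

/* OUTER nested callers, then the caller's "call foo". */
static void enter_foo(struct sim *s)
{
	int i;

	for (i = 1; i <= OUTER; i++)
		do_call(s, 0x100 * i);
	do_call(s, 0xA);
}

/* Return through the OUTER callers and report the total misses. */
static int leave_outer(struct sim *s)
{
	int i;

	for (i = 0; i < OUTER; i++)
		do_ret(s);
	return s->misses;
}

/* Current behaviour: two "call"s, then "add rsp, 8; ret". */
int run_skip_frame(void)
{
	struct sim s = {0};

	enter_foo(&s);
	do_call(&s, 0xB);	/* foo -> trampoline */
	do_call(&s, 0xC);	/* trampoline -> origin */
	do_ret(&s);		/* origin returns: hit */
	s.sp--;			/* add rsp, 8: drops 0xB from the real stack only */
	do_ret(&s);		/* goes to 0xA, RSB predicted 0xB: miss */
	return leave_outer(&s);
}

/* Assumed reading of the patch: redirect the return slot, add one "ret". */
int run_extra_ret(void)
{
	struct sim s = {0};

	enter_foo(&s);
	do_call(&s, 0xB);
	do_call(&s, 0xC);
	do_ret(&s);		/* origin returns: hit */
	s.stack[s.sp - 1] = 0xD;	/* return slot now points at "POS" */
	do_ret(&s);		/* goes to 0xD, RSB predicted 0xB: miss */
	do_ret(&s);		/* goes to 0xA, RSB predicted 0xA: hit */
	return leave_outer(&s);
}

/* The jmp idea: foo reaches the trampoline with "jmp", so nothing is pushed. */
int run_jmp(void)
{
	struct sim s = {0};

	enter_foo(&s);
	do_call(&s, 0xC);	/* trampoline -> origin */
	do_ret(&s);		/* origin returns: hit */
	do_ret(&s);		/* trampoline returns to 0xA: hit */
	return leave_outer(&s);
}
```

With three outer callers, this model reports 4 mispredicted returns for
run_skip_frame() (the trampoline return plus every later one), 1 for
run_extra_ret(), and 0 for run_jmp(). Real hardware may behave differently.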

Thanks!
Menglong Dong
