linux-kernel - Re: [PATCH bpf-next 1/4] selftests/bpf: Emit nop,nop5 instructions for x86

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aSa6d8bm7Jm6Ojn7@krava>
Date: Wed, 26 Nov 2025 09:29:43 +0100
From: Jiri Olsa <olsajiri@...il.com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc: Andrii Nakryiko <andrii@...nel.org>, bpf@...r.kernel.org,
	linux-kernel@...r.kernel.org, Song Liu <songliubraving@...com>,
	Yonghong Song <yhs@...com>,
	John Fastabend <john.fastabend@...il.com>
Subject: Re: [PATCH bpf-next 1/4] selftests/bpf: Emit nop,nop5 instructions
 for x86_64 usdt probe

On Mon, Nov 24, 2025 at 09:29:01AM -0800, Andrii Nakryiko wrote:
> On Mon, Nov 17, 2025 at 12:36 AM Jiri Olsa <jolsa@...nel.org> wrote:
> >
> > We can currently optimize uprobes on top of nop5 instructions,
> > so application can define USDT_NOP to nop5 and use USDT macro
> > to define optimized usdt probes.
> 
> Thanks for working on this and sorry for the delay, I've been
> travelling last week.
> 
> >
> > This works fine on new kernels, but could have performance penalty
> > on older kernels, that do not have the support to optimize and to
> > emulate nop5 instruction.
> >
> >   execution of the usdt probe on top of nop:
> >   - nop -> trigger usdt -> emulate nop -> continue
> >
> >   execution of the usdt probe on top of nop5:
> >   - nop5 -> trigger usdt -> single step nop5 -> continue
> >
> > Note the 'single step nop5' as the source of performance regression.
> 
> nit: I get what you are saying, but I don't think the above
> explanation is actually as clear as it could be. Try to simplify the
> reasoning maybe by saying that until Linux vX.Y kerne's uprobe
> implementation treated nop5 as an instruction that needs to be
> single-stepped. Newer kernels, on the other hand, can handle nop5
> very-very fast (when uprobe is installed on top of them). Which
> creates a dilemma where we want nop5 on new kernels, nop1 on old ones,
> but we can't know upfront which kernel we'll run on. And thus the
> whole patch set that's trying to have the cake and eat it too ;)

ok, will try ;-)

> 
> >
> > To workaround that we change the USDT macro to emit nop,nop5 for
> > the probe (instead of default nop) and make record of that in
> > USDT record (more on that below).
> >
> > This can be detected by application (libbpf) and it can place the
> > uprobe either on nop or nop5 based on the optimization support in
> > the kernel.
> >
> > We make record of using the nop,nop5 instructions in the USDT ELF
> > note data.
> >
> > Current elf note format is as follows:
> >
> >   namesz (4B) | descsz (4B) | type (4B) | name | desc
> >
> > And current usdt record (with "stapsdt" name) placed in the note's
> > desc data look like:
> >
> >   loc_addr  | 8 bytes
> >   base_addr | 8 bytes
> >   sema_addr | 8 bytes
> >   provider  | zero terminated string
> >   name      | zero terminated string
> >   args      | zero terminated string
> >
> > None of the tested parsers (bpftrace-bcc, libbpf) checked that the args
> > zero terminated byte is the actual end of the 'desc' data. As Andrii
> > suggested we could use this and place extra zero byte right there as an
> > indication for the parser we use the nop,nop5 instructions.
> >
> > It's bit tricky, but the other way would be to introduce new elf note type
> > or note name and change all existing parsers to recognize it. With the change
> > above the existing parsers would still recognize such usdt probes.
> 
> ... and use safer (performance-wise) nop1 as uprobe target.
> 
> We can treat this extra zero as a backwards-compatible extension of
> provider+name+args section. If we ever need to have some extra flags
> or extra information (e.g., argument names or whatnot), we can treat
> this as either a fourth string or think about this as a single-byte
> length prefix for a fixed binary extra information that should follow
> (right now it's zero, so forward-compatible). For now just extra zero
> is the least amount of work but good enough to solve the problem,
> while being extendable for the future.

ok, makes sense

> 
> >
> > Note we do not emit this extra byte if app defined its own nop through
> > USDT_NOP macro.
> >
> > Suggested-by: Andrii Nakryiko <andrii@...nel.org>
> > Signed-off-by: Jiri Olsa <jolsa@...nel.org>
> > ---
> >  tools/testing/selftests/bpf/usdt.h | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> >
> > diff --git a/tools/testing/selftests/bpf/usdt.h b/tools/testing/selftests/bpf/usdt.h
> > index 549d1f774810..57fa2902136c 100644
> > --- a/tools/testing/selftests/bpf/usdt.h
> > +++ b/tools/testing/selftests/bpf/usdt.h
> > @@ -312,9 +312,16 @@ struct usdt_sema { volatile unsigned short active; };
> >  #ifndef USDT_NOP
> >  #if defined(__ia64__) || defined(__s390__) || defined(__s390x__)
> >  #define USDT_NOP                       nop 0
> > +#elif defined(__x86_64__)
> > +#define USDT_NOP                       .byte 0x90, 0x0f, 0x1f, 0x44, 0x00, 0x0 /* nop, nop5 */
> >  #else
> >  #define USDT_NOP                       nop
> >  #endif
> > +#else
> > +/*
> > + * User define its own nop instruction, do not emit extra note data.
> > + */
> > +#define __usdt_asm_extra
> 
> I'd guard this with ifndef, just in case user do want custom USDT_NOP
> while emitting that extra zero (e.g., if they have nop1 + nop5 + some
> extra they need for logging or whatever).

ok

> 
> >  #endif /* USDT_NOP */
> >
> >  /*
> > @@ -403,6 +410,15 @@ struct usdt_sema { volatile unsigned short active; };
> >         __asm__ __volatile__ ("" :: "m" (sema));
> >  #endif
> >
> > +#ifndef __usdt_asm_extra
> > +#ifdef __x86_64__
> > +#define __usdt_asm_extra                                                                       \
> > +       __usdt_asm1(            .ascii "\0")
> 
> nit: keep it single line
> 
> 
> btw, the source of truth for usdt.h is at Github, please send a proper
> PR with these change there, and then we can just sync upstream version
> into selftests?

yes, will do that

thanks,
jirka