Message-ID: <3c0df437-f6e5-47c6-aed5-f4cc26fe627a@efficios.com>
Date: Fri, 9 Jan 2026 15:21:19 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Steven Rostedt <rostedt@...dmis.org>,
 Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
 Linux trace kernel <linux-trace-kernel@...r.kernel.org>,
 bpf <bpf@...r.kernel.org>, Masami Hiramatsu <mhiramat@...nel.org>,
 "Paul E. McKenney" <paulmck@...nel.org>,
 Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
 Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v5] tracing: Guard __DECLARE_TRACE() use of
 __DO_TRACE_CALL() with SRCU-fast

On 2026-01-09 14:19, Steven Rostedt wrote:
> On Fri, 9 Jan 2026 11:10:16 -0800
> Alexei Starovoitov <alexei.starovoitov@...il.com> wrote:
> 
>>>
>>> We also have to consider that migrate disable is *not* cheap at all
>>> compared to preempt disable.
>>
>> Looks like your complaint comes from lack of engagement in kernel
>> development.
> 
> No need to make comments like that. The Linux kernel is an ocean of code.
> It's very hard to keep up on everything that is happening. I knew of work
> being done on migrate_disable but I didn't know what the impacts of that
> work was. Mathieu is still very much involved and engaged in kernel
> development.

Thanks, Steven. I guess Alexei missed my recent involvement in other
areas of the kernel.

As Steven pointed out, the kernel is vast, so I cannot keep up with
the progress on every single topic. That being said, I very recently
(about a month ago) tried using migrate disable for the RSS tracking
improvements (hierarchical percpu counters), and found that the overhead
of migrate disable is large compared to preempt disable. The generated
assembly is also roughly an order of magnitude larger (on x86-64).

Here are small placeholder functions which just call the preempt/migrate
disable and enable pairs, compiled for a PREEMPT_RT build:

0000000000002a20 <test_preempt_disable>:
     2a20:       f3 0f 1e fa             endbr64
     2a24:       65 ff 05 00 00 00 00    incl   %gs:0x0(%rip)        # 2a2b <test_preempt_disable+0xb>
     2a2b:       e9 00 00 00 00          jmp    2a30 <test_preempt_disable+0x10>

0000000000002a40 <test_preempt_enable>:
     2a40:       f3 0f 1e fa             endbr64
     2a44:       65 ff 0d 00 00 00 00    decl   %gs:0x0(%rip)        # 2a4b <test_preempt_enable+0xb>
     2a4b:       74 05                   je     2a52 <test_preempt_enable+0x12>
     2a4d:       e9 00 00 00 00          jmp    2a52 <test_preempt_enable+0x12>
     2a52:       e8 00 00 00 00          call   2a57 <test_preempt_enable+0x17>
     2a57:       e9 00 00 00 00          jmp    2a5c <test_preempt_enable+0x1c>

0000000000002920 <test_migrate_disable>:
     2920:       f3 0f 1e fa             endbr64
     2924:       65 48 8b 15 00 00 00    mov    %gs:0x0(%rip),%rdx        # 292c <test_migrate_disable+0xc>
     292b:       00
     292c:       0f b7 82 38 07 00 00    movzwl 0x738(%rdx),%eax
     2933:       66 85 c0                test   %ax,%ax
     2936:       74 0f                   je     2947 <test_migrate_disable+0x27>
     2938:       83 c0 01                add    $0x1,%eax
     293b:       66 89 82 38 07 00 00    mov    %ax,0x738(%rdx)
     2942:       e9 00 00 00 00          jmp    2947 <test_migrate_disable+0x27>
     2947:       65 ff 05 00 00 00 00    incl   %gs:0x0(%rip)        # 294e <test_migrate_disable+0x2e>
     294e:       65 48 8b 05 00 00 00    mov    %gs:0x0(%rip),%rax        # 2956 <test_migrate_disable+0x36>
     2955:       00
     2956:       83 80 00 00 00 00 01    addl   $0x1,0x0(%rax)
     295d:       b8 01 00 00 00          mov    $0x1,%eax
     2962:       66 89 82 38 07 00 00    mov    %ax,0x738(%rdx)
     2969:       65 ff 0d 00 00 00 00    decl   %gs:0x0(%rip)        # 2970 <test_migrate_disable+0x50>
     2970:       74 05                   je     2977 <test_migrate_disable+0x57>
     2972:       e9 00 00 00 00          jmp    2977 <test_migrate_disable+0x57>
     2977:       e8 00 00 00 00          call   297c <test_migrate_disable+0x5c>
     297c:       e9 00 00 00 00          jmp    2981 <test_migrate_disable+0x61>

00000000000029a0 <test_migrate_enable>:
     29a0:       f3 0f 1e fa             endbr64
     29a4:       65 48 8b 15 00 00 00    mov    %gs:0x0(%rip),%rdx        # 29ac <test_migrate_enable+0xc>
     29ab:       00
     29ac:       0f b7 82 38 07 00 00    movzwl 0x738(%rdx),%eax
     29b3:       66 85 c0                test   %ax,%ax
     29b6:       74 0f                   je     29c7 <test_migrate_enable+0x27>
     29b8:       83 c0 01                add    $0x1,%eax
     29bb:       66 89 82 38 07 00 00    mov    %ax,0x738(%rdx)
     29c2:       e9 00 00 00 00          jmp    29c7 <test_migrate_enable+0x27>
     29c7:       65 ff 05 00 00 00 00    incl   %gs:0x0(%rip)        # 29ce <test_migrate_enable+0x2e>
     29ce:       65 48 8b 05 00 00 00    mov    %gs:0x0(%rip),%rax        # 29d6 <test_migrate_enable+0x36>
     29d5:       00
     29d6:       83 80 00 00 00 00 01    addl   $0x1,0x0(%rax)
     29dd:       b8 01 00 00 00          mov    $0x1,%eax
     29e2:       66 89 82 38 07 00 00    mov    %ax,0x738(%rdx)
     29e9:       65 ff 0d 00 00 00 00    decl   %gs:0x0(%rip)        # 29f0 <test_migrate_enable+0x50>
     29f0:       74 05                   je     29f7 <test_migrate_enable+0x57>
     29f2:       e9 00 00 00 00          jmp    29f7 <test_migrate_enable+0x57>
     29f7:       e8 00 00 00 00          call   29fc <test_migrate_enable+0x5c>
     29fc:       e9 00 00 00 00          jmp    2a01 <test_migrate_enable+0x61>
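
For reference, the placeholders behind the symbols above are essentially
of the following form (a minimal sketch; the noinline attribute and the
exact headers are my assumptions, this is not an in-tree file):

/* Placeholder functions used only to inspect the generated code. */
#include <linux/preempt.h>
#include <linux/sched.h>

noinline void test_preempt_disable(void)
{
	preempt_disable();
}

noinline void test_preempt_enable(void)
{
	preempt_enable();
}

noinline void test_migrate_disable(void)
{
	migrate_disable();
}

noinline void test_migrate_enable(void)
{
	migrate_enable();
}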

> 
>> migrate_disable _was_ not cheap.
>> Try to benchmark it now.
>> It's inlined. It's a fraction of extra overhead on top of preempt_disable.
> 
> It would be good to have a benchmark of the two. What about fast_srcu? Is
> that fast enough to replace the preempt_disable()? If so, then could we
> just make this the same for both RT and !RT?

I've modified kernel/rcu/refscale.c to compare those:

AMD EPYC 9654 96-Core Processor, kernel baseline: v6.18.1
CONFIG_PREEMPT=y
# CONFIG_PREEMPT_LAZY is not set
# CONFIG_PREEMPT_RT is not set

* preempt disable/enable pair:                                     1.1 ns
* srcu-fast lock/unlock:                                           1.5 ns

CONFIG_RCU_REF_SCALE_TEST=y
* migrate disable/enable pair:                                     3.0 ns
* calls to migrate disable/enable pair within noinline functions: 17.0 ns

CONFIG_RCU_REF_SCALE_TEST=m
* migrate disable/enable pair:                                    22.0 ns
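
The migrate disable/enable reader section I added is essentially a loop
of the following shape (a sketch following the ref_scale_ops pattern
already used in kernel/rcu/refscale.c; the ref_migrate_* names are mine
and not in-tree, and the delaysection callback is omitted for brevity):

/* refscale-style reader section timing one migrate disable/enable
 * pair per loop iteration; the "noinline" 17 ns case simply moves
 * the pair into small out-of-line helper functions.
 */
static void ref_migrate_read_section(const int nloops)
{
	int i;

	for (i = nloops; i >= 0; i--) {
		migrate_disable();
		migrate_enable();
	}
}

static const struct ref_scale_ops migrate_ops = {
	.readsection	= ref_migrate_read_section,
	.name		= "migrate"
};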

When I first tried migrate disable, I had configured refscale as a
module, which is what gave me the appalling 22 ns overhead. It looks like
the implementation of migrate disable/enable now differs depending on
whether it is called from the core kernel or from a module. That's rather
unexpected.

This appears to be intentional, though (see
INSTANTIATE_EXPORTED_MIGRATE_DISABLE): it works around the fact that the
runqueues variable cannot be exported to modules.

That's the kind of compilation-context-dependent overhead variability I'd
rather avoid in the implementation of the tracepoint instrumentation API.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
