linux-kernel - Re: BUG: scheduling while atomic with PREEMPT

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <727212f9d3c324787ddd9ede9e2d800a02b629b2.camel@web.de>
Date: Sun, 08 Jun 2025 17:53:10 +0200
From: Bert Karwatzki <spasswolf@....de>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Steven Rostedt
	 <rostedt@...dmis.org>
Cc: linux-kernel@...r.kernel.org, linux-next@...r.kernel.org, 
	bpf@...r.kernel.org, linux-rt-users@...r.kernel.org, 
	linux-rt-devel@...ts.linux.dev, Thomas Gleixner <tglx@...utronix.de>,
 Alexei Starovoitov <alexei.starovoitov@...il.com>, spasswolf@....de
Subject: Re: BUG: scheduling while atomic with PREEMPT_RT=y and bpf selftests

Am Sonntag, dem 08.06.2025 um 10:45 +0200 schrieb Bert Karwatzki:
> Am Donnerstag, dem 05.06.2025 um 14:51 +0200 schrieb Sebastian Andrzej Siewior:
> > On 2025-06-05 08:48:38 [-0400], Steven Rostedt wrote:
> > > On Thu,  5 Jun 2025 11:19:03 +0200
> > > Bert Karwatzki <spasswolf@....de> wrote:
> > > 
> > > > This patch seems to create so much output that the orginal error message and
> > > > backtrace often get lost, so I needed several runs to get a meaningful message
> > > > when running
> > > 
> > > Are you familiar with preempt count tracing?
> > 
> > I have an initial set of patches to tackle this problem, I'm going to
> > send them after the merge window.
> > 
> > Sebastian
> 
> I've found the reason for the "mysterious" increase of preempt_count:
> 
> [   70.821750] [   T2746] bpf_link_settle calling fd_install() preemt_count = 0
> [   70.821751] [   T2746] preempt_count_add 5898: preempt_count = 0x0 counter = 0x1b232c
> [   70.821752] [   T2746] preempt_count_add 5900: preempt_count = 0x1 counter = 0x1b232d
> [   70.821754] [   T2746] preempt_count_sub 5966: preempt_count = 0x1 counter = 0x1b232e
> [   70.821755] [   T2746] preempt_count_sub 5968: preempt_count = 0x0 counter = 0x1b232f
> [   70.821761] [   T2746] __bpf_trace_sys_enter 18: preempt_count = 0x0
> [   70.821762] [   T2746] __bpf_trace_sys_enter 18: preempt_count = 0x1
> [   70.821764] [   T2746] __bpf_trace_run: preempt_count = 1
> [   70.821765] [   T2746] bpf_prog_run: preempt_count = 1
> [   70.821766] [   T2746] __bpf_prog_run: preempt_count = 1
> 
> It's caused by this macro from include/trace/bpf_probe.h (with my pr_err()):
> 
> #define __BPF_DECLARE_TRACE_SYSCALL(call, proto, args) \
> static notrace void \
> __bpf_trace_##call(void *__data, proto) \
> { \
>  might_fault(); \
>  if (!strcmp(get_current()->comm, "test_progs")) \
>  pr_err("%s %d: preempt_count = 0x%x", __func__, __LINE__, preempt_count());\
>  preempt_disable_notrace(); \
>  if (!strcmp(get_current()->comm, "test_progs")) \
>  pr_err("%s %d: preempt_count = 0x%x", __func__, __LINE__, preempt_count());\
>  CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args)); \
>  preempt_enable_notrace(); \
> }
> 
> The preempt_{en,dis}able_notrace were introduced in
> commit 4aadde89d81f ("tracing/bpf: disable preemption in syscall probe")
> This commit is present in v6.14 and v6.15, but the bug already appears in
> v6.12 so in that case preemption is disable somewhere else. 
> 
> Bert Karwatzki

After reading this 
https://lore.kernel.org/bpf/CAADnVQJf535hwud5XtQKStOge9=pYVYWSiq_8Q2YAvN5rba==A@mail.gmail.com/
I tried using migrate_{en,disable} like this (in v6.15)

diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index 183fa2aa2935..49257cb90209 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -58,9 +58,9 @@ static notrace void							\
 __bpf_trace_##call(void *__data, proto)					\
 {									\
 	might_fault();							\
-	preempt_disable_notrace();					\
+	migrate_disable();					\
 	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args));	\
-	preempt_enable_notrace();					\
+	migrate_enable();					\
 }
 
 #undef DECLARE_EVENT_SYSCALL_CLASS
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 187dc37d61d4..ec0326405fc3 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2350,7 +2350,7 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
 	struct bpf_run_ctx *old_run_ctx;
 	struct bpf_trace_run_ctx run_ctx;
 
-	cant_sleep();
+	cant_migrate();
 	if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
 		bpf_prog_inc_misses_counter(prog);
 		goto out;
diff --git a/tools/testing/selftests/bpf/progs/dynptr_success.c b/tools/testing/selftests/bpf/progs/dynptr_success.c
index e1fba28e4a86..7cfb9473a526 100644
--- a/tools/testing/selftests/bpf/progs/dynptr_success.c
+++ b/tools/testing/selftests/bpf/progs/dynptr_success.c
@@ -7,6 +7,7 @@
 #include <bpf/bpf_helpers.h>
 #include <bpf/bpf_tracing.h>
 #include "bpf_misc.h"
+#include "bpf_kfuncs.h"
 #include "errno.h"
 
 char _license[] SEC("license") = "GPL";


This fixes the warnings when running the bpf cgroup examples:

./test_progs -a "cgrp_local_storage/cgrp1*"

but I still get a warning from another example (I don't know which, yet):
[ T4696] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[ T4696] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 4696, name: test_progs
[ T4696] preempt_count: 1, expected: 0
[ T4696] RCU nest depth: 0, expected: 0
[ T4696] 2 locks held by test_progs/4696:
[ T4696]  #0: ffffffff91d30860 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0xbb/0x250
[ T4696]  #1: ffff9ca88e6741c8 ((&c->lock)){+.+.}-{3:3}, at: ___slab_alloc+0x68/0xde0
[ T4696] irq event stamp: 100
[ T4696] hardirqs last  enabled at (99): [<ffffffff91199098>] do_syscall_64+0x38/0xfa0
[ T4696] hardirqs last disabled at (100): [<ffffffff907bc57b>] __bpf_async_init+0xdb/0x310
[ T4696] softirqs last  enabled at (0): [<ffffffff9057b694>] copy_process+0xc84/0x3840
[ T4696] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ T4696] CPU: 1 UID: 0 PID: 4696 Comm: test_progs Tainted: G           O        6.15.0-bpf-00003-g5197b534e6ad #4 PREEMPT_{RT,(full)} 
[ T4696] Tainted: [O]=OOT_MODULE
[ T4696] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-158L, BIOS E158LAMS.10F 11/11/2024
[ T4696] Call Trace:
[ T4696]  <TASK>
[ T4696]  dump_stack_lvl+0x6d/0xb0
[ T4696]  __might_resched.cold+0xe1/0xf3
[ T4696]  rt_spin_lock+0x5f/0x190
[ T4696]  ? ___slab_alloc+0x68/0xde0
[ T4696]  ? srso_alias_return_thunk+0x5/0xfbef5
[ T4696]  ___slab_alloc+0x68/0xde0
[ T4696]  ? find_held_lock+0x2b/0x80
[ T4696]  ? try_to_wake_up+0x47b/0xbb0
[ T4696]  ? srso_alias_return_thunk+0x5/0xfbef5
[ T4696]  ? bpf_map_kmalloc_node+0x72/0x220
[ T4696]  ? srso_alias_return_thunk+0x5/0xfbef5
[ T4696]  ? _raw_spin_unlock_irqrestore+0x40/0x80
[ T4696]  ? srso_alias_return_thunk+0x5/0xfbef5
[ T4696]  ? try_to_wake_up+0x47b/0xbb0
[ T4696]  ? bpf_map_kmalloc_node+0x72/0x220
[ T4696]  __kmalloc_node_noprof+0xee/0x490
[ T4696]  bpf_map_kmalloc_node+0x72/0x220
[ T4696]  __bpf_async_init+0x107/0x310
[ T4696]  bpf_timer_init+0x33/0x40
[ T4696]  bpf_prog_7e15f1bc7d1d26d0_start_cb+0x5d/0x91
[ T4696]  bpf_prog_d85f43676fabf521_start_timer+0x65/0x8a
[ T4696]  bpf_prog_test_run_syscall+0x103/0x250
[ T4696]  ? fput+0x3f/0x90
[ T4696]  __sys_bpf+0xd33/0x26d0
[ T4696]  __x64_sys_bpf+0x21/0x30
[ T4696]  do_syscall_64+0x72/0xfa0
[ T4696]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ T4696] RIP: 0033:0x7f6e173fa779
[ T4696] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff
73 01 c3 48 8b 0d 4f 86 0d 00 f7 d8 64 89 01 48
[ T4696] RSP: 002b:00007f6e0f95a878 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[ T4696] RAX: ffffffffffffffda RBX: 00007f6e0f95bcdc RCX: 00007f6e173fa779
[ T4696] RDX: 0000000000000050 RSI: 00007f6e0f95a8b0 RDI: 000000000000000a
[ T4696] RBP: 00007f6e0f95a890 R08: 0000000000000003 R09: 00007f6e0f95a8b0
[ T4696] R10: 00007ffcf6d23f80 R11: 0000000000000202 R12: 0000000000000020
[ T4696] R13: 000000000000005f R14: 00007ffcf6d23d70 R15: 00007f6e0f15b000
[ T4696]  </TASK>

Soon after that (1-2 min) while still running the bpf test_progs I got a kernel panic
(flashing capslock and lockup) and needed to reboot.

Bert Karwatzki