Message-ID: <b9064ed8-387d-47ce-ad0a-7642ad180fc3@paulmck-laptop>
Date: Mon, 14 Oct 2024 11:55:05 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Tomas Glozar <tglozar@...hat.com>
Cc: Valentin Schneider <vschneid@...hat.com>, Chen Yu <yu.c.chen@...el.com>,
	Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org,
	sfr@...b.auug.org.au, linux-next@...r.kernel.org,
	kernel-team@...a.com
Subject: Re: [BUG almost bisected] Splat in dequeue_rt_stack() and build error

On Thu, Oct 10, 2024 at 04:28:38PM -0700, Paul E. McKenney wrote:
> On Thu, Oct 10, 2024 at 08:01:35AM -0700, Paul E. McKenney wrote:
> > On Thu, Oct 10, 2024 at 01:24:11PM +0200, Tomas Glozar wrote:
> > > > On Wed, Oct 2, 2024 at 11:01, Tomas Glozar <tglozar@...hat.com> wrote:
> > > >
> > > > FYI I have managed to reproduce the bug on our infrastructure after 21
> > > > hours of 7*TREE03 and I will continue with trying to reproduce it with
> > > > the tracers we want.
> > > >
> > > > Tomas
> > > 
> > > I successfully reproduced the bug also with the tracers active after a
> > > few 8-hour test runs on our infrastructure:
> > > 
> > > [    0.000000] Linux version 6.11.0-g2004cef11ea0-dirty (...) #1 SMP
> > > PREEMPT_DYNAMIC Wed Oct  9 12:13:40 EDT 2024
> > > [    0.000000] Command line: debug_boot_weak_hash panic=-1 selinux=0
> > > initcall_debug debug console=ttyS0 rcutorture.n_barrier_cbs=4
> > > rcutorture.stat_interval=15 rcutorture.shutdown_secs=25200
> > > rcutorture.test_no_idle_hz=1 rcutorture.verbose=1
> > > rcutorture.onoff_interval=200 rcutorture.onoff_holdoff=30
> > > rcutree.gp_preinit_delay=12 rcutree.gp_init_delay=3
> > > rcutree.gp_cleanup_delay=3 rcutree.kthread_prio=2 threadirqs
> > > rcutree.use_softirq=0
> > > trace_event=sched:sched_switch,sched:sched_wakeup
> > > ftrace_filter=dl_server_start,dl_server_stop trace_buf_size=2k
> > > ftrace=function torture.ftrace_dump_at_shutdown=1
> > > ...
> > > [13550.127541] WARNING: CPU: 1 PID: 155 at
> > > kernel/sched/deadline.c:1971 enqueue_dl_entity+0x554/0x5d0
> > > [13550.128982] Modules linked in:
> > > [13550.129528] CPU: 1 UID: 0 PID: 155 Comm: rcu_torture_rea Tainted: G
> > >        W          6.11.0-g2004cef11ea0-dirty #1
> > > [13550.131419] Tainted: [W]=WARN
> > > [13550.131979] Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-2.el9 04/01/2014
> > > [13550.133230] RIP: 0010:enqueue_dl_entity+0x554/0x5d0
> > > ...
> > > [13550.151286] Call Trace:
> > > [13550.151749]  <TASK>
> > > [13550.152141]  ? __warn+0x88/0x130
> > > [13550.152717]  ? enqueue_dl_entity+0x554/0x5d0
> > > [13550.153485]  ? report_bug+0x18e/0x1a0
> > > [13550.154149]  ? handle_bug+0x54/0x90
> > > [13550.154792]  ? exc_invalid_op+0x18/0x70
> > > [13550.155484]  ? asm_exc_invalid_op+0x1a/0x20
> > > [13550.156249]  ? enqueue_dl_entity+0x554/0x5d0
> > > [13550.157055]  dl_server_start+0x36/0xf0
> > > [13550.157709]  enqueue_task_fair+0x220/0x6b0
> > > [13550.158447]  activate_task+0x26/0x60
> > > [13550.159131]  attach_task+0x35/0x50
> > > [13550.159756]  sched_balance_rq+0x663/0xe00
> > > [13550.160511]  sched_balance_newidle.constprop.0+0x1a5/0x360
> > > [13550.161520]  pick_next_task_fair+0x2f/0x340
> > > [13550.162290]  __schedule+0x203/0x900
> > > [13550.162958]  ? enqueue_hrtimer+0x35/0x90
> > > [13550.163703]  schedule+0x27/0xd0
> > > [13550.164299]  schedule_hrtimeout_range_clock+0x99/0x120
> > > [13550.165239]  ? __pfx_hrtimer_wakeup+0x10/0x10
> > > [13550.165954]  torture_hrtimeout_us+0x7b/0xe0
> > > [13550.166624]  rcu_torture_reader+0x139/0x200
> > > [13550.167284]  ? __pfx_rcu_torture_timer+0x10/0x10
> > > [13550.168019]  ? __pfx_rcu_torture_reader+0x10/0x10
> > > [13550.168764]  kthread+0xd6/0x100
> > > [13550.169262]  ? __pfx_kthread+0x10/0x10
> > > [13550.169860]  ret_from_fork+0x34/0x50
> > > [13550.170424]  ? __pfx_kthread+0x10/0x10
> > > [13550.171020]  ret_from_fork_asm+0x1a/0x30
> > > [13550.171657]  </TASK>
> > > 
> > > Unfortunately, the following rcu stalls appear to have resulted in
> > > abnormal termination of the VM, which led to the ftrace buffer not
> > > being dumped into the console. Currently re-running the same test with
> > > the addition of "ftrace_dump_on_oops panic_on_warn=1" and hoping for
> > > the best.
> > 
> > Another approach would be rcupdate.rcu_cpu_stall_suppress=1.
> > 
> > We probably need to disable RCU CPU stall warnings automatically while
> > dumping ftrace buffers, but the asynchronous nature of printk() makes
> > it difficult to work out when to automatically re-enable them...
> 
> And in the meantime, for whatever it is worth...
> 
> The pattern of failures motivated me to add to rcutorture a real-time
> task that randomly preempts a randomly chosen online CPU.  Here are
> the (new and not-to-be-trusted) commits on -rcu's "dev" branch:
> 
> d1b99fa42af7 ("torture: Add dowarn argument to torture_sched_setaffinity()")
> aed555adc22a ("rcutorture: Add random real-time preemption")
> b09bcf8e1406 ("rcutorture: Make the TREE03 scenario do preemption")
> 
> Given these, the following sort of command, when run on dual-socket
> systems, reproduces a silent failure within a few minutes:
> 
> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 4h --configs "4*TREE03" --kconfig "CONFIG_NR_CPUS=4" --trust-make
> 
> But on my laptop, a 30-minute run resulted in zero failures.  I am now
> retrying with a four-hour laptop run.

And this silent failure turned out to be self-inflicted, caused by a
change I made to the scripting to better handle test hosts disappearing
(it does sometimes happen).  With the scripting fixed, I am getting
simple too-short grace periods, though only a few per 8-hour
400*TREE03 4-CPU guest-OS run.
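
For a sense of scale, here is a rough back-of-the-envelope sketch of
what one such batch amounts to.  The failure count of 3 is a purely
hypothetical stand-in for "a few", not a measured value, and the
variable names are illustrative only.

```python
# Rough scale of one "8-hour 400*TREE03 4-CPU" batch.
INSTANCES = 400          # "400*TREE03": 400 guest OSes running TREE03
HOURS_PER_INSTANCE = 8   # duration of each guest-OS run
CPUS_PER_GUEST = 4       # 4-CPU guests (CONFIG_NR_CPUS=4)
ASSUMED_FAILURES = 3     # hypothetical stand-in for "a few"; NOT measured

guest_hours = INSTANCES * HOURS_PER_INSTANCE        # total guest run time
cpu_hours = guest_hours * CPUS_PER_GUEST            # total guest-CPU time
hours_per_failure = guest_hours / ASSUMED_FAILURES  # mean time per failure

print(f"{guest_hours} guest-hours, {cpu_hours} guest-CPU-hours per batch")
print(f"roughly one failure per {hours_per_failure:.0f} guest-hours")
```

Under that assumed count, a failure shows up only about once per
thousand guest-hours, which is why narrowing this down is slow.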

> I am also adjusting the preemption duration and frequency to see if a
> more edifying failure mode might make itself apparent.  :-/

But no big wins thus far, so this will be a slow process.  My current test
disables CPU hotplug.  I will be disabling other things in the hope of
better identifying the code paths that should be placed under suspicion.
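
A minimal sketch of how such "disable one thing at a time" runs might
be assembled, assuming the kvm.sh options from the command quoted
earlier plus kvm.sh's --bootargs option.  The helper kvm_command() is
hypothetical, and using rcutorture.onoff_interval=0 to turn off
CPU-hotplug testing is an assumption about one plausible way to do it,
not necessarily what was used here.  Whether it overrides the
scenario's own rcutorture.onoff_interval=200 depends on argument order
and should be checked against the resulting console log.

```python
import shlex

KVM_SH = "tools/testing/selftests/rcutorture/bin/kvm.sh"

def kvm_command(configs, duration, nr_cpus=4, bootargs=()):
    """Return the argv list for one kvm.sh run (nothing is executed)."""
    argv = [KVM_SH, "--allcpus", "--duration", duration,
            "--configs", configs,
            "--kconfig", f"CONFIG_NR_CPUS={nr_cpus}",
            "--trust-make"]
    if bootargs:
        # kvm.sh takes all extra kernel boot parameters as a single string.
        argv += ["--bootargs", " ".join(bootargs)]
    return argv

# Assumption: onoff_interval=0 disables rcutorture's CPU-hotplug testing.
no_hotplug = kvm_command("4*TREE03", "4h",
                         bootargs=["rcutorture.onoff_interval=0"])
print(shlex.join(no_hotplug))
```

Passing further entries in bootargs gives one variant per suspected
code path, so each run differs from the baseline in a single setting.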

							Thanx, Paul