Open Source and information security mailing list archives
 
Message-ID: <70b06926-5ca5-4fd8-b88f-64179f63425b@paulmck-laptop>
Date: Wed, 2 Oct 2024 05:07:18 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Tomas Glozar <tglozar@...hat.com>
Cc: Valentin Schneider <vschneid@...hat.com>, Chen Yu <yu.c.chen@...el.com>,
	Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org,
	sfr@...b.auug.org.au, linux-next@...r.kernel.org,
	kernel-team@...a.com
Subject: Re: [BUG almost bisected] Splat in dequeue_rt_stack() and build error

On Wed, Oct 02, 2024 at 11:01:03AM +0200, Tomas Glozar wrote:
> On Tue, Oct 1, 2024 at 18:47 Paul E. McKenney <paulmck@...nel.org> wrote:
> > Huh, 50MB and growing.  I need to limit the buffer size as well.
> > How about "trace_buf_size=2k"?  The default is 1,441,792, just
> > over 1m.
> >
> Yeah, limiting the size of the buffer is the way to go, we only need
> the last n entries before the oops.
> 
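For scale: the default of 1,441,792 bytes factors exactly as 16384 x 88 (which looks like 16384 entries of 88 bytes), and the kernel-parameters documentation describes trace_buf_size as a per-CPU size written as nn[KMG]. So trace_buf_size=2k asks for 2,048 bytes per CPU, 704 times less than the default (probably rounded up to whole pages by the ring buffer). A minimal sketch of the size arithmetic, assuming only the k/m/g suffixes; size_to_bytes is a hypothetical helper that imitates, but is not, the kernel's own parsing:

```shell
#!/bin/sh
# Convert a trace_buf_size-style value ("2k", "1m", "1441792") to bytes.
# Sketch only: imitates the documented nn[KMG] suffixes, not kernel code.
size_to_bytes() {
    val=$1
    num=${val%[kKmMgG]}           # drop one trailing suffix letter, if any
    suffix=${val#"$num"}          # the letter that was dropped (or empty)
    case $num in
        ''|*[!0-9]*) echo "bad size: $val" >&2; return 1 ;;
    esac
    case $suffix in
        k|K) echo $((num * 1024)) ;;
        m|M) echo $((num * 1024 * 1024)) ;;
        g|G) echo $((num * 1024 * 1024 * 1024)) ;;
        *)   echo "$num" ;;
    esac
}

default=1441792
small=$(size_to_bytes 2k)
echo "default: $default bytes = $((default / 16384)) bytes x 16384 entries"
echo "2k:      $small bytes ($((default / small))x smaller than the default)"
```

Because the buffer is a ring, a small size simply keeps the most recent entries, which is what matters for the events just before the oops.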
> > Except that I am not getting either dl_server_start() or dl_server_stop(),
> > perhaps because they are not being invoked in this short test run.
> > So try some function that is definitely getting invoked, such as
> > rcu_sched_clock_irq().
> >
> > No joy there, either, so maybe add "ftrace=function"?
> >
> > No: "[    1.542360] ftrace bootup tracer 'function' not registered."
> >
> Did you enable CONFIG_BOOTTIME_TRACING and CONFIG_FUNCTION_TRACER?
> They are not set in the default configuration for TREE03:
> 
> $ grep -E '(FUNCTION_TRACER)|(BOOTTIME_TRACING)' \
> ./tools/testing/selftests/rcutorture/res/2024.09.26-14.35.03/TREE03/.config
> CONFIG_HAVE_FUNCTION_TRACER=y
> # CONFIG_BOOTTIME_TRACING is not set
> # CONFIG_FUNCTION_TRACER is not set
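The same check can be scripted before a long run. A minimal sketch, assuming the two options named above are the ones required; check_tracing_config is a hypothetical helper. Each pattern is anchored at the start of the line, so CONFIG_HAVE_FUNCTION_TRACER=y is not mistaken for CONFIG_FUNCTION_TRACER=y (the unanchored grep above prints both):

```shell
#!/bin/sh
# Report which of the tracing options discussed above a .config lacks.
# Returns 0 only if every option is set to =y.
check_tracing_config() {
    config=$1
    missing=0
    for opt in CONFIG_BOOTTIME_TRACING CONFIG_FUNCTION_TRACER; do
        if grep -q "^${opt}=y\$" "$config"; then
            echo "ok:      $opt=y"
        else
            echo "missing: $opt"
            missing=1
        fi
    done
    return $missing
}

# Example: the TREE03 .config excerpt quoted above.
cfg=$(mktemp)
printf '%s\n' 'CONFIG_HAVE_FUNCTION_TRACER=y' \
    '# CONFIG_BOOTTIME_TRACING is not set' \
    '# CONFIG_FUNCTION_TRACER is not set' > "$cfg"
check_tracing_config "$cfg" || echo "rebuild with the missing options enabled"
rm -f "$cfg"
```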

Ah, thank you!  I knew I must be forgetting something.  Now a short test
gets me things like this:

[  304.572701] torture_-190      13d.h2. 302863957us : rcu_is_cpu_rrupt_from_idle <-rcu_sched_clock_irq
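To pick such lines out of a long console log, a small awk filter can split them into fields. This sketch assumes the layout of the single sample line above: task name and PID, the CPU number fused with the latency flags (13 and d.h2. here), a timestamp, then callee <-caller. That layout is inferred from the sample rather than from documentation, and parse_ftrace_lines is a hypothetical helper name:

```shell
#!/bin/sh
# Split function-tracer lines ("... : callee <-caller") into named fields.
# Lines that do not look like function-trace entries are skipped.
parse_ftrace_lines() {
    awk '
    index($0, " : ") && index($0, " <-") {
        split($0, halves, / : /)          # header, then "callee <-caller"
        n = split(halves[1], hdr)         # whitespace-split the header
        if (n < 3) next
        task = hdr[n - 2]; cpuflags = hdr[n - 1]; ts = hdr[n]
        if (!match(cpuflags, /^[0-9]+/)) next
        cpu = substr(cpuflags, 1, RLENGTH)      # leading digits: CPU number
        flags = substr(cpuflags, RLENGTH + 1)   # the rest: latency flags
        split(halves[2], calls, / <-/)
        printf "cpu=%s flags=%s task=%s ts=%s callee=%s caller=%s\n", cpu, flags, task, ts, calls[1], calls[2]
    }'
}

printf '%s\n' '[  304.572701] torture_-190      13d.h2. 302863957us : rcu_is_cpu_rrupt_from_idle <-rcu_sched_clock_irq' |
    parse_ftrace_lines
```

Piping the output through grep for dl_server_start or dl_server_stop would then isolate the events of interest once those functions are being traced.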

> > Especially given that I don't have a QEMU monitor for these 100 runs.
> >
> > But if there is a way to do this programmatically from within the
> > kernel, I would be happy to give it a try.
> >
> > > Also I'd say here we're mostly interested in the sequence of events leading
> > > us to the warn (dl_server_start() when the DL entity is somehow still
> > > enqueued) rather than the state of things when the warn is hit, and for
> > > that dumping the ftrace buffer to the console sounds good enough to me.
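Dumping the buffer to the console can be arranged from the boot command line alone. As documented, ftrace_dump_on_oops prints the trace buffers on an oops or panic, panic_on_warn=1 turns a WARN into a panic, and ftrace_filter= restricts the function tracer to a comma-separated list, so the combination should dump the buffer when the warning fires. A minimal sketch that composes those parameters; trace_bootargs is a hypothetical helper and the particular combination is a suggestion, not something settled in this thread:

```shell
#!/bin/sh
# Compose boot parameters: trace only the listed functions, keep a small
# per-CPU buffer, and dump it to the console when a WARN panics the kernel.
trace_bootargs() {
    funcs=$1          # comma-separated function names for ftrace_filter=
    size=${2:-2k}     # per-CPU trace buffer size, default 2k
    printf 'ftrace=function ftrace_filter=%s trace_buf_size=%s ftrace_dump_on_oops panic_on_warn=1\n' \
        "$funcs" "$size"
}

trace_bootargs dl_server_start,dl_server_stop
```

The resulting string could be handed to rcutorture through kvm.sh's --bootargs option. One trade-off: panic_on_warn=1 ends the run at the first WARN, which suits hunting a single splat but hides any later ones. From inside the kernel, tracing_off() and ftrace_dump() are the corresponding calls.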
> >
> > That I can do!!!  Give or take function tracing appearing not to work
> > for me from the kernel command line.  :-(
> >
> >                                                         Thanx, Paul
> >
> 
> Thanks for trying to get details about the bug. See my comment above
> about the config options to enable function tracing.

I will check up on last night's run for heisenbug-evaluation purposes,
and if it did trigger, restart with this added:

--kconfigs "CONFIG_BOOTTIME_TRACING=y CONFIG_FUNCTION_TRACER=y"

> FYI I have managed to reproduce the bug on our infrastructure after 21
> hours of 7*TREE03 and I will continue with trying to reproduce it with
> the tracers we want.

Even better!!!

							Thanx, Paul
