[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <64e92332-09c7-4cae-ac63-e1701b3f3814@paulmck-laptop>
Date: Tue, 1 Oct 2024 09:47:27 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Valentin Schneider <vschneid@...hat.com>
Cc: Chen Yu <yu.c.chen@...el.com>, Peter Zijlstra <peterz@...radead.org>,
linux-kernel@...r.kernel.org, sfr@...b.auug.org.au,
linux-next@...r.kernel.org, kernel-team@...a.com,
Tomas Glozar <tglozar@...hat.com>
Subject: Re: [BUG almost bisected] Splat in dequeue_rt_stack() and build error
On Tue, Oct 01, 2024 at 02:52:37PM +0200, Valentin Schneider wrote:
> On 01/10/24 03:10, Paul E. McKenney wrote:
> > On Mon, Sep 30, 2024 at 10:44:24PM +0200, Valentin Schneider wrote:
> >> On 30/09/24 12:09, Paul E. McKenney wrote:
> >> >
> >> > And Peter asked that I send along a reproducer, which I am finally getting
> >> > around to doing:
> >> >
> >> > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 12h --configs "100*TREE03" --trust-make
> >> >
> >>
> >> FYI Tomas (on Cc) has been working on getting pretty much this to run on
> >> our infra, no hit so far.
> >>
> >> How much of a pain would it be to record an ftrace trace while this runs?
> >> I'm thinking sched_switch, sched_wakeup and function-tracing
> >> dl_server_start() and dl_server_stop() would be a start.
> >>
> >> AIUI this is running under QEMU so we'd need to record the trace within
> >> that, I'm guessing we can (ab)use --bootargs to feed it tracing arguments,
> >> but how do we get the trace out?
To answer this question directly, I am trying this:
--bootargs "trace_event=sched:sched_switch,sched:sched_wakeup ftrace_filter=dl_server_start,dl_server_stop torture.ftrace_dump_at_shutdown=1"
Huh, 50MB and growing. I need to limit the buffer size as well.
How about "trace_buf_size=2k"? The default is 1,441,792, just
over 1m.
Except that I am not getting either dl_server_start() or dl_server_stop(),
perhaps because they are not being invoked in this short test run.
So try some function that is definitely getting invoked, such as
rcu_sched_clock_irq().
No joy there, either, so maybe add "ftrace=function"?
No: "[ 1.542360] ftrace bootup tracer 'function' not registered."
The "torture.ftrace_dump_at_shutdown" is just for experiment. The
actual runs will do something like this:
if (on_dl_rq(dl_se)) { // Was: WARN_ON_ONCE(on_dl_rq(dl_se));
tracing_off();
ftrace_dump(DUMP_ALL);
WARN_ON_ONCE(1);
}
> > Me, I would change those warnings to dump the trace buffer to the
> > console when triggered. Let me see if I can come up with something
> > better over breakfast. And yes, there is the concern that adding tracing
> > will suppress this issue.
> >
> > So is there some state that I could manually dump upon triggering either
> > of these two warnings? That approach would minimize the probability of
> > suppressing the problem.
>
> Usually enabling panic_on_warn and getting a kdump is ideal, but here this
> is with QEMU - I know we can get a vmcore out via dump-guest-memory in the
> QEMU monitor, but I don't have an immediate solution to do that on a
> warn/panic.
Especially given that I don't have a QEMU monitor for these 100 runs.
But if there is a way to do this programatically from within the
kernel, I would be happy to give it a try.
> Also I'd say here we're mostly interested in the sequence of events leading
> us to the warn (dl_server_start() when the DL entity is somehow still
> enqueued) rather than the state of things when the warn is hit, and for
> that dumping the ftrace buffer to the console sounds good enough to me.
That I can do!!! Give or take function tracing appearing not to work
for me from the kernel command line. :-(
Thanx, Paul
Powered by blists - more mailing lists