[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210213003709.GA27846@paulmck-ThinkPad-P72>
Date: Fri, 12 Feb 2021 16:37:09 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Uladzislau Rezki <urezki@...il.com>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>, RCU <rcu@...r.kernel.org>,
Michael Ellerman <mpe@...erman.id.au>,
Andrew Morton <akpm@...ux-foundation.org>,
Daniel Axtens <dja@...ens.net>,
Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <neeraju@...eaurora.org>,
Joel Fernandes <joel@...lfernandes.org>,
Peter Zijlstra <peterz@...radead.org>,
Michal Hocko <mhocko@...e.com>,
Thomas Gleixner <tglx@...utronix.de>,
"Theodore Y . Ts'o" <tytso@....edu>,
Oleksiy Avramchenko <oleksiy.avramchenko@...ymobile.com>
Subject: Re: [PATCH 2/2] rcu-tasks: add RCU-tasks self tests
On Fri, Feb 12, 2021 at 03:48:51PM -0800, Paul E. McKenney wrote:
> On Fri, Feb 12, 2021 at 10:12:07PM +0100, Uladzislau Rezki wrote:
> > On Fri, Feb 12, 2021 at 08:20:59PM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2020-12-09 21:27:32 [+0100], Uladzislau Rezki (Sony) wrote:
> > > > Add self tests for checking of RCU-tasks API functionality.
> > > > It covers:
> > > > - wait API functions;
> > > > - invoking/completion call_rcu_tasks*().
> > > >
> > > > Self-tests are run when CONFIG_PROVE_RCU kernel parameter is set.
> > >
> > > I just bisected to this commit. By booting with `threadirqs' I end up
> > > with:
> > > [ 0.176533] Running RCU-tasks wait API self tests
> > >
> > > No stall warning or so.
> > > It boots again with:
> > >
> > > diff --git a/init/main.c b/init/main.c
> > > --- a/init/main.c
> > > +++ b/init/main.c
> > > @@ -1489,6 +1489,7 @@ void __init console_on_rootfs(void)
> > > fput(file);
> > > }
> > >
> > > +void rcu_tasks_initiate_self_tests(void);
> > > static noinline void __init kernel_init_freeable(void)
> > > {
> > > /*
> > > @@ -1514,6 +1515,7 @@ static noinline void __init kernel_init_freeable(void)
> > >
> > > rcu_init_tasks_generic();
> > > do_pre_smp_initcalls();
> > > + rcu_tasks_initiate_self_tests();
> > > lockup_detector_init();
> > >
> > > smp_init();
> > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > > --- a/kernel/rcu/tasks.h
> > > +++ b/kernel/rcu/tasks.h
> > > @@ -1266,7 +1266,7 @@ static void test_rcu_tasks_callback(struct rcu_head *rhp)
> > > rttd->notrun = true;
> > > }
> > >
> > > -static void rcu_tasks_initiate_self_tests(void)
> > > +void rcu_tasks_initiate_self_tests(void)
> > > {
> > > pr_info("Running RCU-tasks wait API self tests\n");
> > > #ifdef CONFIG_TASKS_RCU
> > > @@ -1322,7 +1322,6 @@ void __init rcu_init_tasks_generic(void)
> > > #endif
> > >
> > > // Run the self-tests.
> > > - rcu_tasks_initiate_self_tests();
> > > }
> > >
> > > #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
> > >
> > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@...il.com>
>
> Apologies for the hassle! My testing clearly missed this combination
> of CONFIG_PROVE_RCU=y and threadirqs=1. :-(
>
> But at least I can easily reproduce this hang as follows:
>
> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 2 --configs "TREE03" --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y" --bootargs "threadirqs=1" --trust-make
>
> Sadly, I cannot take your patch because that simply papers over the
> fact that early boot use of synchronize_rcu_tasks() is broken in this
> particular configuration, which will likely eventually bite others now
> that init_kprobes() has been moved earlier in boot:
>
> 1b04fa990026 ("rcu-tasks: Move RCU-tasks initialization to before early_initcall()")
> Link: https://lore.kernel.org/rcu/87eekfh80a.fsf@dja-thinkpad.axtens.net/
> Fixes: 36dadef23fcc ("kprobes: Init kprobes in early_initcall")
>
> > > Sebastian
> > >
> > We should be able to use call_rcu_tasks() in the *initcall() callbacks.
> > The problem is that, ksoftirqd threads are not spawned by the time when
> > an rcu_init_tasks_generic() is invoked:
> >
> > diff --git a/init/main.c b/init/main.c
> > index c68d784376ca..e6106bb12b2d 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -954,7 +954,6 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
> > rcu_init_nohz();
> > init_timers();
> > hrtimers_init();
> > - softirq_init();
> > timekeeping_init();
> >
> > /*
> > @@ -1512,6 +1511,7 @@ static noinline void __init kernel_init_freeable(void)
> >
> > init_mm_internals();
> >
> > + softirq_init();
> > rcu_init_tasks_generic();
> > do_pre_smp_initcalls();
> > lockup_detector_init();
> > diff --git a/kernel/softirq.c b/kernel/softirq.c
> > index 9d71046ea247..cafa55c496d0 100644
> > --- a/kernel/softirq.c
> > +++ b/kernel/softirq.c
> > @@ -630,6 +630,7 @@ void __init softirq_init(void)
> > &per_cpu(tasklet_hi_vec, cpu).head;
> > }
> >
> > + spawn_ksoftirqd();
>
> We need a forward reference to allow this to build, but with that added,
> my test case passes. Good show!
>
> > open_softirq(TASKLET_SOFTIRQ, tasklet_action);
> > open_softirq(HI_SOFTIRQ, tasklet_hi_action);
> > }
> > @@ -732,7 +733,6 @@ static __init int spawn_ksoftirqd(void)
> >
> > return 0;
> > }
> > -early_initcall(spawn_ksoftirqd);
> >
> > /*
> > * [ These __weak aliases are kept in a separate compilation unit, so that
> >
> > Any thoughts?
>
> One likely problem is that there are almost certainly parts of the kernel
> that need softirq_init() to stay roughly where it is. So, is it possible
> to leave softirq_init() where it is, and to arrange for spawn_ksoftirqd()
> to be invoked just before rcu_init_tasks_generic() is called?
This still seems worth trying (and doing so is next on my list), but just
in case there are objections to it, please see below. The advantage
of the approach below is that it renders irrelevant the exact point at
which ksoftirqd is spawned. The advantage of moving spawn_ksoftirqd()
earlier is that it avoids adding checks to the softirq-handler code.
Me, I prefer the robustness of the approach below, but the important
thing is of course to solve the problem one way or another.
> For my part, I will look into what is required to make Tasks RCU do
> without softirq during boot, for example, by looking carefully at where
> in boot RCU grace periods are unconditionally expedited. Just in case
> adjusting softirq has unforeseen side effects.
I should hasten to add that my test case sets tree.use_softirq=0, so
it is not RCU_SOFTIRQ that is lacking, but rather one of the others.
TIMER_SOFTIRQ?
Given that, maybe the code that switches from softirq to ksoftirqd
is doing so too early at boot. And the patch below allows my test
case to complete successfully. I hope this patch is uncontroversial.
After all, unless I am suffering a massive failure of imagination, it
is rather difficult to wake up kthreads that have not yet been created,
which is what current mainline looks to be doing.
There are three places invoking wakeup_softirqd(), and this patch
deals only with the first two. They suffice thus far, and I would like
to hear what others think before I pull something like irq-work into
raise_softirq_irqoff(). :-/
Thoughts?
Thanx, Paul
------------------------------------------------------------------------
commit be5b0b6d9c8f22b86ea5591885bf38987ef777d9
Author: Paul E. McKenney <paulmck@...nel.org>
Date: Fri Feb 12 16:20:40 2021 -0800
softirq: Don't try waking ksoftirqd before it has been spawned
If there is heavy softirq activity, the softirq system will attempt
to awaken ksoftirqd and will stop the traditional back-of-interrupt
softirq processing. This is all well and good, but only if the
ksoftirqd kthreads already exist, which is not the case during early
boot, in which case the system hangs.
One reproducer is as follows:
tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 2 --configs "TREE03" --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y" --bootargs "threadirqs=1" --trust-make
This commit therefore adds a couple of existence checks for ksoftirqd
and forces back-of-interrupt softirq processing when ksoftirqd does not
yet exist. With this change, the above test passes.
Reported-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Reported-by: Uladzislau Rezki <urezki@...il.com>
Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 9d71046..ba78e63 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -209,7 +209,7 @@ static inline void invoke_softirq(void)
if (ksoftirqd_running(local_softirq_pending()))
return;
- if (!force_irqthreads) {
+ if (!force_irqthreads || !__this_cpu_read(ksoftirqd)) {
#ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
/*
* We can safely execute softirq on the current stack if
@@ -358,8 +358,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
pending = local_softirq_pending();
if (pending) {
- if (time_before(jiffies, end) && !need_resched() &&
- --max_restart)
+ if (!__this_cpu_read(ksoftirqd) ||
+ (time_before(jiffies, end) && !need_resched() && --max_restart))
goto restart;
wakeup_softirqd();
Powered by blists - more mailing lists