Message-ID: <20210219183336.GA23049@paulmck-ThinkPad-P72>
Date: Fri, 19 Feb 2021 10:33:36 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Uladzislau Rezki <urezki@...il.com>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Masami Hiramatsu <mhiramat@...nel.org>,
Ingo Molnar <mingo@...nel.org>,
Steven Rostedt <rostedt@...dmis.org>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>, RCU <rcu@...r.kernel.org>,
Michael Ellerman <mpe@...erman.id.au>,
Andrew Morton <akpm@...ux-foundation.org>,
Daniel Axtens <dja@...ens.net>,
Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <neeraju@...eaurora.org>,
Joel Fernandes <joel@...lfernandes.org>,
Michal Hocko <mhocko@...e.com>,
"Theodore Y . Ts'o" <tytso@....edu>,
Oleksiy Avramchenko <oleksiy.avramchenko@...ymobile.com>
Subject: Re: [PATCH] kprobes: Fix to delay the kprobes jump optimization
On Fri, Feb 19, 2021 at 10:18:11AM -0800, Paul E. McKenney wrote:
> On Fri, Feb 19, 2021 at 12:27:51PM +0100, Uladzislau Rezki wrote:
> > On Fri, Feb 19, 2021 at 12:23:57PM +0100, Uladzislau Rezki wrote:
> > > On Fri, Feb 19, 2021 at 12:17:38PM +0100, Sebastian Andrzej Siewior wrote:
> > > > On 2021-02-19 12:13:01 [+0100], Uladzislau Rezki wrote:
> > > > > I or Paul will ask for a test once it is settled down :) Looks like
> > > > > it is, so we should fix for v5.12.
> > > >
> > > > Okay. Since Paul asked for powerpc test on v5.11-rc I wanted to check if
> > > > parts of it are also -stable material.
>
> If Masami's patch works for the PowerPC guys on v5.10-rc7, then it can
> be backported. The patch making RCU Tasks initialize itself early won't
> have any effect and can be left or reverted, as we choose. The self-test
> patch will need to be either adjusted or reverted.
>
> However...
>
> The root cause of this problem is that softirq only kind-of works
> during a window of time during boot. It works only if the number and
> duration of softirq handlers during this time is small enough, for some
> ill-defined notion of "small enough". If there are too many, whatever
> that means exactly, then we get a failed attempt to awaken ksoftirqd, which
> (sometimes!) results in a silent hang. Which, as you pointed out earlier,
> is a really obnoxious error message. And any minor change could kick
> us into silent-hang state because of the heuristics used to hand off
> to ksoftirqd. The straw that broke the camel's back and all that.
>
> One approach would be to add WARN_ON_ONCE() so that if softirq tries
> to awaken ksoftirqd before it is spawned, we get a nice obvious splat.
> Unfortunately, this gives false positives because there is code that
> needs a softirq handler to run eventually, but is OK with that handler
> being delayed until some random point in the early_initcall() sequence.
>
> Besides which, if we are going to add a check, why not use that check
> to just make things work by forcing handler execution to remain within the
> softirq back-of-interrupt context instead of awakening a not-yet-spawned
> ksoftirqd? We can further prevent entry into dyntick-idle state until
> the ksoftirqd kthreads have been spawned, which means that if softirq
> handlers must be deferred, they will be resumed within one jiffy by the
> next scheduler-clock interrupt.
>
> Yes, this can allow softirq handlers to impose large latencies, but only
> during early boot, long before any latency-sensitive applications can
> possibly have been created. So this does not seem like a real problem.
>
> Am I missing something here?

For definiteness, here is the first part of the change, posted earlier.
The commit log needs to be updated. I will post the change that keeps
the tick going as a reply to this email.

							Thanx, Paul

------------------------------------------------------------------------
commit 4f659bf04fc4610523544493d6db92fc8670b086
Author: Paul E. McKenney <paulmck@...nel.org>
Date:   Fri Feb 12 16:20:40 2021 -0800

    softirq: Don't try waking ksoftirqd before it has been spawned

    If there is heavy softirq activity, the softirq system will attempt
    to awaken ksoftirqd and will stop the traditional back-of-interrupt
    softirq processing. This is all well and good, but only if the
    ksoftirqd kthreads already exist, which is not the case during early
    boot, in which case the system hangs.

    One reproducer is as follows:

    tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 2 --configs "TREE03" --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y" --bootargs "threadirqs=1" --trust-make

    This commit therefore adds a couple of existence checks for ksoftirqd
    and forces back-of-interrupt softirq processing when ksoftirqd does not
    yet exist. With this change, the above test passes.

    Reported-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
    Reported-by: Uladzislau Rezki <urezki@...il.com>
    Signed-off-by: Paul E. McKenney <paulmck@...nel.org>

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 9d71046..ba78e63 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -209,7 +209,7 @@ static inline void invoke_softirq(void)
 	if (ksoftirqd_running(local_softirq_pending()))
 		return;
 
-	if (!force_irqthreads) {
+	if (!force_irqthreads || !__this_cpu_read(ksoftirqd)) {
 #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
 		/*
 		 * We can safely execute softirq on the current stack if
@@ -358,8 +358,8 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
 
 	pending = local_softirq_pending();
 	if (pending) {
-		if (time_before(jiffies, end) && !need_resched() &&
-		    --max_restart)
+		if (!__this_cpu_read(ksoftirqd) ||
+		    (time_before(jiffies, end) && !need_resched() && --max_restart))
 			goto restart;
 
 		wakeup_softirqd();
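
For anyone who wants to poke at the restart decision outside the kernel,
here is a rough userspace sketch of the second hunk's logic. It is not
kernel code. The names sim_cpu, sim_result and sim_do_softirq are
hypothetical, the ksoftirqd_spawned flag stands in for
__this_cpu_read(ksoftirqd) being non-NULL, and the time_before() and
need_resched() conditions are left out, so only the restart budget is
modelled. MAX_RESTART is assumed to match MAX_SOFTIRQ_RESTART (10) in
kernel/softirq.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to mirror MAX_SOFTIRQ_RESTART in kernel/softirq.c. */
#define MAX_RESTART 10

/* Hypothetical stand-in for the per-CPU state the patch consults. */
struct sim_cpu {
	bool ksoftirqd_spawned;	/* models __this_cpu_read(ksoftirqd) != NULL */
	int pending_batches;	/* rounds of softirq work still to arrive */
};

struct sim_result {
	int handled_inline;	/* batches run in back-of-interrupt context */
	bool woke_ksoftirqd;	/* true if the loop gave up and handed off */
};

/*
 * Simplified model of the patched restart loop in __do_softirq().
 * The jiffies deadline and need_resched() checks are omitted.
 */
static struct sim_result sim_do_softirq(struct sim_cpu *cpu)
{
	struct sim_result res = { 0, false };
	int max_restart = MAX_RESTART;

	assert(cpu != NULL);

restart:
	/* Run one batch of pending handlers, if any. */
	if (cpu->pending_batches > 0) {
		cpu->pending_batches--;
		res.handled_inline++;
	}

	if (cpu->pending_batches > 0) {
		/*
		 * Patched condition: with no ksoftirqd to hand off to,
		 * always keep processing inline. Otherwise respect the
		 * restart budget as before.
		 */
		if (!cpu->ksoftirqd_spawned || --max_restart)
			goto restart;
		res.woke_ksoftirqd = true;
	}
	return res;
}
```

Under these assumptions, a CPU with 25 pending batches and no ksoftirqd
runs all 25 inline and never attempts a wakeup, while the same load with
ksoftirqd present stops after 10 passes and hands off the remaining 15.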