Date:   Thu, 21 Sep 2017 15:28:20 +0200
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>, mingo@...hat.com,
        linux-kernel@...r.kernel.org, tglx@...utronix.de
Subject: Re: native_smp_send_reschedule() splat from rt_mutex_lock()?

On 2017-09-21 14:41:05 [+0200], Peter Zijlstra wrote:
> On Wed, Sep 20, 2017 at 06:24:47PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2017-09-18 09:51:10 [-0700], Paul E. McKenney wrote:
> > > Hello!
> > Hi,
> > 
> > > [11072.586518] sched: Unexpected reschedule of offline CPU#6!
> > > [11072.587578] ------------[ cut here ]------------
> > > [11072.588563] WARNING: CPU: 0 PID: 59 at /home/paulmck/public_git/linux-rcu/arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x37/0x40
> > > [11072.591543] Modules linked in:
> > > [11072.591543] CPU: 0 PID: 59 Comm: rcub/10 Not tainted 4.14.0-rc1+ #1
> > > [11072.610596] Call Trace:
> > > [11072.611531]  resched_curr+0x61/0xd0
> > > [11072.611531]  switched_to_rt+0x8f/0xa0
> > > [11072.612647]  rt_mutex_setprio+0x25c/0x410
> > > [11072.613591]  task_blocks_on_rt_mutex+0x1b3/0x1f0
> > > [11072.614601]  rt_mutex_slowlock+0xa9/0x1e0
> > > [11072.615567]  rt_mutex_lock+0x29/0x30
> > > [11072.615567]  rcu_boost_kthread+0x127/0x3c0
> > 
> > > In theory, I could work around this by excluding CPU-hotplug operations
> > > while doing RCU priority boosting, but in practice I am very much hoping
> > > that there is a more reasonable solution out there...
> > 
> > so in CPUHP_TEARDOWN_CPU / take_cpu_down() / __cpu_disable() the CPU is
> > marked as offline and interrupt handling is disabled. Later in
> > CPUHP_AP_SCHED_STARTING / sched_cpu_dying() all tasks are migrated away.
> > 
> > Did this hit a random task during a CPU-hotplug operation which was not
> > yet migrated away from the dying CPU? In theory a futex_unlock() of a RT
> > task could also produce such a backtrace.
> 
> So this is an interrupt that got received while we were going down, and
> processed after we've migrated the tasks away, right?

No, I don't think so. A random CPU sent an IPI to an offline but not yet
dead CPU, and that IPI got lost. However, before the CPU went dead it
migrated all its tasks to another CPU, and I *think* that since the task
was runnable it got on a CPU soon anyway.

> Should we not clear the IRQ pending masks somewhere along the line?

No, I think we are good.

> Other than that, there's nothing we can do to avoid this race.

We could avoid sending the IPI. Migrating the task right away (instead
of sending the IPI in this case) is probably overkill, since the
migration will happen soon anyway (at least in this scenario). So maybe
we could just remove that warning _or_ add a check to cpu.c for
"hotplug-state < CPUHP_AP_SCHED_STARTING" together with cpu_offline()
and warn only in that case.
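To illustrate the second option, here is a minimal user-space model of
the proposed condition; it is a sketch, not kernel code. The enum
values and the helper name below are stand-ins for the real
enum cpuhp_state machinery, chosen only to mirror its ordering:

```c
#include <stdbool.h>

/*
 * Minimal user-space model of the proposed check -- not kernel code.
 * The enum values are stand-ins for enum cpuhp_state: during teardown
 * the state walks downwards, the CPU is marked offline around
 * CPUHP_TEARDOWN_CPU / __cpu_disable(), and sched_cpu_dying()
 * migrates the tasks away at CPUHP_AP_SCHED_STARTING.
 */
enum hp_state {
	HP_OFFLINE,		/* fully down */
	HP_AP_SCHED_STARTING,	/* sched_cpu_dying() runs here */
	HP_TEARDOWN_CPU,	/* __cpu_disable(), CPU marked offline */
	HP_ONLINE,		/* fully up */
};

/*
 * Warn only when the target CPU is offline *and* the scheduler has
 * already migrated its tasks away (state below CPUHP_AP_SCHED_STARTING).
 * In the window between __cpu_disable() and sched_cpu_dying() a lost
 * reschedule IPI is harmless, so no warning there.
 */
static bool reschedule_ipi_should_warn(bool cpu_online, enum hp_state state)
{
	if (cpu_online)
		return false;
	return state < HP_AP_SCHED_STARTING;
}
```

With this, an IPI hitting the CPU in the window described above (offline
but tasks not yet migrated) would stay silent, while one arriving after
sched_cpu_dying() would still warn.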

This is what the removal would look like:

diff --git a/arch/m32r/kernel/smp.c b/arch/m32r/kernel/smp.c
index 564052e3d3a0..2df5373063ae 100644
--- a/arch/m32r/kernel/smp.c
+++ b/arch/m32r/kernel/smp.c
@@ -103,7 +103,6 @@ static void send_IPI_mask(const struct cpumask *, int, int);
  *==========================================================================*/
 void smp_send_reschedule(int cpu_id)
 {
-	WARN_ON(cpu_is_offline(cpu_id));
 	send_IPI_mask(cpumask_of(cpu_id), RESCHEDULE_IPI, 1);
 }
 
diff --git a/arch/tile/kernel/smp.c b/arch/tile/kernel/smp.c
index 94a62e1197ce..8234e3c04d50 100644
--- a/arch/tile/kernel/smp.c
+++ b/arch/tile/kernel/smp.c
@@ -260,8 +260,6 @@ void __init ipi_init(void)
 
 void smp_send_reschedule(int cpu)
 {
-	WARN_ON(cpu_is_offline(cpu));
-
 	/*
 	 * We just want to do an MMIO store.  The traditional writeq()
 	 * functions aren't really correct here, since they're always
@@ -277,8 +275,6 @@ void smp_send_reschedule(int cpu)
 {
 	HV_Coord coord;
 
-	WARN_ON(cpu_is_offline(cpu));
-
 	coord.y = cpu_y(cpu);
 	coord.x = cpu_x(cpu);
 	hv_trigger_ipi(coord, IRQ_RESCHEDULE);
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index d3c66a15bbde..31493748dd2d 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -123,10 +123,6 @@ static bool smp_no_nmi_ipi = false;
  */
 static void native_smp_send_reschedule(int cpu)
 {
-	if (unlikely(cpu_is_offline(cpu))) {
-		WARN_ON(1);
-		return;
-	}
 	apic->send_IPI(cpu, RESCHEDULE_VECTOR);
 }
 

Sebastian
