linux-kernel - Re: [PATCH] fix rcu vs hotplug race


Message-ID: <20080626152728.GA24972@linux.vnet.ibm.com>
Date:	Thu, 26 Jun 2008 08:27:28 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Gautham R Shenoy <ego@...ibm.com>,
	Dhaval Giani <dhaval@...ux.vnet.ibm.com>,
	Dipankar Sarma <dipankar@...ibm.com>, laijs@...fujitsu.com,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	lkml <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] fix rcu vs hotplug race

On Tue, Jun 24, 2008 at 01:01:44PM +0200, Ingo Molnar wrote:
> 
> * Gautham R Shenoy <ego@...ibm.com> wrote:
> 
> > > hm, not sure - we might just be fighting the symptom and we might 
> > > now create a silent resource leak instead. Isn't a full RCU quiescent 
> > > state forced (on all CPUs) before a CPU is cleared out of 
> > > cpu_online_map? That way the to-be-offlined CPU should never 
> > > actually show up in rcp->cpumask.
> > 
> > No, this does not happen currently. The rcp->cpumask is always 
> > initialized to cpu_online_map&~nohz_cpu_mask when we start a new 
> > batch. Hence, before the batch ends, if a cpu goes offline we _can_ 
> > have a stale rcp->cpumask, till the RCU subsystem has handled its 
> > CPU_DEAD notification.
> > 
> > Thus for a tiny interval, the rcp->cpumask would contain the offlined 
> > CPU. One of the alternatives is probably to handle this using 
> > CPU_DYING notifier instead of CPU_DEAD where we can call 
> > __rcu_offline_cpu().
> > 
> > The warn_on that dhaval was hitting was because of some cpu-offline 
> > that was called just before we did a local_irq_save inside call_rcu(). 
> > But at that time, the rcp->cpumask was still stale, and hence we ended 
> > up sending a smp_reschedule() to an offlined cpu. So the check may not 
> > create any resource leak.
> 
> the check may not - but the problem it highlights might and with the 
> patch we'd end up hiding potential problems in this area.
> 
> Paul, what do you think about this mixed CPU hotplug plus RCU workload?

RCU most certainly needs to work correctly in the face of arbitrary
sequences of CPU-hotplug events, and should therefore be tested with
arbitrary CPU-hotplug tests.  And RCU also most certainly needs to refrain
from issuing spurious warning messages that might over time be ignored,
possibly causing someone to miss a real bug.  My concern with this patch
falls in the second area, spurious warnings.

Not sure I answered the actual question, though...
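
For anyone following the thread, the window Gautham describes can be
modeled in user space with plain bitmasks.  This is only a sketch under
stated assumptions, not the kernel code: the type and every model_*()
name below are hypothetical, a CPU mask is reduced to a single 32-bit
word, and the "check" is modeled as filtering reschedule targets against
the current online mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 8u

/* Hypothetical stand-in for a kernel cpumask: one bit per CPU. */
typedef uint32_t cpumask_t;

/* Hypothetical stand-in for the RCU control block ("rcp" in the thread). */
struct rcu_ctrl_model {
	cpumask_t cpumask;	/* CPUs that still owe a quiescent state */
};

static cpumask_t cpu_bit(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return (cpumask_t)1u << cpu;
}

/* New batch: snapshot of online CPUs minus the nohz-idle ones. */
static void model_start_batch(struct rcu_ctrl_model *rcp,
			      cpumask_t online, cpumask_t nohz)
{
	rcp->cpumask = online & ~nohz;
}

/* CPUs still in the snapshot that have since gone offline. */
static cpumask_t model_stale_cpus(const struct rcu_ctrl_model *rcp,
				  cpumask_t online)
{
	return rcp->cpumask & ~online;
}

/*
 * CPUs that would be asked to reschedule on behalf of CPU "self".
 * With check_online set, offline CPUs are filtered out of the targets.
 */
static cpumask_t model_resched_targets(const struct rcu_ctrl_model *rcp,
				       cpumask_t online, unsigned int self,
				       bool check_online)
{
	cpumask_t targets = rcp->cpumask & ~cpu_bit(self);

	if (check_online)
		targets &= online;
	return targets;
}

/* Models the CPU_DEAD handling: the dead CPU leaves the batch mask. */
static void model_cpu_dead(struct rcu_ctrl_model *rcp, unsigned int cpu)
{
	rcp->cpumask &= ~cpu_bit(cpu);
}
```

In this model, a CPU that goes offline after model_start_batch() but
before model_cpu_dead() shows up in model_stale_cpus(), and it is a
reschedule target unless check_online is set.  The model says nothing
about whether filtering is the right fix or merely hides the window.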

							Thanx, Paul