Message-ID: <20080626152728.GA24972@linux.vnet.ibm.com>
Date: Thu, 26 Jun 2008 08:27:28 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Ingo Molnar <mingo@...e.hu>
Cc: Gautham R Shenoy <ego@...ibm.com>,
Dhaval Giani <dhaval@...ux.vnet.ibm.com>,
Dipankar Sarma <dipankar@...ibm.com>, laijs@...fujitsu.com,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
lkml <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] fix rcu vs hotplug race

On Tue, Jun 24, 2008 at 01:01:44PM +0200, Ingo Molnar wrote:
>
> * Gautham R Shenoy <ego@...ibm.com> wrote:
>
> > > hm, not sure - we might just be fighting the symptom and we might
> > > now create a silent resource leak instead. Isn't a full RCU quiescent
> > > state forced (on all CPUs) before a CPU is cleared out of
> > > cpu_online_map? That way the to-be-offlined CPU should never
> > > actually show up in rcp->cpumask.
> >
> > No, this does not happen currently. The rcp->cpumask is always
> > initialized to cpu_online_map&~nohz_cpu_mask when we start a new
> > batch. Hence, before the batch ends, if a cpu goes offline we _can_
> > have a stale rcp->cpumask, till the RCU subsystem has handled its
> > CPU_DEAD notification.
> >
> > Thus for a tiny interval, the rcp->cpumask would contain the offlined
> > CPU. One of the alternatives is probably to handle this using
> > CPU_DYING notifier instead of CPU_DEAD where we can call
> > __rcu_offline_cpu().
> >
> > The warn_on that dhaval was hitting was because of some cpu-offline
> > that was called just before we did a local_irq_save inside call_rcu().
> > But at that time, the rcp->cpumask was still stale, and hence we ended
> > up sending a smp_reschedule() to an offlined cpu. So the check may not
> > create any resource leak.
>
> the check may not - but the problem it highlights might and with the
> patch we'd end up hiding potential problems in this area.
>
> Paul, what do you think about this mixed CPU hotplug plus RCU workload?

RCU most certainly needs to work correctly in the face of arbitrary
sequences of CPU-hotplug events, and should therefore be tested with
arbitrary CPU-hotplug tests. And RCU also most certainly needs to refrain
from issuing spurious warning messages that might over time be ignored,
possibly causing someone to miss a real bug. My concern with this patch
is in the second area, namely spurious warnings.

Not sure I answered the actual question, though...
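
For anyone following along, the window Gautham describes in the quoted text can be modeled in user space. The following is only a rough sketch and not kernel code: every toy_* name is made up for illustration, each mask is a plain unsigned long with one bit per CPU, and filtering the batch mask against the online map is just one possible way to avoid poking an offlined CPU, not necessarily what the patch under discussion does.

```c
#include <assert.h>

/* Toy model: one bit per CPU.  All toy_* names are hypothetical. */
typedef unsigned long toy_cpumask_t;

struct toy_state {
	toy_cpumask_t online;    /* stands in for cpu_online_map */
	toy_cpumask_t nohz;      /* stands in for nohz_cpu_mask */
	toy_cpumask_t rcu_mask;  /* stands in for rcp->cpumask */
};

/* Start a new batch: snapshot the CPUs that must report a quiescent state. */
static void toy_start_batch(struct toy_state *s)
{
	s->rcu_mask = s->online & ~s->nohz;
}

/* The CPU leaves the online map; RCU has not yet been told about it. */
static void toy_cpu_offline(struct toy_state *s, int cpu)
{
	assert(cpu >= 0 && cpu < (int)(8 * sizeof(toy_cpumask_t)));
	s->online &= ~(1UL << cpu);
}

/* RCU handles its CPU_DEAD notification and drops the CPU from the batch. */
static void toy_rcu_cpu_dead(struct toy_state *s, int cpu)
{
	assert(cpu >= 0 && cpu < (int)(8 * sizeof(toy_cpumask_t)));
	s->rcu_mask &= ~(1UL << cpu);
}

/* CPUs still in the batch mask that are no longer online: the stale bits. */
static toy_cpumask_t toy_stale_cpus(const struct toy_state *s)
{
	return s->rcu_mask & ~s->online;
}

/* Reschedule targets if the batch mask is filtered against the online map. */
static toy_cpumask_t toy_resched_targets(const struct toy_state *s)
{
	return s->rcu_mask & s->online;
}
```

With four CPUs online and a batch started, taking CPU 3 offline leaves its bit set in the batch mask until toy_rcu_cpu_dead() runs. In the model, toy_stale_cpus() is nonzero only during that window, while toy_resched_targets() never includes the offlined CPU.
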
Thanx, Paul