lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080701053900.GB8205@in.ibm.com>
Date:	Tue, 1 Jul 2008 11:09:00 +0530
From:	Gautham R Shenoy <ego@...ibm.com>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	Dhaval Giani <dhaval@...ux.vnet.ibm.com>,
	Dipankar Sarma <dipankar@...ibm.com>,
	Ingo Molnar <mingo@...e.hu>, laijs@...fujitsu.com,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	lkml <linux-kernel@...r.kernel.org>,
	Rusty Russel <rusty@...tcorp.com.au>
Subject: Re: [PATCH] fix rcu vs hotplug race

On Fri, Jun 27, 2008 at 07:58:45AM -0700, Paul E. McKenney wrote:
> On Fri, Jun 27, 2008 at 11:19:59AM +0530, Dhaval Giani wrote:
> > On Fri, Jun 27, 2008 at 10:48:55AM +0530, Dipankar Sarma wrote:
> > > On Fri, Jun 27, 2008 at 10:17:38AM +0530, Gautham R Shenoy wrote:
> > > > IMHO the warning is a spurious one.
> > > > Here's the timeline.
> > > > CPU_A						 CPU_B
> > > > --------------------------------------------------------------
> > > > cpu_down():					.
> > > > .					   	.
> > > > .						.
> > > > stop_machine(): /* disables preemption,		.
> > > > 		 * and irqs */			.
> > > > .						.
> > > > .						.
> > > > take_cpu_down();				.
> > > > .						.
> > > > .						.
> > > > .						.
> > > > cpu_disable(); /*this removes cpu 		.
> > > > 		*from cpu_online_map 		.
> > > > 		*/				.
> > > > .						.
> > > > .						.
> > > > restart_machine(); /* enables irqs */		.
> > > > ------WINDOW DURING WHICH rcp->cpumask is stale ---------------
> > > > .						call_rcu();
> > > > .						/* disables irqs here */
> > > > .						.force_quiescent_state();
> > > > .CPU_DEAD:					.for_each_cpu(rcp->cpumask)
> > > > .						.   smp_send_reschedule();
> > > > .						.
> > > > .						.   WARN_ON() for offlined CPU!
> > > > .
> > > 
> > > Exactly. The call_rcu()s are coming from a different subsystem
> > > and can happen anytime during the CPU hotplug path. So, RCU subsystem
> > > doesn't have anything to do to keep rcu->cpumask consistent.
> > > It is *safe* even if we miss poking a cpu or two while
> > > forcing quiescent state in all CPUs. The worst that can happen
> > > is a delay in grace period. No correctness problem here.
> > > 
> > 
> > One question. What is preventing a CPU from clearing its mask after we
> > have checked whether it is online but before we have called into
> > smp_send_reschedule?
> 
> This is my concern as well.  Gautham, at which point in the above
> timeline is the offlining CPU marked DYING?  Before stop_machine(), right?

No :) The offlining CPU is marked DYING after stop_machine(), inside
take_cpu_down() which is the work we want to execute after stopping the
machine.

it's like
_cpu_down()
|
|-> stop_machine_run();
|   |
|   |-> stop_machine(); /* All CPUs irqs disabled. */
|   |
|   |-> take_cpu_down() --> sets state to CPU_DYING. disables irqs on
|   |				offlined cpu
|   |
|   |-> restart_machine(); /* All CPUs irqs reenabled */
|
|-> send_CPU_DEAD_notification.

The very fact that a thread is running with irqs disabled means that
stop_machine_run() thread cannot start executing the work it has been
assinged to execute. Because for Machine to be stopped, stop_machine()
needs to create n-1 high priority threads on n-1 online cpus, which will
disable interrupts and preemption, and stop the machine. Then it will
run the task assigned to it on the ith cpu, which in this case is the
cpu to be offlined.

So, it's the design of stop_machine() that's preventing someone
from updating the cpu_online_map while
force_quiescent_state() is performing the
cpu_is_online() check. Becase we always call force_quiescent_state()
with irqs disabled :)

> 
> If so, can't we just disable irqs, check for DYING or DEAD, and invoke
> smp_send_reschedule() only if not DYING or DEAD?



> 
> 							Thanx, Paul

-- 
Thanks and Regards
gautham
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ