Message-ID: <20150323132133.GD5718@linux.vnet.ibm.com>
Date:	Mon, 23 Mar 2015 06:21:33 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Russell King - ARM Linux <linux@....linux.org.uk>
Cc:	Nicolas Pitre <nico@...xnic.net>,
	Mark Rutland <mark.rutland@....com>,
	Krzysztof Kozlowski <k.kozlowski@...sung.com>,
	Arnd Bergmann <arnd@...db.de>,
	Bartlomiej Zolnierkiewicz <b.zolnierkie@...sung.com>,
	Catalin Marinas <catalin.marinas@....com>,
	Stephen Boyd <sboyd@...eaurora.org>,
	linux-kernel@...r.kernel.org, Will Deacon <will.deacon@....com>,
	linux-arm-kernel@...ts.infradead.org,
	Marek Szyprowski <m.szyprowski@...sung.com>
Subject: Re: [PATCH v2] ARM: Don't use complete() during __cpu_die

On Mon, Mar 23, 2015 at 12:55:45PM +0000, Russell King - ARM Linux wrote:
> On Sun, Mar 22, 2015 at 04:30:57PM -0700, Paul E. McKenney wrote:
> > On Wed, Feb 25, 2015 at 05:05:05PM -0800, Paul E. McKenney wrote:
> > > On Wed, Feb 25, 2015 at 03:16:59PM -0500, Nicolas Pitre wrote:
> > > > On Wed, 25 Feb 2015, Nicolas Pitre wrote:
> > > > 
> > > > > On Wed, 25 Feb 2015, Russell King - ARM Linux wrote:
> > > > > 
> > > > > > We could just use the spin-and-poll solution instead of an IPI, but
> > > > > > I really don't like that - when you see the complexity needed to
> > > > > > re-initialise it each time, it quickly becomes very yucky because
> > > > > > there is no well defined order between __cpu_die() and __cpu_kill()
> > > > > > being called by the two respective CPUs.
> > > > > > 
> > > > > > The last patch I saw doing that had multiple bits to indicate success
> > > > > > and timeout, and rather a lot of complexity to recover from failures,
> > > > > > and reinitialise state for a second CPU going down.
> > > > > 
> > > > > What about a per CPU state?  That would at least avoid the need to 
> > > > > serialize things across CPUs.  If only one CPU may write its state, that 
> > > > > should eliminate the need for any kind of locking.
> > > > 
> > > > Something like the following?  If, according to $subject, it is the
> > > > complete() usage that has problems, then this replacement certainly has
> > > > it removed while keeping things simple.  And I doubt CPU hotplug is
> > > > performance critical, so simple polling is certainly good enough.
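
For illustration, here is a minimal userspace model of that idea
(hypothetical names; C11 atomics and pthreads stand in for the kernel's
per-CPU machinery, so this is a sketch rather than the actual patch):
each CPU is the sole writer of its own state, and the survivor simply
polls with a timeout.

/* Minimal userspace model of per-CPU hotplug state plus polling.
 * Hypothetical names; C11 atomics and pthreads stand in for the
 * kernel's per-CPU machinery.  Build with: cc -pthread model.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

enum cpu_state { CPU_ONLINE, CPU_DEAD };

#define NR_CPUS 4
static _Atomic int cpu_state[NR_CPUS];

/* Outgoing CPU: sole writer of its own slot, so no locking is needed. */
static void *dying_cpu(void *arg)
{
	int cpu = *(int *)arg;

	usleep(1000);			/* pretend to tear down the CPU */
	atomic_store(&cpu_state[cpu], CPU_DEAD);
	return NULL;
}

/* Survivor: spin-and-poll with a timeout instead of wait_for_completion(). */
static int wait_for_cpu_death(int cpu, int timeout_ms)
{
	while (timeout_ms-- > 0) {
		if (atomic_load(&cpu_state[cpu]) == CPU_DEAD)
			return 0;
		usleep(1000);		/* one poll per millisecond */
	}
	return -1;			/* timed out; CPU never reported death */
}

int main(void)
{
	pthread_t t;
	int cpu = 1;

	atomic_store(&cpu_state[cpu], CPU_ONLINE);
	pthread_create(&t, NULL, dying_cpu, &cpu);
	printf("cpu%d offline: %s\n", cpu,
	       wait_for_cpu_death(cpu, 100) ? "timeout" : "ok");
	pthread_join(t, NULL);
	return 0;
}

Since the outgoing CPU is the only writer of its slot, no lock is
needed; the atomic accesses are there so the survivor's polling load
is not torn or hoisted out of the loop.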
> > > 
> > > For whatever it is worth, I am proposing the patch below for common code.
> > > Works on x86.  (Famous last words...)
> > 
> > So I am intending to submit these changes to the upcoming merge window.
> > Do you guys have something in place for ARM?
> 
> I've rather dropped the ball on this (sorry, I've had my wisdom teeth out,
> with a week-long recovery, and then had to catch back up with mail...).

Ouch!!!

> Looking at this patch, what advantage does using atomic_t give us?
> For example:
> 
> > > +int cpu_check_up_prepare(int cpu)
> > > +{
> > > +	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) {
> > > +		atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_UP_PREPARE);
> > > +		return 0;
> > > +	}
> > > +
> > > +	switch (atomic_read(&per_cpu(cpu_hotplug_state, cpu))) {
> > > +
> > > +	case CPU_POST_DEAD:
> > > +
> > > +		/* The CPU died properly, so just start it up again. */
> > > +		atomic_set(&per_cpu(cpu_hotplug_state, cpu), CPU_UP_PREPARE);
> > > +		return 0;
> 
> What if the cpu_hotplug_state for the CPU changes between reading it,
> testing its value, and then writing it, or do we guarantee that it
> can't change other than by this function here?  If it can't change,
> then what's the point of using atomic_t for this?

It indeed cannot change -here-, but there are other situations where
more than one CPU can be attempting to change it at the same time.
The reason it cannot change here is that the variable holds the value
indicating that the previous offline completed within the timeout
period, so there are no outstanding writes.

When going offline, the outgoing CPU might take so long to leave that
the surviving CPU times out, in which case both CPUs write to the
variable at the same time, and that by itself requires atomics.  If
this seems unlikely, consider virtualized environments where the
hypervisor might preempt the guest OS.
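
To make that race concrete, here is a userspace sketch of the two
competing writers (again hypothetical names and C11 atomics, not the
actual patch; the quoted patch does the equivalent with atomic
operations on per_cpu(cpu_hotplug_state, cpu)): the outgoing CPU tries
to advance DYING -> DEAD while the timed-out survivor tries
DYING -> BROKEN, and compare-and-exchange guarantees exactly one of
them wins.

/* Userspace sketch of the timeout race; hypothetical names, C11 atomics. */
#include <stdatomic.h>
#include <stdio.h>

enum cpu_state { CPU_DYING, CPU_DEAD, CPU_BROKEN };

static _Atomic int state = CPU_DYING;

/* Outgoing CPU: report death, unless the survivor already gave up on us. */
static int report_death(void)
{
	int old = CPU_DYING;

	return atomic_compare_exchange_strong(&state, &old, CPU_DEAD);
}

/* Survivor, after polling too long: declare the CPU broken, unless it
 * managed to die at the last possible moment. */
static int mark_broken(void)
{
	int old = CPU_DYING;

	return atomic_compare_exchange_strong(&state, &old, CPU_BROKEN);
}

int main(void)
{
	/* Run the two updates in either order: exactly one succeeds, and
	 * the loser observes the winner's state instead of silently
	 * overwriting it, which is the guarantee plain stores cannot give. */
	printf("survivor won:  %d\n", mark_broken());
	printf("dying CPU won: %d\n", report_death());
	printf("final state:   %d\n", atomic_load(&state));
	return 0;
}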

Make sense?

							Thanx, Paul

