lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.1.00.0802190806330.7833@woody.linux-foundation.org>
Date:	Tue, 19 Feb 2008 08:20:38 -0800 (PST)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Pekka Enberg <penberg@...helsinki.fi>
cc:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>,
	Torsten Kaiser <just.for.lkml@...glemail.com>,
	Ingo Molnar <mingo@...e.hu>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Christoph Lameter <clameter@....com>
Subject: Re: Linux 2.6.25-rc2



On Tue, 19 Feb 2008, Pekka Enberg wrote:
> 
> Hmm. The barrier() in slab_free() looks fishy. The comment says it's
> there to make sure we've retrieved c->freelist before c->page but then
> it uses a _compiler barrier_ which doesn't affect the CPU and the
> reads may still be re-ordered... Not sure if that matters here though.

No, no. The comment says that it's purely there to serialize an 
*interrupt*, and as such, a compiler-only barrier is sufficient (or the 
comment is wrong).

Interrupts are "totally ordered" within a cpu (of course, in theory a CPU 
might have speculative work etc reordering, but the CPU also guarantees 
that interrupt acts _as_if_ it was exact), so a compiler barrier is 
sufficient.

Of course, if we're talking about interrupts on another CPU, that's a 
different issue, but the fact is, in that case it's not about interrupts 
any more (might as well be other code just running normally on another 
CPU), and a barrier doesn't help, it needs real locking.

So that barrier is fine per se. Of course, the whole code (and/or just the 
comment!) may be buggered, but any CPU SMP-aware barriers shouldn't be 
relevant.

What's much more likely to be an issue is simply the fact that since the 
fastpath now accesses the per-cpu freelist without any locking, if there 
is *any* sequence what-so-ever that does it from another CPU and assumes 
the old locking behaviour, the list will be corrupted. And from a quick 
look-through, I certainly cannot guarantee that isn't the case.

There's still a lot of cases that do direct assignments to "c->freelist" 
without using a guaranteed atomic sequence. They *should* be safe if it's 
guaranteed that 
 (a) they always run with interrupts disabled
AND
 (b) 'c' is _always_ the "current CPU" list

but I can't quickly see that guarantee for either.

I'd happily just revert this thing, but it would be really good to have 
confirmation that it seems to matter. But Torsten's partial bisection 
seems to say that the quicklist thing went into -mm before the crash even 
started.

So:
 - it might be something else entirely
 - it might still be the local cmpxchg, just Torsten didn't happen to 
   notice it until later.
 - it might still be the local cmpxchg, but something else changed its 
   patterns to actually make it start triggering.

and in general I don't think we should revert it unless we have stronger 
indications that it really is the problem (eg somebody finds the actual 
bug, or a reporter can confirm that it goes away when the local cmpxchg 
optimization is disabled).

		Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ