Message-ID: <20110504132022.GA17294@htj.dyndns.org>
Date:	Wed, 4 May 2011 15:20:22 +0200
From:	Tejun Heo <tj@...nel.org>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Pekka Enberg <penberg@...nel.org>, Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jens Axboe <axboe@...nel.dk>,
	Andrew Morton <akpm@...ux-foundation.org>,
	werner <w.landgraf@...ru>, "H. Peter Anvin" <hpa@...or.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Christoph Lameter <cl@...ux.com>
Subject: Re: [block IO crash] Re: 2.6.39-rc5-git2 boot crashs

Hello,

On Wed, May 04, 2011 at 03:00:37PM +0200, Thomas Gleixner wrote:
> On Wed, 4 May 2011, Tejun Heo wrote:
> > > > And that code runs with preemption enabled. So when the task gets
> > > > preempted _BEFORE_ it has actually written back the data, then the
> > > > race window is wide open.
> > 
> > Hmmm... if it's a race caused by preemption enabled where it shouldn't
> > be, it's most likely the wrong type of this_cpu_cmpxchg_double() being
> > used in SLUB?  ie. __this_cpu_cmpxchg_double() where it should have
> > been this_cpu_cmpxchg_double()?  Christoph?
> 
> No, the problem is that ELAN prevents the cmpxchg8b, but keeps
> CONFIG_CMPXCHG_LOCAL=y which then results in the unprotected code for
> the following reason:
...
> So the question is whether CMPXCHG_LOCAL for x86 wants to depend on
> X86_CMPXCHG64.
> 
> The other solution is to use irqsafe_cpu_cmpxchg_double() instead of
> this_cpu_cmpxchg_double() in slub.c. 

I think this is the root cause.  CMPXCHG_LOCAL is an optimization
flag indicating that the processor provides a fast local cmpxchg; it
says nothing about local synchronization properties.  If the code
required irq exclusion, it should have used irqsafe_cpu_cmpxchg_double()
whether or not the processor supports it natively, so that is the bug.
Pekka, can you please change the offending cmpxchg_double() to the
irqsafe variant?

As for CMPXCHG_LOCAL being set spuriously, maybe introduce
CMPXCHG_DOUBLE_LOCAL?  I don't know.  It's pretty nasty to implement
different high-level code paths depending on CPU features.  We can't
even determine at compile time whether the feature will actually be
available.  But, then again, cases where the generic implementation is
used might see a noticeable slowdown.  Has anyone measured the
difference against the code from before the whole this_cpu conversion?

Thanks.

-- 
tejun