Date:	Thu, 23 Apr 2009 18:17:11 -0400
From:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To:	Arkadiusz Miskiewicz <a.miskiewicz@...il.com>
Cc:	Ingo Molnar <mingo@...e.hu>, Alan Cox <alan@...rguk.ukuu.org.uk>,
	akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
	mark.langsdorf@....com, "H. Peter Anvin" <hpa@...or.com>,
	Andi Kleen <andi@...stfloor.org>, Avi Kivity <avi@...ranet.com>
Subject: Re: [patch 2/2] x86 amd fix cmpxchg read acquire barrier

* Arkadiusz Miskiewicz (a.miskiewicz@...il.com) wrote:
> On Thursday 23 of April 2009, Mathieu Desnoyers wrote:
> > * Ingo Molnar (mingo@...e.hu) wrote:
> > > * Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca> wrote:
> > > > " // Opteron Rev E has a bug in which on very rare occasions a locked
> > > >   // instruction doesn't act as a read-acquire barrier if followed by a
> > > >   // non-locked read-modify-write instruction.  Rev F has this bug in
> > > >   // pre-release versions, but not in versions released to customers,
> > > >   // so we test only for Rev E, which is family 15, model 32..63
> > > > inclusive.
> > >
> > > Dunno. The fix looks a bit intrusive (emits a NOP even on good
> > > CPUs). Also, the text above says "not in versions released to
> > > customers".
> > >
> > > So unless there's an official erratum or reports in the field (not
> > > from early prototype systems shipped to developers) i'd not rush to
> > > apply it, just yet.
> >
> > Actually, Opteron Rev E has this bug in the field (family 15, model
> > 32..63). Rev F only had the bug in pre-releases.
> >
> > But yes, it's bad that it drags so many code additions to something as
> > critical as cmpxchg. I start to think it might be better to just
> > disallow bringing up more than one CPU on these machines.
> 
> That probably would be even worse than what we have now. This bug doesn't 
> manifest too often in a noticeable way here (I have a few such machines here, 
> mostly 2 x dual core; once per few months mysql dies) and losing 3 of 4 cores 
> (or 1 cpu of 2; depends on what you mean) doesn't sound like fun.
> 

Having silent data corruption does not sound like fun either. Another
alternative, when we detect those CPUs, is to printk a warning saying:

"AMD Opteron family X model Y is known to corrupt data on SMP due"
"to incorrect cmpxchg instruction memory barriers. Please contact"
"AMD for more information."

And activate the "tainted" kernel flag. This way, we won't be bothered
trying to fix AMD bugs, and it will officially become AMD's problem.
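
To make the proposal concrete, here is a rough userspace sketch of the
detection and the warning text. The helper names is_opteron_rev_e() and
rev_e_warning() are made up for illustration, and the model range is the
32..63 given in the comment quoted above. An in-kernel version would
presumably read the family and model from the boot CPU data, emit the
message with printk, and set a taint flag at that point; none of that is
shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true for AMD family 15, models 32..63 inclusive,
 * the range the quoted comment gives for Opteron Rev E. */
static bool is_opteron_rev_e(unsigned int family, unsigned int model)
{
    return family == 15 && model >= 32 && model <= 63;
}

/* Hypothetical helper: if the CPU is in the affected range, write the
 * proposed warning into buf and return true. Otherwise leave buf as an
 * empty string and return false. */
static bool rev_e_warning(unsigned int family, unsigned int model,
                          char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    if (!is_opteron_rev_e(family, model))
        return false;

    /* Same wording as the message proposed above, with X and Y filled in. */
    snprintf(buf, len,
             "AMD Opteron family %u model %u is known to corrupt data on SMP due "
             "to incorrect cmpxchg instruction memory barriers. Please contact "
             "AMD for more information.",
             family, model);
    return true;
}
```

A caller would pass a buffer of a couple hundred bytes and only print it
(and taint) when rev_e_warning() returns true, so good CPUs pay nothing
beyond the one-time check at boot.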

Mathieu

> > Mathieu
> 
> 
> -- 
> Arkadiusz Miśkiewicz        PLD/Linux Team
> arekm / maven.pl            http://ftp.pld-linux.org/
> 
> 

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68