Message-ID: <20090423221711.GA30855@Krystal>
Date: Thu, 23 Apr 2009 18:17:11 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To: Arkadiusz Miskiewicz <a.miskiewicz@...il.com>
Cc: Ingo Molnar <mingo@...e.hu>, Alan Cox <alan@...rguk.ukuu.org.uk>,
akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
mark.langsdorf@....com, "H. Peter Anvin" <hpa@...or.com>,
Andi Kleen <andi@...stfloor.org>, Avi Kivity <avi@...ranet.com>
Subject: Re: [patch 2/2] x86 amd fix cmpxchg read acquire barrier
* Arkadiusz Miskiewicz (a.miskiewicz@...il.com) wrote:
> On Thursday 23 of April 2009, Mathieu Desnoyers wrote:
> > * Ingo Molnar (mingo@...e.hu) wrote:
> > > * Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca> wrote:
> > > > " // Opteron Rev E has a bug in which on very rare occasions a locked
> > > > // instruction doesn't act as a read-acquire barrier if followed by a
> > > > // non-locked read-modify-write instruction. Rev F has this bug in
> > > > // pre-release versions, but not in versions released to customers,
> > > > // so we test only for Rev E, which is family 15, model 32..63
> > > > inclusive.
> > >
> > > Dunno. The fix looks a bit intrusive (emits a NOP even on good
> > > CPUs). Also, the text above says "not in versions released to
> > > customers".
> > >
> > > So unless there's an official erratum or reports in the field (not
> > > from early prototype systems shipped to developers) i'd not rush to
> > > apply it, just yet.
> >
> > Actually, Opteron Rev E has this bug in the field (family 15, model
> > 32..63). Rev F only had the bug in pre-releases.
> >
> > But yes, it's bad that it drags so many code additions to something as
> > critical as cmpxchg. I start to think it might be better to just
> > disallow bringing up more than one CPU on these machines.
>
> That probably would be even worse than what we have now. This bug doesn't
> manifest too often in a noticeable way here (I have a few such machines here,
> mostly 2 x dual core; once per few months mysql dies) and losing 3 of 4 cores
> (or 1 cpu of 2; depends on what you mean) doesn't sound like fun.
>
Having silent data corruption does not sound like fun either. Another
alternative, when we detect those CPUs, is to printk a warning saying:
"AMD Opteron family X model Y is known to corrupt data on SMP due"
"to incorrect cmpxchg instruction memory barriers. Please contact"
"AMD for more information."
And activate the "tainted" kernel flag. This way, we won't be bothered
trying to fix AMD bugs, and it will officially become AMD's problem.
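
A minimal user-space sketch of that detection, assuming the family 15,
model 32..63 range quoted earlier in the thread. The function names
is_opteron_rev_e() and warn_if_rev_e() are hypothetical, not existing
kernel interfaces, and the caller is assumed to have already checked
that the vendor is AMD:

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel API): true for family 15,
 * models 32..63 inclusive, the Opteron Rev E range quoted in the thread.
 */
bool is_opteron_rev_e(unsigned int family, unsigned int model)
{
	return family == 15 && model >= 32 && model <= 63;
}

/*
 * Formats the proposed warning into buf when the CPU is affected.
 * Returns true if the caller should also mark the kernel as tainted;
 * otherwise leaves buf empty and returns false.
 */
bool warn_if_rev_e(unsigned int family, unsigned int model,
		   char *buf, size_t len)
{
	if (!is_opteron_rev_e(family, model)) {
		if (len > 0)
			buf[0] = '\0';
		return false;
	}

	snprintf(buf, len,
		 "AMD Opteron family %u model %u is known to corrupt data on SMP due "
		 "to incorrect cmpxchg instruction memory barriers. Please contact "
		 "AMD for more information.",
		 family, model);
	return true;
}
```

In the kernel the string would go out through printk and the flag would
be set with add_taint(); which taint flag to use is left open here.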
Mathieu
> > Mathieu
>
>
> --
> Arkadiusz Miśkiewicz PLD/Linux Team
> arekm / maven.pl http://ftp.pld-linux.org/
>
>
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68