linux-kernel - RE: Opteron Rev E has a bug ... a locked instruction doesn't act as a read-acquire barrier (confirmed)


Message-ID: <CB2417C2B1152C41919D5B9DE58840FA72D8FC@sausexmb5.amd.com>
Date:	Wed, 6 Aug 2008 16:18:03 -0500
From:	"Wahlig, Elsie" <elsie.wahlig@....com>
To:	"Mikael Pettersson" <mikpe@...uu.se>,
	"Arkadiusz Miskiewicz" <arekm@...en.pl>
CC:	<linux-kernel@...r.kernel.org>
Subject: RE: Opteron Rev E has a bug ... a locked instruction doesn't act as a read-acquire barrier (confirmed)

 

Mikael Pettersson writes:
> 
> On Wed, 6 Aug 2008 19:13:34 +0200, Arkadiusz Miskiewicz wrote:
> >On Wednesday 06 August 2008, Wahlig, Elsie wrote:
> >> Your issue may be one that has been seen on 1st generation AMD
> >> Opteron processors with cpuid family 0Fh, cpuid models < 40h, with
> >> the code sequence that performs a read-modify-write operation after
> >> acquiring a semaphore.
> >
> >Matches my hardware
> >
> >cpu family      : 15
> >model           : 33
> >
> >>
> >> The memory read ordering between a semaphore operation and a
> >> subsequent read-modify-write instruction (an instruction which uses
> >> the same memory location as both a source and destination) may allow
> >> the read-modify-write instruction to operate on the memory location
> >> ahead of the completion of the semaphore operation and an erratum may
> >> occur.
> 
> Thanks for the detailed erratum description.
> 
> >I wonder why there was no official errata about this?
> 
> Indeed.

I don't know but I will see about getting it in there.

Elsie 

> 
> >> If you think your software is encountering this code sequence, a
> >> work-around should be implemented by adding an LFENCE instruction
> >> right after the semaphore, after a cpuid check.
> >> The workarounds applied to OpenSolaris at
> >> http://mail.opensolaris.org/pipermail/onnv-notify/2006-October/009080.html
> >> and Google performance tools at
> >> http://google-perftools.googlecode.com/svn-history/r48/trunk/src/base/atomicops-internals-x86.cc
> >> are suitable examples.
> >> A list of the model numbers this issue may occur on is at
> >> http://products.amd.com/en-us/downloads/AMD_Opteron_First_Generation_Reference_101607.pdf.
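The cpuid check and the LFENCE placement described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the OpenSolaris or google-perftools changes: all function names here are hypothetical, the affected range is taken solely from the "family 0Fh, model < 40h" wording in this thread, and the build is assumed to use GCC or Clang (for `<cpuid.h>` and inline asm).

```c
#include <stdint.h>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

/* Display family from CPUID leaf 1 EAX: the extended family field is
 * added only when the base family field is 0xF. */
static unsigned cpuid_family(uint32_t eax)
{
    unsigned base = (eax >> 8) & 0xF;
    unsigned ext = (eax >> 20) & 0xFF;
    return base == 0xF ? base + ext : base;
}

/* Display model (AMD rule): the extended model field supplies the high
 * nibble when the base family field is 0xF. */
static unsigned cpuid_model(uint32_t eax)
{
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base = (eax >> 4) & 0xF;
    unsigned ext = (eax >> 16) & 0xF;
    return base_family == 0xF ? (ext << 4) | base : base;
}

/* Hypothetical predicate: the range quoted in this thread
 * (AMD, family 0Fh, model < 40h). */
static int needs_lfence_workaround(int vendor_is_amd, uint32_t leaf1_eax)
{
    return vendor_is_amd &&
           cpuid_family(leaf1_eax) == 0x0F &&
           cpuid_model(leaf1_eax) < 0x40;
}

/* Query the running CPU once at startup; returns 0 on non-x86 builds. */
static int running_cpu_needs_lfence(void)
{
#ifdef HAVE_X86_CPUID
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;
    /* Vendor string "AuthenticAMD" is returned in EBX, EDX, ECX. */
    int amd = ebx == 0x68747541u && edx == 0x69746e65u && ecx == 0x444d4163u;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return needs_lfence_workaround(amd, eax);
#else
    return 0;
#endif
}

/* Call immediately after the locked instruction that takes the lock. */
static inline void post_acquire_fence(int need_fence)
{
#ifdef HAVE_X86_CPUID
    if (need_fence)
        __asm__ __volatile__("lfence" ::: "memory");
#else
    (void)need_fence;
#endif
}
```

A caller would cache the result of `running_cpu_needs_lfence()` at startup and pass it to `post_acquire_fence()` right after each lock acquisition. The hardware reported above (family 15, model 33) falls inside the range this predicate tests for.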
> >
> >Would be better to fix the bug on kernel level if this is possible.
> >Just someone with the knowledge needs to do this. Anyone interested?
> 
> In principle it's easy. We append a 3-byte nop to the 
> lock-taking instructions. We invent an AMD_MUTEX_BUG 
> synthetic cpuid feature bit and add boot-time code to detect 
> it. We use the alternatives() infrastructure to replace that 
> nop with lfence at boot-time if AMD_MUTEX_BUG is present.
> 
> I think the hardest part is locating all lock-taking code sequences.
> 
> Also I think I'll start by writing a user-space test program
> that does a stress-test of the plain lock;rmw;unlock sequence
> to see if it can break it. (Locks/mutexes are also used in
> user-space.)
> 
> /Mikael
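A user-space stress test of the lock;rmw;unlock sequence mentioned above might look something like the sketch below. It is not Mikael's program: the spinlock, the `run_stress` name, and the thread and iteration counts are all assumptions made for illustration. It uses C11 atomics and pthreads; on x86 the atomic exchange is expected to compile to a locked `xchg`, but whether `counter++` becomes a single read-modify-write instruction depends on the compiler.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 16

static atomic_int lock_word;   /* 0 = free, 1 = held */
static volatile long counter;  /* only touched while lock_word is held */
static long iterations;        /* per-thread iteration count */

static void spin_lock(void)
{
    /* Spin until the exchange observes the lock as free. */
    while (atomic_exchange_explicit(&lock_word, 1, memory_order_acquire))
        ;
}

static void spin_unlock(void)
{
    atomic_store_explicit(&lock_word, 0, memory_order_release);
}

static void *worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < iterations; i++) {
        spin_lock();
        counter++;             /* the read-modify-write under test */
        spin_unlock();
    }
    return NULL;
}

/* Runs nthreads workers for iters iterations each and returns the final
 * counter, or -1 if the arguments are out of range or a thread could
 * not be created. */
long run_stress(int nthreads, long iters)
{
    pthread_t tid[MAX_THREADS];
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    atomic_store(&lock_word, 0);
    counter = 0;
    iterations = iters;

    for (; started < nthreads; started++)
        if (pthread_create(&tid[started], NULL, worker, NULL) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return started == nthreads ? counter : -1;
}
```

If the lock provides mutual exclusion, `run_stress(n, iters)` should return `n * iters`; a smaller value would indicate a lost update. A passing run does not prove the CPU is unaffected, since the failure may need particular timing to show up.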
> 
> 
