Message-ID: <18585.64037.521673.362547@alkaid.it.uu.se>
Date:	Wed, 6 Aug 2008 21:23:17 +0200
From:	Mikael Pettersson <mikpe@...uu.se>
To:	Arkadiusz Miskiewicz <arekm@...en.pl>
Cc:	"Wahlig, Elsie" <elsie.wahlig@....com>, mikpe@...uu.se,
	linux-kernel@...r.kernel.org
Subject: Re: Opteron Rev E has a bug ... a locked instruction doesn't act as a read-acquire barrier (confirmed)

On Wed, 6 Aug 2008 19:13:34 +0200, Arkadiusz Miskiewicz wrote:
>On Wednesday 06 August 2008, Wahlig, Elsie wrote:
>> Your issue may be one that has been seen on 1st generation
>> AMD Opteron processor's with cpuid family 0Fh, cpuid model's
>> < 40h with the code sequence that performs a read-modify write
>> operation after acquiring a semaphore.
>
>Matches my hardware
>
>cpu family      : 15
>model           : 33
>
>>
>> The memory read ordering between a semaphore operation and a
>> subsequent read-modify-write instruction (an instruction which
>> uses the same memory location as both a source and destination)
>> may allow the read-modify-write instruction to operate on the
>> memory location ahead of the completion of the semaphore
>> operation and an erratum may occur.

Thanks for the detailed erratum description.

>I wonder why there was no official errata about this?

Indeed.

>> If you think your software is encountering this code sequence,
>> a work-around should be implemented by adding an LFENCE
>> instruction right after the semaphore, after a cpuid check.
>> The workaround's applied to OpenSolaris at
>> http://mail.opensolaris.org/pipermail/onnv-notify/2006-October/009080.html
>> and Google performance tools tool at
>> http://google-perftools.googlecode.com/svn-history/r48/trunk/src/base/atomicops-internals-x86.cc
>> are suitable examples.
>> A list of the model numbers this issue may occur on is at
>> http://products.amd.com/en-us/downloads/AMD_Opteron_First_Generation_Reference_101607.pdf.
>
>Would be better to fix the bug on kernel level if this is possible. Just
>someone with the knowledge needs to do this. Anyone interested?

In principle it's easy. We append a 3-byte nop to the lock-taking
instructions. We invent an AMD_MUTEX_BUG synthetic cpuid feature
bit and add boot-time code to detect it. We use the alternatives()
infrastructure to replace that nop with lfence at boot-time if
AMD_MUTEX_BUG is present.
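
The detection half of that plan could look roughly like the sketch below. It only decodes the CPUID function 1 signature and applies the criterion quoted above (family 0Fh, model < 40h). The function name and the vendor_is_amd parameter are made up for illustration; this is not existing kernel code, and a real patch would set the synthetic feature bit from the kernel's own CPU setup code instead.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * eax:           value returned in EAX by CPUID function 1
 * vendor_is_amd: nonzero if the vendor string is "AuthenticAMD"
 * Returns 1 if the CPU matches "family 0Fh, model < 40h", else 0.
 */
int amd_mutex_bug_possible(uint32_t eax, int vendor_is_amd)
{
    uint32_t base_family = (eax >> 8) & 0xF;
    uint32_t base_model  = (eax >> 4) & 0xF;
    uint32_t ext_model   = (eax >> 16) & 0xF;
    uint32_t ext_family  = (eax >> 20) & 0xFF;
    uint32_t family = base_family;
    uint32_t model = base_model;

    /* The extended fields only contribute when the base family is 0Fh. */
    if (base_family == 0xF) {
        family += ext_family;
        model |= ext_model << 4;
    }

    return vendor_is_amd && family == 0x0F && model < 0x40;
}
```

A signature of 0x00020F12 decodes to family 15, model 33, which is the combination reported earlier in this thread, so the helper returns 1 for it.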

I think the hardest part is locating all lock-taking code sequences.

Also I think I'll start by writing a user-space test program that
does a stress-test of the plain lock;rmw;unlock sequence to see if
it can break it. (Locks/mutexes are also used in user-space.)
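
A minimal sketch of such a test, under some assumptions: it uses the GCC/Clang __atomic builtins rather than hand-written asm (on x86 the exchange becomes xchg, which is implicitly locked), and every name in it is hypothetical. The compiler decides whether the increment is emitted as a single read-modify-write instruction, so the generated code should be checked before drawing conclusions. Defining WORKAROUND_LFENCE on x86-64 adds the suggested lfence after the lock is taken.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64

static volatile int lock_word;      /* 0 = free, 1 = held */
static volatile long shared_count;  /* protected by lock_word */

static void spin_lock(void)
{
    /* Lock-taking step: atomic exchange until we observe "free". */
    while (__atomic_exchange_n(&lock_word, 1, __ATOMIC_ACQUIRE))
        ;
#if defined(WORKAROUND_LFENCE) && defined(__x86_64__)
    __asm__ volatile("lfence" ::: "memory");
#endif
}

static void spin_unlock(void)
{
    __atomic_store_n(&lock_word, 0, __ATOMIC_RELEASE);
}

static void *worker(void *p)
{
    long iters = *(long *)p;
    for (long i = 0; i < iters; i++) {
        spin_lock();
        shared_count++;             /* rmw on the protected location */
        spin_unlock();
    }
    return NULL;
}

/* Returns the number of lost updates (0 if the lock held), or -1 on error. */
long run_stress(int nthreads, long iters)
{
    pthread_t tid[MAX_THREADS];
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS || iters < 0)
        return -1;
    lock_word = 0;
    shared_count = 0;
    while (started < nthreads &&
           pthread_create(&tid[started], NULL, worker, &iters) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    assert(lock_word == 0);         /* every worker released the lock */
    if (started != nthreads)
        return -1;
    return (long)nthreads * iters - shared_count;
}
```

Calling run_stress(4, 10000000) from a small main() and printing the result is enough to drive it; any nonzero return means increments were lost while the lock was supposedly held.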

/Mikael