Message-ID: <alpine.DEB.2.00.1601200131240.5958@tp.orcam.me.uk>
Date:	Wed, 27 Jan 2016 09:57:24 +0000
From:	"Maciej W. Rozycki" <macro@...tec.com>
To:	David Daney <ddaney@...iumnetworks.com>
CC:	Måns Rullgård <mans@...sr.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ralf Baechle <ralf@...ux-mips.org>,
	<linux-kernel@...r.kernel.org>,
	Paul McKenney <paulmck@...ux.vnet.ibm.com>,
	Will Deacon <will.deacon@....com>,
	<torvalds@...ux-foundation.org>, <boqun.feng@...il.com>
Subject: Re: [RFC][PATCH] mips: Fix arch_spin_unlock()

On Thu, 12 Nov 2015, David Daney wrote:

> > > Certainly we can load up the code with "SYNC" all over the place, but
> > > it will kill performance on SMP systems.  So, my vote would be to make
> > > it as light weight as possible, but no lighter.  That will mean
> > > inventing the proper barrier primitives.
> > 
> > It seems to me that the proper barrier here is a "SYNC 18" aka
> > SYNC_RELEASE instruction, at least on CPUs that implement that variant.

 For the record, we've had "cooked" aliases in the toolchain for a short 
while now -- since Sep 2010 or binutils 2.21 -- so for readability you can 
actually use `sync_release' in your source code rather than obscure `sync 
18' (of course you could define a macro instead, but there's no need now), 
and disassembly will show the "cooked" mnemonic too.

 Although Documentation/Changes still lists binutils 2.12 as the minimum, 
so perhaps using macros is indeed the way to go now, at least for the time 
being.
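
 For illustration, such a macro could look roughly as follows.  This is 
only a sketch: the names SYNC_RELEASE_STYPE, SYNC_INSN and 
sync_release_barrier are made up here rather than taken from the kernel, 
the `.set mips32' ISA level is an assumption, and the non-MIPS fallback is 
there merely so that the fragment builds on other hosts.

```c
/* Hypothetical names throughout; not the kernel's actual definitions. */
#define SYNC_RELEASE_STYPE 18	/* numeric stype behind `sync_release' */

#define STRINGIFY_1(x) #x
#define STRINGIFY(x) STRINGIFY_1(x)

/* Builds the numeric spelling, e.g. "sync 18", instead of the alias. */
#define SYNC_INSN(stype) "sync " STRINGIFY(stype)

static inline void sync_release_barrier(void)
{
#if defined(__mips__)
	__asm__ __volatile__(
		".set push\n\t"
		".set mips32\n\t"	/* assumed level that accepts a SYNC operand */
		SYNC_INSN(SYNC_RELEASE_STYPE) "\n\t"
		".set pop"
		: /* no outputs */
		: /* no inputs */
		: "memory");
#else
	__sync_synchronize();	/* full barrier, only so this builds elsewhere */
#endif
}
```

The two-level STRINGIFY is needed so that SYNC_RELEASE_STYPE is expanded to 
18 before being turned into a string.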

> Yes, unfortunately very few CPUs implement that.  It is an instruction that
> MIPS invented only recently, so older CPUs need a different solution.

 Hmm, it looks to me like we might actually be safe, although, as is often 
the case, the situation seems more complicated than it has to be.

 Conventional wisdom says that SYNC as the ultimate ordering barrier, aka 
SYNC 0, was added with the MIPS II ISA, with a provision to define less 
restrictive barriers in the future in a backward compatible manner, by the 
means of undefined (any non-zero at the time) barrier types defaulting to 
0. Early references seem to have been lost in the mist of time, however a 
few legacy MIPS ISA documents remain, e.g. the MIPS IV ISA document 
says[1]:

"The stype values 1-31 are reserved; they produce the same result as the 
value zero."

making it clear that non-zero arguments will work as expected, albeit 
perhaps with a somewhat heavyweight effect.  But there's sometimes no 
other way.
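
 To make the encoding concrete, here is a small sketch of how the 
instruction word for a given `stype' is put together: SYNC sits under the 
SPECIAL major opcode (0) with function code 0x0f, and `stype' occupies 
bits 10:6.  The helper and macro names are invented for this example.

```c
#include <stdint.h>

#define MIPS_FUNCT_SYNC		0x0fu	/* bits 5:0 of a SYNC instruction */
#define MIPS_STYPE_SHIFT	6	/* stype occupies bits 10:6 */
#define MIPS_STYPE_MASK		0x1fu

/* Hypothetical helper: build the instruction word for `sync <stype>'. */
static inline uint32_t mips_sync_encode(unsigned int stype)
{
	return ((stype & MIPS_STYPE_MASK) << MIPS_STYPE_SHIFT) |
	       MIPS_FUNCT_SYNC;
}

/* Hypothetical helper: recover the stype field from an instruction word. */
static inline unsigned int mips_sync_stype(uint32_t insn)
{
	return (insn >> MIPS_STYPE_SHIFT) & MIPS_STYPE_MASK;
}
```

With this, SYNC 0 encodes as 0x0000000f and SYNC 18 as 0x0000048f; under 
the MIPS IV wording quoted above both have the same effect.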

 This seems more ambiguous with earlier documentation available, e.g. the 
MIPS R4000 processor manual, which omits the mention of `stype' altogether 
and merely defines a single SYNC instruction encoding with all-zeros 
across bits 25:6 of the instruction word, among which `stype' normally 
lives[2].  This appears the same with other MIPS III processor 
documentation (e.g. IDT 79RV4700[3]).  However I'm fairly sure all these 
simply did not bother decoding SYNC beyond the major and minor opcode, so 
again SYNC 0 semantics should be held across the more recently defined 
variants.  I could actually check this sometime with an R4000 class 
processor.
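
 The difference between the two decoding strategies can be sketched as 
below.  This only models the guess made above about how such processors 
decode SYNC, it is not taken from any hardware documentation, and the 
names are invented.

```c
#include <stdbool.h>
#include <stdint.h>

#define INSN_OPCODE(insn)	((insn) >> 26)			/* bits 31:26 */
#define INSN_FUNCT(insn)	((insn) & 0x3fu)		/* bits 5:0 */
#define INSN_MIDDLE(insn)	(((insn) >> 6) & 0xfffffu)	/* bits 25:6 */

/* Looks at the major and minor opcode only, so any stype matches. */
static bool sync_match_loose(uint32_t insn)
{
	return INSN_OPCODE(insn) == 0 && INSN_FUNCT(insn) == 0x0f;
}

/* Also insists on all-zeros across bits 25:6, as the manual prints it. */
static bool sync_match_strict(uint32_t insn)
{
	return sync_match_loose(insn) && INSN_MIDDLE(insn) == 0;
}
```

A decoder of the loose kind would treat 0x0000048f (SYNC 18) exactly like 
0x0000000f (SYNC 0), whereas a strict one would reject it.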

 Modern MIPS architecture specifications started with the same definition 
as the MIPS IV ISA had; the rev. 0.95 documents still stated[4][5]:

"The stype values 1-31 are reserved; they produce the same result as the 
value zero."

Unfortunately the requirement got weakened later on; the rev. 1.00 
architecture specifications stated instead[6][7]:

"The stype values 1-31 are reserved for future extensions to the 
architecture.  A value of zero will always be defined such that it 
performs all defined synchronization operations.  Non-zero values may be 
defined to remove some synchronization operations.  As such, software 
should never use a non-zero value of the stype field, as this may 
inadvertently cause future failures if non-zero values remove 
synchronization operations."

I think the intent was not to break backwards compatibility, and certainly 
anyone who looked at one of the earlier documents might have realised that 
implementing non-zero SYNC operations that have no vendor-specific 
semantics as aliases of SYNC 0, rather than as NOPs or Reserved 
Instruction (RI) exception triggers, would be a good idea.  However 
implementers may not have been able to infer that from reading the 
current revision of the architecture documents alone.

 It was only with rev. 2.60 of the architecture specifications that, along 
with new SYNC operations, the requirement for undefined SYNC operations to 
behave as SYNC 0 was put back in the text in an unambiguous form[8][9]:

"A stype value of zero will always be defined such that it performs the 
most complete set of synchronization operations that are defined.  This 
means stype zero always does a completion barrier that affects both loads 
and stores preceding the SYNC instruction and both loads and stores that 
are subsequent to the SYNC instruction.  Non-zero values of stype may be 
defined by the architecture or specific implementations to perform 
synchronization behaviors that are less complete than that of stype zero. 
If an implementation does not use one of these non-zero values to define a 
different synchronization behavior, then that non-zero value of stype must 
act the same as stype zero completion barrier.  This allows software 
written for an implementation with a lighter-weight barrier to work on 
another implementation which only implements the stype zero completion 
barrier."

This definition has been retained in the architecture specification ever 
since.
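
 The fallback rule quoted above can be modelled in a few lines.  The 
`stype' values are the ones the architecture assigns to the named 
barriers; the bitmask describing what a given CPU implements and the 
function name are assumptions made for this sketch.

```c
#include <stdint.h>

/* stype values assigned to the named SYNC variants. */
enum {
	STYPE_SYNC		= 0x00,
	STYPE_SYNC_WMB		= 0x04,
	STYPE_SYNC_MB		= 0x10,
	STYPE_SYNC_ACQUIRE	= 0x11,
	STYPE_SYNC_RELEASE	= 0x12,
	STYPE_SYNC_RMB		= 0x13
};

/*
 * `implemented' has bit n set if the (hypothetical) CPU gives stype n a
 * lighter-weight behaviour of its own.  Returns the stype whose semantics
 * actually apply: anything not implemented acts as stype 0.
 */
static unsigned int stype_in_effect(uint32_t implemented, unsigned int stype)
{
	stype &= 0x1f;
	if (stype != 0 && (implemented & (UINT32_C(1) << stype)))
		return stype;
	return STYPE_SYNC;
}
```

So code using SYNC_RELEASE gets the lightweight barrier where it exists 
and the full completion barrier everywhere else.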

 Overall I think it should be safe after all to use SYNC_RELEASE and other 
modern lightweight barriers unconditionally, under the assumption that the 
architecture was meant to remain backward compatible.  Even though it 
might be possible someone would implement unusual semantics for the then 
undefined `stype' values, I highly doubt it as it would be extra effort 
and hardware logic space for no gain.  We could try and reach architecture 
overseers to double-check whether the `stype' encodings, somewhat 
irregularly distributed, were indeed defined in a manner so as not to 
clash with values implementers chose to use before rev. 2.60 of the 
architecture specification.

 Then, for performance reasons, if there were indeed any pre-2.60 
implementations which define vendor-specific lightweight barriers, then we 
could replace the standard encoding embedded in the kernel binary, by 
run-time patching the image up at bootstrap, based on the processor type 
identified in cpu-probe.c.  Likewise, for implementations that are weakly 
enough ordered to define SYNC as an actual barrier rather than a different 
encoding of NOP (e.g. the NEC VR4100 is strongly ordered and implements 
SYNC as a NOP[10]), yet strongly enough ordered for some of the other 
barriers not to be necessary, the respective barriers could be patched up 
with NOPs.
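
 As a rough sketch of what such patching might involve, assuming the 
barrier sites were recorded in a table at build time (no such table exists 
today, and all the names below are hypothetical), the bootstrap code could 
walk the table and rewrite each instruction word:

```c
#include <stddef.h>
#include <stdint.h>

#define PATCH_INSN_NOP	UINT32_C(0x00000000)

/* True for any SYNC encoding: SPECIAL major opcode, function code 0x0f. */
static int insn_is_sync(uint32_t insn)
{
	return (insn >> 26) == 0 && (insn & 0x3f) == 0x0f;
}

/*
 * Hypothetical bootstrap helper: overwrite each recorded barrier site
 * with `replacement'.  Sites not holding a SYNC are left alone.  Returns
 * the number of words rewritten.  A real kernel would also have to flush
 * the instruction cache afterwards, which is omitted here.
 */
static size_t patch_sync_sites(uint32_t *image, const size_t *sites,
			       size_t nsites, uint32_t replacement)
{
	size_t i, patched = 0;

	for (i = 0; i < nsites; i++) {
		if (!insn_is_sync(image[sites[i]]))
			continue;
		image[sites[i]] = replacement;
		patched++;
	}
	return patched;
}
```

The replacement word would be chosen per processor type, e.g. 
PATCH_INSN_NOP where a particular barrier is not needed, or a 
vendor-specific encoding where one exists.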

 For I/O ordering and completion barriers, mentioned earlier in the 
thread, on the MIPS target we need a different set of primitives, as some 
early incarnations of the architecture were weakly ordered in this respect 
in a somewhat unusual way, at least to some.  Only reads were strongly 
ordered in all cases.  However writes could bypass each other, could be 
merged, or could be removed altogether (preempted with a later one).  
Then reads could bypass writes or read back a pending write.  None of this 
matters for true memory, however it certainly does for I/O, where side 
effects exist or timely completion is required.
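
 Why this matters for I/O can be seen with a toy model of a write buffer 
in which a later write to the same address preempts a pending one.  This 
is purely an illustration with made-up names and sizes, not a description 
of any particular processor.

```c
#include <stddef.h>
#include <stdint.h>

#define WB_SLOTS 4

struct wb_entry {
	uintptr_t addr;
	uint32_t val;
};

struct write_buffer {
	struct wb_entry slot[WB_SLOTS];
	size_t used;
	size_t device_writes;	/* writes that actually reached the device */
};

/* Stands in for a barrier: push every pending write out to the device. */
static void wb_drain(struct write_buffer *wb)
{
	wb->device_writes += wb->used;
	wb->used = 0;
}

static void wb_write(struct write_buffer *wb, uintptr_t addr, uint32_t val)
{
	size_t i;

	for (i = 0; i < wb->used; i++) {
		if (wb->slot[i].addr == addr) {
			wb->slot[i].val = val;	/* the earlier write is lost */
			return;
		}
	}
	if (wb->used == WB_SLOTS)
		wb_drain(wb);
	wb->slot[wb->used].addr = addr;
	wb->slot[wb->used].val = val;
	wb->used++;
}
```

Two back-to-back writes to one device register reach the device as a 
single write here unless wb_drain() is called in between, which is harmless 
for memory but not for a register with side effects.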

 I have previously outlined what needs to be implemented in this area, as 
recorded here: 
<http://www.linux-mips.org/cgi-bin/mesg.cgi?a=linux-mips&i=alpine.LFD.2.11.1404280048540.11598%40eddie.linux-mips.org>, 
to unify the uncoordinated platform attempts made so far.  I still have it 
on my to-do list and hope to get to it soon.

References:

[1]  Charles Price: "MIPS IV Instruction Set", Revision 3.2, MIPS
     Technologies, Inc., September 1995, p. A-161
     <http://techpubs.sgi.com/library/manuals/2000/007-2597-001/pdf/007-2597-001.pdf>

[2]  Joe Heinrich: "MIPS R4000 Microprocessor User's Manual", Second
     Edition, MIPS Technologies, Inc., April 1, 1994, p. A-161
     <http://techpubs.sgi.com/library/manuals/2000/007-2489-001/pdf/007-2489-001.pdf>

[3]  "IDT79RV4700 RISC Processor Hardware User's Manual", Integrated 
     Device Technology, Inc., Version 2.1, December 1997, p. A-130

[4]  "MIPS32 Architecture For Programmers, Volume II: The MIPS32 
     Instruction Set", MIPS Technologies, Inc., Document Number: MD00086, 
     Revision 0.95, March 12, 2001, p. 215

[5]  "MIPS64 Architecture For Programmers, Volume II: The MIPS64 
     Instruction Set", MIPS Technologies, Inc., Document Number: MD00087, 
     Revision 0.95, March 12, 2001, p. 300

[6]  "MIPS32 Architecture For Programmers, Volume II: The MIPS32
     Instruction Set", MIPS Technologies, Inc., Document Number: MD00086,
     Revision 1.00, August 29, 2002, p. 209

[7]  "MIPS64 Architecture For Programmers, Volume II: The MIPS64
     Instruction Set", MIPS Technologies, Inc., Document Number: MD00087,
     Revision 1.00, August 29, 2002, p. 295

[8]  "MIPS32 Architecture For Programmers, Volume II: The MIPS32
     Instruction Set", MIPS Technologies, Inc., Document Number: MD00086,
     Revision 2.60, June 25, 2008, p. 250

[9]  "MIPS64 Architecture For Programmers, Volume II: The MIPS64
     Instruction Set", MIPS Technologies, Inc., Document Number: MD00087,
     Revision 2.60, June 25, 2008, p. 317

[10] "VR4100 64-BIT MICROPROCESSOR USER'S MANUAL (PRELIMINARY)", NEC 
     Corporation, Document No. U10050EJ3V0UM00 (3rd edition), January 
     1996, p. 413

  Maciej