Date:	Tue, 3 Jun 2008 19:46:44 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Nick Piggin <nickpiggin@...oo.com.au>
cc:	Trent Piepho <tpiepho@...escale.com>,
	Russell King <rmk+lkml@....linux.org.uk>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	David Miller <davem@...emloft.net>, linux-arch@...r.kernel.org,
	scottwood@...escale.com, linuxppc-dev@...abs.org,
	alan@...rguk.ukuu.org.uk, linux-kernel@...r.kernel.org
Subject: Re: MMIO and gcc re-ordering issue



On Wed, 4 Jun 2008, Nick Piggin wrote:
> 
> Actually, according to the document I am looking at (the AMD one), a UC
> store may pass a previous WC store.

Hmm. Intel arch manual, Vol 3, 10.3 (page 10-7 in my version):

  "If the WC buffer is partially filled, the writes may be delayed until 
   the next occurrence of a serializing event; such as, an SFENCE or MFENCE 
   instruction, CPUID execution, a read or write to uncached memory, ..."

Any typos mine.

Anyway, Intel certainly seems to document that WC memory is serialized by 
any access to UC memory.

But yes, I can well imagine that AMD is different, and I also heartily 
would recommend rather being safe than sorry. Putting an explicit memory 
barrier in between those accesses when you know it might make a difference 
is just a good idea. 
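
A minimal userspace sketch of that advice, assuming a frame buffer that 
would be mapped WC and a control register that would be mapped UC. 
Everything named here (struct fake_device, present_frame, 
wc_flush_barrier) is hypothetical and only for illustration, and plain 
memory stands in for the real mappings. In kernel code the barrier would 
normally be one of the kernel's own write-barrier primitives rather than 
a raw SFENCE:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical barrier helper: SFENCE on x86-64, a full C11 fence elsewhere. */
#if defined(__x86_64__)
#include <xmmintrin.h>
#define wc_flush_barrier() _mm_sfence()
#else
#include <stdatomic.h>
#define wc_flush_barrier() atomic_thread_fence(memory_order_seq_cst)
#endif

#define FB_PIXELS 16u

/* Plain memory standing in for device mappings (assumption for the sketch). */
struct fake_device {
    volatile uint32_t framebuffer[FB_PIXELS]; /* would be mapped WC */
    volatile uint32_t control;                /* would be mapped UC */
};

/* Fill the frame buffer, then tell the device to show frame `frame_id`. */
static void present_frame(struct fake_device *dev, uint32_t color,
                          uint32_t frame_id)
{
    assert(dev != NULL);

    /* These stores may sit in a write-combining buffer for a while. */
    for (size_t i = 0; i < FB_PIXELS; i++)
        dev->framebuffer[i] = color;

    /*
     * Explicit barrier between the WC stores and the UC store, so that
     * correctness does not depend on whether the UC access itself
     * serializes the WC buffer on this particular CPU.
     */
    wc_flush_barrier();

    /* Control write: only issued after the barrier. */
    dev->control = frame_id;
}
```

A caller would zero-initialize a struct fake_device and call 
present_frame(&dev, color, frame_id). The point is only the placement of 
the barrier: after the last WC store and before the UC store that makes 
the device look at the data.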

But basically, as far as I know the thing was designed to be invisible to 
old software: that is the whole idea behind WC memory. So the design was 
certainly intended to be that you can generally mark a framebuffer-like 
structure WC without any software _ever_ caring, as long as you keep all 
control ports in UC memory.

Of course, because burst writes from the WC buffer are _so_ much more 
efficient on the PCI bus than dribbling them out one write at a time, it 
didn't take long before all the graphics cards etc wanted to _also_ 
mark their command queues as WC memory, so that you could burst out the 
commands to the ring buffers as fast as possible. So now you have both 
your frame buffer *and* your command buffers mapped WC, and now ordering 
really has to be ensured in software if you access both.

[ And then there are the crazy people who mark *main memory* as WC, 
  because they don't want to pollute the cache with all the data, and then 
  you have the issue of cache coherency etc crap. Which only gets worse 
  with SMP, especially if one processor thinks it has part of memory 
  exclusively cached, and another one - or even the same one, 
  through another aliasing address - ignores the cache protocol.

  And you now get unhappy CPUs that think that there is a bug in the 
  cache protocol and they get machine check faults.

  So what started out as a "we can do accesses to the frame buffer more 
  efficiently without anybody ever even having to know or care" has 
  turned into a whole nightmare of people using it for other things, and 
  then you very much _do_ have to care! ]

And it doesn't surprise me if AMD then didn't get exactly the same 
rules. 

Oh, well.

		Linus