Date:	Mon, 2 Mar 2009 01:06:51 +1100
From:	Nick Piggin <>
To:	"H. Peter Anvin" <>
Cc:	Arjan van de Ven <>,
	Andi Kleen <>,
	David Miller <>, ...
Subject: Re: [patch] x86, mm: pass in 'total' to __copy_from_user_*nocache()

On Sunday 01 March 2009 12:40:51 H. Peter Anvin wrote:
> Arjan van de Ven wrote:
> > the reason that movntq and co are faster is because you avoid the
> > write-allocate behavior of the caches....
> >
> > the cache polluting part of it I find hard to buy for general use (as
> > this discussion shows)... that will be extremely hard to measure as
> > a real huge thing, while the WA part is like a 1.5x to 2x thing.
> Note that hardware *can* (which is not the same thing as hardware
> *will*) elide the write-allocate behavior.  We did that at Transmeta for
> rep movs and certain other instructions which provably filled in entire
> cache lines.  I haven't investigated if newer Intel CPUs do that in the
> "fast rep movs" case.

I would expect any high performance CPU these days to combine entries
in the store queue, even for normal store instructions (especially for
linear memcpy patterns). Isn't this likely to be the case?
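[Editor's note: for readers unfamiliar with the mechanism under discussion, here is a minimal user-space sketch of a copy loop using non-temporal (movnt-family) stores via SSE2 intrinsics. It is illustrative only, not the kernel's __copy_from_user_*nocache() implementation; the function name and alignment requirements are assumptions of this sketch. A non-temporal store writes the line without first reading it into the cache, which is the write-allocate elision Arjan refers to above.]

```c
#include <emmintrin.h>   /* SSE2: _mm_load_si128, _mm_stream_si128 */
#include <stddef.h>

/*
 * Hypothetical helper, not kernel code: copy n bytes using
 * non-temporal stores so destination lines are neither read
 * (no write-allocate) nor left polluting the cache.
 *
 * Assumes dst and src are 16-byte aligned and n is a multiple of 16.
 */
static void copy_nocache(void *dst, const void *src, size_t n)
{
	__m128i *d = (__m128i *)dst;
	const __m128i *s = (const __m128i *)src;
	size_t i;

	for (i = 0; i < n / 16; i++) {
		__m128i v = _mm_load_si128(&s[i]); /* ordinary cached load  */
		_mm_stream_si128(&d[i], v);        /* non-temporal store:
						      bypasses write-allocate */
	}
	_mm_sfence(); /* make the weakly-ordered NT stores globally visible
			 before any subsequent stores */
}
```

Note the trailing sfence: movnt stores are weakly ordered, so a fence is needed before other CPUs (or a later normal store) may observe the copied data.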
