linux-kernel - Re: [patch] x86, mm: pass in 'total' to __copy_from_user


Message-ID: <20090303090252.GC11484@elte.hu>
Date:	Tue, 3 Mar 2009 10:02:52 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Nick Piggin <nickpiggin@...oo.com.au>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	"H. Peter Anvin" <hpa@...or.com>,
	Arjan van de Ven <arjan@...radead.org>,
	Andi Kleen <andi@...stfloor.org>,
	David Miller <davem@...emloft.net>, sqazi@...gle.com,
	linux-kernel@...r.kernel.org, tglx@...utronix.de
Subject: Re: [patch] x86, mm: pass in 'total' to __copy_from_user_*nocache()

* Nick Piggin <nickpiggin@...oo.com.au> wrote:

> On Tuesday 03 March 2009 08:16:23 Linus Torvalds wrote:
> > On Mon, 2 Mar 2009, Nick Piggin wrote:
> > > I would expect any high performance CPU these days to combine entries
> > > in the store queue, even for normal store instructions (especially for
> > > linear memcpy patterns). Isn't this likely to be the case?
> >
> > None of this really matters.
> 
> Well that's just what I was replying to. Of course 
> nontemporal/uncached stores can't avoid cache-coherence (cc) 
> operations either, but somebody was hoping that they would 
> avoid the write-allocate / RMW behaviour. I just replied 
> because I think that modern CPUs can combine stores in their 
> store queues to get the same result for cacheable stores.
> 
> Of course it doesn't make it free, especially if the cc 
> protocol has to go over the interconnect anyway. But 
> avoiding the RAM read is still a good thing.
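To make the cacheable vs. nontemporal distinction concrete, here is a minimal user-space sketch of a copy that uses SSE2 non-temporal (streaming) stores for the 16-byte-aligned body of the destination. It is an illustration under stated assumptions, not the kernel's __copy_from_user_*nocache() implementation, which works on user pointers and must handle faults. The helper name copy_nocache() is hypothetical, and on targets without SSE2 the sketch simply falls back to memcpy().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: copy n bytes, using non-temporal stores (MOVNTDQ via
 * _mm_stream_si128) for the aligned middle of dst when SSE2 is available.
 * Streaming stores go through write-combining buffers instead of allocating
 * the destination lines in the cache. Without SSE2 this is plain memcpy(). */
static void copy_nocache(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
#if defined(__SSE2__)
    /* Head: ordinary cached copy until d is 16-byte aligned. */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;
    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head; s += head; n -= head;

    /* Body: full 16-byte chunks; the source may be unaligned. */
    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);
        d += 16; s += 16; n -= 16;
    }
    /* Streaming stores are weakly ordered; fence before relying on them. */
    _mm_sfence();
#endif
    /* Tail (or the whole buffer without SSE2): ordinary cached copy. */
    memcpy(d, s, n);
}
```

The unaligned head and tail are handled with ordinary stores, because streaming 16-byte stores require an aligned destination.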

Hm, why do you assume that there is a RAM read? A sufficiently 
advanced x86 CPU will have fast string moves that transfer full 
cachelines, which avoids partial-cacheline writes and with them 
the need for the physical read.
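The argument above can be put as simple arithmetic. The following is a simplified model, assuming 64-byte cachelines and a CPU that can claim ownership of a fully overwritten line without reading its old contents. It only counts which lines a write covers completely and which it touches partially (and so still need the old data). The names split_write() and struct line_split are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE 64u /* assumed cacheline size; common on x86, not universal */

struct line_split {
    size_t full;    /* lines overwritten completely: no read of old data */
    size_t partial; /* head/tail lines written partially: old data needed */
};

/* Classify the cachelines touched by a write of len bytes at addr. */
static struct line_split split_write(uintptr_t addr, size_t len)
{
    struct line_split r = {0, 0};
    if (len == 0)
        return r;
    uintptr_t first = addr / LINE;
    uintptr_t last = (addr + len - 1) / LINE;
    size_t lines = (size_t)(last - first + 1);
    int head_partial = (addr % LINE) != 0;
    int tail_partial = ((addr + len) % LINE) != 0;
    if (lines == 1) {
        /* A single line is full only if the write starts and ends on it. */
        r.partial = (head_partial || tail_partial) ? 1 : 0;
        r.full = 1 - r.partial;
        return r;
    }
    r.partial = (size_t)head_partial + (size_t)tail_partial;
    r.full = lines - r.partial;
    return r;
}
```

For example, an aligned 4096-byte copy covers 64 full lines and no partial ones. The same copy starting at offset 10 touches 65 lines, of which only the first and last are partial.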

The cacheline still has to be flushed/queried/transferred across 
the cc domain according to the cc protocol in use, to make sure 
there's no stale cached data elsewhere, but that is not a RAM 
read and in the common case (when the address is not present in 
any cache) it can be quite cheap.

The only cost is the dirty cacheline that is left around, which 
increases the flush-out pressure on the cache. (The CPU might be 
smart about this detail too, so in practice a lot of 
write-allocates might not even cause that much trouble.)
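The remaining cost, cache pollution by dirty lines, can be illustrated with a toy direct-mapped cache. This is a deliberately crude model: 8 one-line sets, full-line stores only, and non-temporal stores treated as bypassing the cache entirely. All names are hypothetical, and the model says nothing about how a real CPU's replacement policy behaves.

```c
#include <stdbool.h>
#include <stddef.h>

#define SETS 8 /* toy direct-mapped cache: 8 sets, one line each */

struct toy_cache {
    long tag[SETS];   /* line number held in each set, -1 if empty */
    bool dirty[SETS]; /* true if the line would need a writeback */
};

static void toy_init(struct toy_cache *c)
{
    for (size_t i = 0; i < SETS; i++) {
        c->tag[i] = -1;
        c->dirty[i] = false;
    }
}

/* Load miss: install the line clean (a hit leaves the set untouched). */
static void toy_read(struct toy_cache *c, long line)
{
    size_t set = (size_t)line % SETS;
    if (c->tag[set] != line) {
        c->tag[set] = line;
        c->dirty[set] = false;
    }
}

/* Full-line store. With write-allocate the line is installed dirty,
 * evicting whatever shared its set; the non-temporal case is modelled
 * as not touching the cache at all. */
static void toy_write(struct toy_cache *c, long line, bool write_allocate)
{
    if (!write_allocate)
        return;
    size_t set = (size_t)line % SETS;
    c->tag[set] = line;
    c->dirty[set] = true;
}

/* Warm the cache with hot lines 0..SETS-1, store n lines starting at dst,
 * then count surviving hot lines and dirty lines left by the copy. */
static void run_copy(long dst, long n, bool write_allocate,
                     size_t *hot_left, size_t *dirty_left)
{
    struct toy_cache c;
    toy_init(&c);
    for (long l = 0; l < SETS; l++)
        toy_read(&c, l);
    for (long l = dst; l < dst + n; l++)
        toy_write(&c, l, write_allocate);

    *hot_left = 0;
    *dirty_left = 0;
    for (size_t i = 0; i < SETS; i++) {
        if (c.tag[i] >= 0 && c.tag[i] < SETS)
            (*hot_left)++;
        if (c.dirty[i])
            (*dirty_left)++;
    }
}
```

In this model, a write-allocating copy of 16 lines evicts all 8 hot lines and leaves 8 dirty lines behind. The non-temporal variant leaves the hot set intact. That is the "flush-out pressure" in its most pessimistic form.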

	Ingo