lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090225082958.GA9322@elte.hu>
Date:	Wed, 25 Feb 2009 09:29:58 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Nick Piggin <nickpiggin@...oo.com.au>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Salman Qazi <sqazi@...gle.com>, davem@...emloft.net,
	linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>, Andi Kleen <andi@...stfloor.org>
Subject: Re: [patch] x86, mm: pass in 'total' to __copy_from_user_*nocache()


* Nick Piggin <nickpiggin@...oo.com.au> wrote:

> On Wednesday 25 February 2009 18:25:03 Ingo Molnar wrote:
> > * Nick Piggin <nickpiggin@...oo.com.au> wrote:
> > > On Wednesday 25 February 2009 02:52:34 Linus Torvalds wrote:
> > > > On Tue, 24 Feb 2009, Nick Piggin wrote:
> > > > > > it does make some kind of sense to try to avoid the noncached
> > > > > > versions for small writes - because small writes tend to be for
> > > > > > temp-files.
> > > > >
> > > > > I don't see the significance of a temp file. If the pagecache is
> > > > > truncated, then the cachelines remain dirty and so you can't avoid an
> > > > > eventual store back to RAM?
> > > >
> > > > No, because many small files end up being used as scratch-pads (think
> > > > shell script sequences etc), and get read back immediately again. Doing
> > > > non-temporal stores might just be bad simply because trying to play
> > > > games with caching may simply do the wrong thing.
> > >
> > > OK, for that angle it could make sense. Although as has been
> > > noted earlier, at this point of the copy, we don't have much
> > > idea about the length of the write passed into the vfs (and
> > > obviously will never know the higher level intention of
> > > userspace).
> > >
> > > I don't know if we can say a 1 page write is nontemporal, but
> > > anything smaller is temporal. And having these kinds of
> > > behavioural cutoffs I would worry will create strange
> > > performance boundary conditions in code.
> >
> > I agree in principle.
> >
> > The main artifact would be the unaligned edges around a bigger
> > write. In particular the tail portion of a big write will be
> > cached.
> >
> > For example if we write a 100,000 bytes file, we'll copy the
> > first 24 pages (98304 bytes) uncached, while the final 1696
> > bytes cached. But there is nothing that necessiates this
> > assymetry.
> >
> > For that reason it would be nice to pass down the total size of
> > the write to the assembly code. These are single-usage-site APIs
> > anyway so it should be easy.
> >
> > I.e. something like the patch below (untested). I've extended
> > the copy APIs to also pass down a 'total' size as well, and
> > check for that instead of the chunk 'len'. Note that it's
> > checked in the inlined portion so all the "total == len" special
> > cases will optimize out the new parameter completely.
> 
> This does give more information, not exactly all (it could be 
> a big total write with many smaller writes especially if the 
> source is generated on the fly and will be in cache, or if the 
> source is not in cache, then we would also want to do 
> nontemporal loads from there etc etc).

A big total write with many smaller writes should already be 
handled in this codepath, as long as it's properly iovec-ed.

We can do little about user-space doing stupid things as doing a 
big write as a series of many smaller-than-4K writes.

> > This should express the 'large vs. small' question 
> > adequately, with no artifacts. Agreed?
> 
> Well, no artifacts, but it still has a boundary condition 
> where one might cut from temporal to nontemporal behaviour.
> 
> If it is a *really* important issue, maybe some flags should 
> be incorporated into an extended API? It not, then I wonder if 
> it is important enough to add such complexity for?

the iozone numbers in the old commits certainly are convincing. 
The new numbers from Salman are convincing too - and his fix 
should preserve the iozone [large-write] numbers as well.

I.e. i think this is a reasonable compromise, it should handle 
all the sane cases.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ