Date:	Fri, 27 Mar 2015 12:38:48 -0400
From:	Milosz Tanski <milosz@...in.com>
To:	Jeremy Allison <jra@...ba.org>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Christoph Hellwig <hch@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	"linux-aio@...ck.org" <linux-aio@...ck.org>,
	Mel Gorman <mgorman@...e.de>,
	Volker Lendecke <Volker.Lendecke@...net.de>,
	Tejun Heo <tj@...nel.org>, Jeff Moyer <jmoyer@...hat.com>,
	"Theodore Ts'o" <tytso@....edu>, Al Viro <viro@...iv.linux.org.uk>,
	Linux API <linux-api@...r.kernel.org>,
	Michael Kerrisk <mtk.manpages@...il.com>,
	linux-arch@...r.kernel.org, Dave Chinner <david@...morbit.com>
Subject: Re: [PATCH v7 0/5] vfs: Non-blockling buffered fs read (page cache only)

On Fri, Mar 27, 2015 at 11:58 AM, Jeremy Allison <jra@...ba.org> wrote:
> On Fri, Mar 27, 2015 at 02:01:59AM -0700, Andrew Morton wrote:
>> On Fri, 27 Mar 2015 01:48:33 -0700 Christoph Hellwig <hch@...radead.org> wrote:
>>
>> > On Fri, Mar 27, 2015 at 01:35:16AM -0700, Andrew Morton wrote:
>> > > fincore() doesn't have to be ugly.  Please address the design issues I
>> > > raised.  How is pread2() useful to the class of applications which
>> > > cannot proceed until all data is available?
>> >
>> > It actually makes them work correctly?  preadv2( ..., DONTWAIT) will
>> > return -EGAIN, which causes them to bounce to the threadpool where
>> > they call preadv(...).
>>
>> (I assume you mean RWF_NONBLOCK)
>>
>> That isn't how pread2() works.  If the leading one or more pages are
>> uptodate, pread2() will return a partial read.  Now what?  Either the
>> application reads the same data a second time via the worker thread
>> (dumb, but it will usually be a rare case)
>
> The problem with the above is that we can't tell the difference
> between pread2() returning a short read because the pages are not
> in cache, or because someone truncated the file. So we need some
> way to differentiate this.
>
> My preference from userspace would be for pread2() to return
> EAGAIN if *all* the data requested is not available (where
> 'all' can be less than the size requested if the file has
> been truncated in the meantime).
>
> So:
>
> ret = pread2(fd, buf, size_wanted, RWF_NONBLOCK)
>
> if (ret == -1) {
>         if (errno == EAGAIN) {
>                 goto threadpool...
>         }
>         .. real error..
> }
>
> if (ret == size_wanted) {
>         .. normal read, file not truncated...
> }
>
> if (ret < size_wanted) {
>         .. file was truncated..
> }
>
> The thing I want to avoid is the case where
> ret < size_wanted means only part of the file
> is in cache.

I very much like the short read behavior. It lets you overlap some CPU
work on the partial data (like TLS and then sticking it in the network
output buffer) with waiting for the rest of the data (enqueued in the
thread pool).

Short reads are the current behavior. If you call preadv2 a second
time at EOF, it'll return 0 instead of EWOULDBLOCK today. I actually
test for this in the preadv2 test in xfstests here:
https://github.com/mtanski/xfstests/commit/688db24c292999c81ee17caf2b61fe8cf7bb3cd6#diff-114416ea98ce29dde3b5b3d145afbd2bR81

There's one caveat: it's possible to get EWOULDBLOCK when reading at
end of file if the file metadata is not paged in.
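
To make the semantics above concrete, here is a rough sketch of how a
caller might sort the result of one non-blocking read attempt. It does
not call preadv2 itself. It only classifies a return value and errno
that the caller already has. The enum, its values, and
classify_nb_read() are hypothetical names made up for illustration and
are not part of the patch set.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical names, for illustration only; not from the patches. */
enum nb_read_action {
    NB_READ_DONE,       /* whole request was served from the page cache */
    NB_READ_PARTIAL,    /* short read: use these bytes, then ask again */
    NB_READ_EOF,        /* 0 bytes returned: at end of file */
    NB_READ_THREADPOOL, /* nothing available without blocking */
    NB_READ_ERROR       /* any other failure */
};

/*
 * Classify the result of one non-blocking read attempt.
 * ret and err are the return value and errno from the read call,
 * wanted is the number of bytes that were requested.
 */
static enum nb_read_action
classify_nb_read(ssize_t ret, int err, size_t wanted)
{
    if (ret < 0) {
        /* EWOULDBLOCK can also show up at EOF, per the caveat above. */
        if (err == EAGAIN || err == EWOULDBLOCK)
            return NB_READ_THREADPOOL;
        return NB_READ_ERROR;
    }
    if (ret == 0)
        return NB_READ_EOF;
    if ((size_t)ret < wanted)
        return NB_READ_PARTIAL;
    return NB_READ_DONE;
}
```

On NB_READ_PARTIAL the caller can start working on the bytes it has
and issue the next read at the advanced offset. That next attempt
returns 0 if the short read was due to EOF or truncation, or
EWOULDBLOCK if the remaining pages are simply not cached, in which
case the rest goes to the thread pool.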

-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@...in.com