linux-kernel - Re: read()/readv() only from page cache

Message-ID: <CANP1eJHDQfcfrRw8JWzfBz34Q-CppN39iPXX+4BMig2k=ESRAQ@mail.gmail.com>
Date:	Fri, 5 Sep 2014 12:45:52 -0400
From:	Milosz Tanski <milosz@...in.com>
To:	Christoph Hellwig <hch@...radead.org>
Cc:	Mel Gorman <mgorman@...e.de>, LKML <linux-kernel@...r.kernel.org>,
	Volker Lendecke <Volker.Lendecke@...net.de>,
	Tejun Heo <tj@...nel.org>, linux-aio@...ck.org
Subject: Re: read()/readv() only from page cache

On Fri, Sep 5, 2014 at 12:32 PM, Christoph Hellwig <hch@...radead.org> wrote:
> On Fri, Sep 05, 2014 at 12:27:21PM -0400, Milosz Tanski wrote:
>> I would prefer an interface more like recv() where I can specify a
>> flag saying whether I want blocking behavior for this read or not.
>> Let me explain why:
>>
>> In a VLDB-like workload this would enable me to lower the latency of
>> common, fast requests. By fast requests I mean ones that do not
>> require much data, where the data is cached, or where there's a
>> predictable read pattern (read-ahead). Obviously it would come at the
>> expense of the latency of large/slow requests (they have to make two
>> read calls, the first of which always returns EWOULDBLOCK) ... but in
>> that case it doesn't matter, since the time to do the actual IO would
>> trump any extra latency.
>
> This is another good suggestion.  I've actually heard people asking
> for per-I/O flags for other use cases.  One I can remember is
> applying O_DSYNC only to FUA writes on a SCSI target; another
> would be Samba again, as SMB allows per-I/O flags on the wire
> as well.
>
>> Essentially, it's using the kernel facilities (page cache) to help me
>> perform better (in a more predictable fashion). I would implement this
>> in our application tomorrow. It's frustrating that there is a similar
>> interface (recv* family) that I cannot use.
>>
>> I know there have been a bunch of attempts at buffered AIO and none of
>> them made it into the kernel. This would let me build a buffered AIO
>> implementation in user-space using a thread pool, and cached data would
>> not end up blocked behind other, non-cached requests sitting in the
>> queue. I know there are other sources of blocking (locking, metadata
>> lookups), but direct AIO already suffers from these, so I'm fine to
>> paper over that for now.
>
> Although I still think providing useful AIO at the kernel level would be
> better than having everyone reimplement it, it would still be useful to
> allow people to reimplement it sanely, if only to avoid the discussion
> about which API to use: the non-standard and not really that nice
> Linux io_submit, or the utterly horrible POSIX aio_ semantics.
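
For later readers: the recv()-style per-call flag discussed above did eventually appear in Linux as preadv2() (Linux 4.6) together with the RWF_NOWAIT flag (Linux 4.14). The following is a minimal sketch of the two-call pattern on top of it, assuming a C library that exposes preadv2(). The helpers read_cached() and read_fast_then_slow() are hypothetical names, and treating EOPNOTSUPP, EINVAL or ENOSYS as "flag not usable here, just do a blocking read" is an assumption of this sketch, not documented guidance.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008 /* value from <linux/fs.h>, for older libc headers */
#endif

/* Fast path: succeeds only if the read can be served without waiting for
 * I/O (e.g. the data is already in the page cache); otherwise it fails
 * with errno == EAGAIN (the same value as EWOULDBLOCK on Linux). */
static ssize_t read_cached(int fd, void *buf, size_t len, off_t off)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    return preadv2(fd, &iov, 1, off, RWF_NOWAIT);
}

/* The two-call pattern: a cheap non-blocking attempt first, then a normal
 * blocking pread() only when the first call says it would block.
 * *went_slow reports which path was taken. */
static ssize_t read_fast_then_slow(int fd, void *buf, size_t len, off_t off,
                                   int *went_slow)
{
    ssize_t n = read_cached(fd, buf, len, off);
    *went_slow = 0;
    if (n >= 0)
        return n;
    /* Assumption: treat "flag unsupported" errors like EAGAIN. */
    if (errno != EAGAIN && errno != EOPNOTSUPP && errno != EINVAL &&
        errno != ENOSYS)
        return -1; /* a real error, e.g. EBADF */
    *went_slow = 1;
    return pread(fd, buf, len, off);
}
```

RWF_NOWAIT can also return a short read when only part of the requested range is cached, so a real caller would either accept the partial result or issue the remainder on the slow path.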
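
The per-I/O flag Christoph mentions for FUA writes also has a later Linux counterpart: pwritev2() accepts RWF_DSYNC (added in Linux 4.7), which gives a single write O_DSYNC semantics without changing how the file was opened. A minimal sketch, assuming a C library that exposes pwritev2(); write_maybe_fua() is a hypothetical helper, and the pwrite() plus fdatasync() fallback for kernels without the flag is an assumption of this sketch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_DSYNC
#define RWF_DSYNC 0x00000002 /* value from <linux/fs.h>, for older libc headers */
#endif

/* Write one buffer; if fua is set, give this write alone O_DSYNC semantics
 * (e.g. a SCSI WRITE with the FUA bit). Other writes on the same fd stay
 * ordinary buffered writes. */
static ssize_t write_maybe_fua(int fd, const void *buf, size_t len,
                               off_t off, int fua)
{
    if (!fua)
        return pwrite(fd, buf, len, off);

    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    ssize_t n = pwritev2(fd, &iov, 1, off, RWF_DSYNC);
    if (n >= 0 || (errno != EOPNOTSUPP && errno != EINVAL && errno != ENOSYS))
        return n;

    /* Assumed fallback for kernels without per-I/O flags. */
    n = pwrite(fd, buf, len, off);
    if (n >= 0 && fdatasync(fd) != 0)
        return -1;
    return n;
}
```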
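
The user-space buffered AIO Milosz describes (cached reads served immediately, misses handed to a thread pool so they cannot hold up cached ones) could look roughly like the following. It is only a sketch: struct read_req, struct pool, POOL_INIT and the functions are hypothetical names, the queue is an unbounded FIFO, and the non-blocking probe again assumes preadv2() with RWF_NOWAIT.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008 /* value from <linux/fs.h>, for older libc headers */
#endif

/* Hypothetical request and pool types for this sketch. */
struct read_req {
    int fd; void *buf; size_t len; off_t off;
    ssize_t result; int err; int done;
    struct read_req *next;
};
struct pool {
    pthread_mutex_t lock;
    pthread_cond_t work, finished; /* queue non-empty / a request completed */
    struct read_req *head, *tail;
    int stop;
};
#define POOL_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, \
                    PTHREAD_COND_INITIALIZER, NULL, NULL, 0 }

static void complete(struct pool *p, struct read_req *r, ssize_t n, int err)
{
    pthread_mutex_lock(&p->lock);
    r->result = n; r->err = err; r->done = 1;
    pthread_cond_broadcast(&p->finished);
    pthread_mutex_unlock(&p->lock);
}

/* Worker threads do the reads that may block on disk I/O. */
static void *worker(void *arg)
{
    struct pool *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (!p->head && !p->stop)
            pthread_cond_wait(&p->work, &p->lock);
        struct read_req *r = p->head;
        if (!r) { /* stop requested and queue drained */
            pthread_mutex_unlock(&p->lock);
            return NULL;
        }
        p->head = r->next;
        if (!p->head) p->tail = NULL;
        pthread_mutex_unlock(&p->lock);
        ssize_t n = pread(r->fd, r->buf, r->len, r->off);
        complete(p, r, n, n < 0 ? errno : 0);
    }
}

/* Cached reads complete inline; only misses are queued behind other misses. */
static void submit_read(struct pool *p, struct read_req *r)
{
    struct iovec iov = { .iov_base = r->buf, .iov_len = r->len };
    r->done = 0; r->next = NULL;
    ssize_t n = preadv2(r->fd, &iov, 1, r->off, RWF_NOWAIT);
    if (n >= 0 || (errno != EAGAIN && errno != EOPNOTSUPP &&
                   errno != EINVAL && errno != ENOSYS)) {
        complete(p, r, n, n < 0 ? errno : 0);
        return;
    }
    pthread_mutex_lock(&p->lock);
    if (p->tail) p->tail->next = r; else p->head = r;
    p->tail = r;
    pthread_cond_signal(&p->work);
    pthread_mutex_unlock(&p->lock);
}

static void wait_read(struct pool *p, struct read_req *r)
{
    pthread_mutex_lock(&p->lock);
    while (!r->done)
        pthread_cond_wait(&p->finished, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

static void pool_shutdown(struct pool *p, pthread_t *threads, int nthreads)
{
    pthread_mutex_lock(&p->lock);
    p->stop = 1;
    pthread_cond_broadcast(&p->work);
    pthread_mutex_unlock(&p->lock);
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
}
```

Usage: start one or more threads with pthread_create(&t, NULL, worker, &pool), call submit_read() for each request and wait_read() to collect its result. As the quoted message notes, locking and metadata lookups can still block inside the fast path; this sketch does not try to hide that.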

Yeah, I would love for that to happen, but I've been lurking and
following the non-blocking buffered AIO discussions and attempts on
LKML since about 2008, and the threads go back much further than that,
roughly 12 years. I would settle for a much less ambitious read/pread
syscall that gets me 90% of the way there, and I can build the
remainder in user-space. It also has the nice side-effect of
providing a not-horrible fallback for older/non-Linux systems where
all IO goes into the thread pool (without the option to skip it).
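
The fallback described here amounts to a compile-time switch on the fast path: where the headers provide RWF_NOWAIT, try the non-blocking read first; elsewhere (older Linux, other systems) always report "would block" so that every request goes to the thread pool. A minimal sketch; try_read_cached() is a hypothetical helper, and testing RWF_NOWAIT with #if defined() assumes the C library defines it as a macro, as recent glibc versions do in <sys/uio.h>.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Returns the number of bytes read, or -1 with errno == EAGAIN meaning
 * "hand this request to the thread pool". Without RWF_NOWAIT in the
 * headers (older Linux, or a non-Linux system), every call says EAGAIN,
 * so all I/O goes to the pool. */
static ssize_t try_read_cached(int fd, void *buf, size_t len, off_t off)
{
#if defined(__linux__) && defined(RWF_NOWAIT)
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    ssize_t n = preadv2(fd, &iov, 1, off, RWF_NOWAIT);
    if (n >= 0 || errno == EAGAIN)
        return n;
    if (errno != EOPNOTSUPP && errno != EINVAL && errno != ENOSYS)
        return -1; /* a real error such as EBADF: report it */
    /* Running kernel lacks the flag: fall through to the pool path. */
#else
    (void)fd; (void)buf; (void)len; (void)off;
#endif
    errno = EAGAIN;
    return -1;
}
```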

-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@...in.com