Message-ID: <CANP1eJGixMLX+cjtQe120QS76OkKFo4oJKrpvF4fzPHC7cwpSg@mail.gmail.com>
Date:	Mon, 15 Sep 2014 18:13:39 -0400
From:	Milosz Tanski <milosz@...in.com>
To:	Andreas Dilger <adilger@...ger.ca>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Christoph Hellwig <hch@...radead.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
	linux-aio@...ck.org, Mel Gorman <mgorman@...e.de>,
	Volker Lendecke <Volker.Lendecke@...net.de>,
	Tejun Heo <tj@...nel.org>, Jeff Moyer <jmoyer@...hat.com>
Subject: Re: [RFC PATCH 0/7] Non-blockling buffered fs read (page cache only)

Like you, Andreas, I would like to see a syscall that lets you pass
vectored positions (along with buffers and lengths). However, that's
not the problem I'm trying to solve with this patchset, which is
non-blocking reads for filesystem fds. The vectored-position read
call(s) deserve a separate submission for a number of the usual reasons.
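
For reference, the vectored-position read discussed here can only be
approximated from userspace today, with one pread() per segment. The
sketch below shows the shape of such an interface. The names pos_iovec
and preadv_pos are hypothetical and are not part of this patchset or of
any existing kernel API; a real syscall would do the whole loop in a
single kernel entry, which is the point of the request.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical descriptor: a memory buffer paired with its own file offset. */
struct pos_iovec {
	void   *base;    /* destination buffer */
	size_t  len;     /* number of bytes wanted */
	off_t   offset;  /* file offset this segment is read from */
};

/*
 * Userspace emulation of a vectored-position read: one pread() per segment.
 * Returns the total number of bytes read. Stops at the first short read.
 * Returns -1 (errno set by pread) only if nothing was read before the error.
 */
ssize_t preadv_pos(int fd, const struct pos_iovec *vec, int cnt)
{
	ssize_t total = 0;

	assert(vec != NULL || cnt == 0);

	for (int i = 0; i < cnt; i++) {
		ssize_t n;

		/* Retry if a signal interrupted the call before any data moved. */
		do {
			n = pread(fd, vec[i].base, vec[i].len, vec[i].offset);
		} while (n < 0 && errno == EINTR);

		if (n < 0)
			return total > 0 ? total : -1;

		total += n;
		if ((size_t)n < vec[i].len)
			break;	/* short read (e.g. EOF): report what we have */
	}
	return total;
}
```

A caller would fill an array of pos_iovec entries pointing at unrelated
file regions and pass it in one call; the emulation still costs one
syscall per entry.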

Best,
- Milosz

On Mon, Sep 15, 2014 at 5:33 PM, Andreas Dilger <adilger@...ger.ca> wrote:
> On Sep 15, 2014, at 2:20 PM, Milosz Tanski <milosz@...in.com> wrote:
>
>> This patchset introduces the ability to perform a non-blocking read
>> from regular files in buffered IO mode. This only works for those
>> filesystems that have data in the page cache.
>>
>> It does this by introducing new syscalls readv2/writev2
>> and preadv2/pwritev2. These new syscalls behave like the network sendmsg,
>> recvmsg syscalls that accept an extra flag argument (O_NONBLOCK).
>
> It's too bad that we are introducing yet another new read/write
> syscall pair that only allow IO into discontiguous memory regions,
> but do not allow a single call to access discontiguous file regions
> (i.e. specify a separate file offset for each iov).
>
> Adding syscalls similar to preadv/pwritev() that could take an iovec
> that specifies the file offset+length in addition to the memory address
> would allow efficient scatter-gather IO in a single syscall.  While
> that is less critical for local filesystems with small syscall latency,
> it is more important for network filesystems, or in the case of
> NVRAM-backed filesystems.
>
> Cheers, Andreas
>
>> It's a very common pattern today (samba, libuv, etc.) to use a large
>> threadpool to perform buffered IO operations. They submit the work
>> from another thread that performs network IO and epoll or other threads
>> that perform CPU work. This leads to increased latency for processing,
>> esp. in the case of data that's already cached in the page cache.
>>
>> With the new interface the applications will now be able to fetch the
>> data in their network / CPU-bound thread(s) and only defer to a
>> threadpool if it's not there. In our own application (VLDB) we've
>> observed a decrease in latency for "fast" requests by avoiding unnecessary
>> queuing and having to swap out current tasks in IO-bound work threads.
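
The fast-path / fallback pattern described above might look roughly like
the sketch below. Two assumptions to flag: this RFC passes O_NONBLOCK as
the flag, while the sketch uses preadv2() with RWF_NOWAIT as exposed by
later kernels and glibc, and the "thread pool" is stood in for by a
plain blocking pread() so the example stays self-contained. The names
read_if_cached and read_with_fallback are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008	/* value from linux/fs.h, for older headers */
#endif

enum read_path { READ_FAST, READ_DEFERRED };

/*
 * Try to satisfy the read from the page cache only.
 * Returns bytes read (possibly short if only part is cached), or -1 with
 * errno == EAGAIN when the data is not cached and the call would block.
 */
ssize_t read_if_cached(int fd, void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };

	return preadv2(fd, &iov, 1, off, RWF_NOWAIT);
}

/*
 * Fast path in the calling thread; otherwise fall back to a blocking read.
 * In a real server the fallback would be queued to a worker thread instead
 * of being executed inline as it is here.
 */
ssize_t read_with_fallback(int fd, void *buf, size_t len, off_t off,
			   enum read_path *path)
{
	ssize_t n;

	assert(path != NULL);

	n = read_if_cached(fd, buf, len, off);
	if (n >= 0) {
		*path = READ_FAST;
		return n;
	}

	/* Not cached, or the kernel/filesystem lacks non-blocking reads. */
	if (errno != EAGAIN && errno != EOPNOTSUPP &&
	    errno != ENOSYS && errno != EINVAL)
		return -1;

	*path = READ_DEFERRED;
	return pread(fd, buf, len, off);
}
```

The caller can use the reported path to decide whether a request was
served inline or had to be handed off, which is where the latency
difference for "fast" requests would show up.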
>>
>> I have co-developed these changes with Christoph Hellwig; a whole lot
>> of his fixes went into the first patch in the series (they were squashed
>> with his approval).
>>
>> I am going to post the perf report in a reply to this RFC.
>>
>> Christoph Hellwig (3):
>>  documentation updates
>>  move flags enforcement to vfs_preadv/vfs_pwritev
>>  check for O_NONBLOCK in all read_iter instances
>>
>> Milosz Tanski (4):
>>  Prepare for adding a new readv/writev with user flags.
>>  Define new syscalls readv2,preadv2,writev2,pwritev2
>>  Export new vector IO (with flags) to userland
>>  O_NONBLOCK flag for readv2/preadv2
>>
>> Documentation/filesystems/Locking |    4 +-
>> Documentation/filesystems/vfs.txt |    4 +-
>> arch/x86/syscalls/syscall_32.tbl  |    4 +
>> arch/x86/syscalls/syscall_64.tbl  |    4 +
>> drivers/target/target_core_file.c |    6 +-
>> fs/afs/internal.h                 |    2 +-
>> fs/afs/write.c                    |    4 +-
>> fs/aio.c                          |    4 +-
>> fs/block_dev.c                    |    9 ++-
>> fs/btrfs/file.c                   |    2 +-
>> fs/ceph/file.c                    |   10 ++-
>> fs/cifs/cifsfs.c                  |    9 ++-
>> fs/cifs/cifsfs.h                  |   12 ++-
>> fs/cifs/file.c                    |   30 +++++---
>> fs/ecryptfs/file.c                |    4 +-
>> fs/ext4/file.c                    |    4 +-
>> fs/fuse/file.c                    |   10 ++-
>> fs/gfs2/file.c                    |    5 +-
>> fs/nfs/file.c                     |   13 ++--
>> fs/nfs/internal.h                 |    4 +-
>> fs/nfsd/vfs.c                     |    4 +-
>> fs/ocfs2/file.c                   |   13 +++-
>> fs/pipe.c                         |    7 +-
>> fs/read_write.c                   |  146 +++++++++++++++++++++++++++++++------
>> fs/splice.c                       |    4 +-
>> fs/ubifs/file.c                   |    5 +-
>> fs/udf/file.c                     |    5 +-
>> fs/xfs/xfs_file.c                 |   12 ++-
>> include/linux/fs.h                |   16 ++--
>> include/linux/syscalls.h          |   12 +++
>> include/uapi/asm-generic/unistd.h |   10 ++-
>> mm/filemap.c                      |   34 +++++++--
>> mm/shmem.c                        |    6 +-
>> 33 files changed, 306 insertions(+), 112 deletions(-)
>>
>> --
>> 1.7.9.5
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
>> the body of a message to majordomo@...r.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
> Cheers, Andreas



-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@...in.com
