Message-ID: <CANP1eJGE=vQBdLnMBFPcTWrvh+dUA-UCK=ugO-tc2KN4LWLPSA@mail.gmail.com>
Date:	Thu, 24 Jul 2014 22:36:33 -0400
From:	Milosz Tanski <milosz@...in.com>
To:	Mel Gorman <mgorman@...e.de>
Cc:	LKML <linux-kernel@...r.kernel.org>
Subject: read()/readv() only from page cache

Mel,

I've been following your recent work with the postgres folks to
improve the kernel for postgres-like workloads (which can really help
all database-like loads).

After spending some time of my own fighting similar problems, I figured
I'd reach out to see if something can be done to make my use case
easier. I was wondering if there is a read-family syscall that allows
me to read from a file descriptor only if the data is in the page cache
(or that returns only the portion of the data that is in the page
cache).

My userspace application (a database-like system) is divided into
three kinds of threads. There are threads for processing data and IO
threads (mostly for reading data). There are also threads for dealing
with networking (epoll), but those aren't interesting here.

What I would like to be able to do is issue a read call in the
processing thread to get more data ... if it exists in the page cache.
If it doesn't, I would end up queuing that work to the IO threads.
As it stands today, I always have to queue the work to the IO
threads, and I end up paying for the message passing (and
synchronization) even in the case where it's a simple memcpy from the
page cache to a userspace buffer. Add kernel readahead to my example
and avoiding that round trip is a pretty big win.

I'm not the only person who laments the lack of this kind of facility.
Other folks have also been frustrated by being unable to tell whether a
read will block or not:
http://www.1024cores.net/home/scalable-architecture/parallel-disk-io/the-solution

The sad part is that we do have similar syscalls that handle non-file
fds, like recvmsg(), where you can set O_NONBLOCK (or pass MSG_DONTWAIT)
and have it return if there's no data in the buffer. Sadly it doesn't
work for regular files.

I understand that there is a mincore() syscall but in this case it's
not useful since it requires an extra syscall and
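
To make the cost of the mincore() route concrete, here is a minimal
sketch of what it might look like. read_if_cached is a hypothetical
helper name, not a kernel or libc API, and this is only one possible
arrangement. It needs mmap(), mincore() and munmap() before the actual
pread(), and the answer is only advisory: a page can be evicted between
the check and the read, so the read may still block.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read [off, off + len) from fd only if mincore()
 * reports every page of that range as resident in the page cache.
 * Returns bytes read, or -1 with errno = EAGAIN when the caller should
 * hand the request to an IO thread instead. */
ssize_t read_if_cached(int fd, void *buf, size_t len, off_t off)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    if (len == 0)
        return 0;

    off_t start = off - (off % page);          /* mmap needs an aligned offset */
    size_t span = (size_t)(off - start) + len; /* bytes covered by the mapping */
    size_t npages = (span + (size_t)page - 1) / (size_t)page;

    void *map = mmap(NULL, span, PROT_READ, MAP_SHARED, fd, start);
    if (map == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc(npages);
    if (vec == NULL) {
        munmap(map, span);
        errno = ENOMEM;
        return -1;
    }

    /* mincore() fills one byte per page; bit 0 set means "resident". */
    int rc = mincore(map, span, vec);
    int saved_errno = errno;
    int resident = (rc == 0);
    for (size_t i = 0; resident && i < npages; i++)
        resident = vec[i] & 1;

    free(vec);
    munmap(map, span);

    if (rc != 0) {
        errno = saved_errno;
        return -1;
    }
    if (!resident) {
        errno = EAGAIN;                        /* caller queues to an IO thread */
        return -1;
    }
    /* Advisory only: the pages may have been evicted since the check. */
    return pread(fd, buf, len, off);
}
```

A processing thread would call read_if_cached() first and fall back to
the IO-thread queue on EAGAIN. That is four syscalls on the fast path
instead of one.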

Is there any kind of facility / solution for my problem that I can
leverage in the Linux kernel? Linus is always adamant about working
with the page cache rather than against it, and working with it is
exactly what I'm trying to do here.

-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz@...in.com