Message-ID: <20130527204651.GB21682@quack.suse.cz>
Date: Mon, 27 May 2013 22:46:51 +0200
From: Jan Kara <jack@...e.cz>
To: Milosz Tanski <milosz@...in.com>
Cc: linux-kernel@...r.kernel.org
Subject: Re: Nonblocking buffered AIO from userspace
Hello,
On Thu 23-05-13 16:49:49, Milosz Tanski wrote:
> I need some advice on the best way to accomplish non-blocking buffered
> disk IO from my user space application. Unlike some of the other
> database systems, I'm trying to outsource as much work to the kernel as
> possible. I would prefer to avoid having to resort to O_DIRECT and
> io_submit to fetch the data and having to reimplement the page /
> buffer cache & read ahead.
>
> The application is read heavy with occasional long running write jobs.
> Since I'm not too concerned about the performance on the write path I
> am able to run that work in threads and block.
>
> Currently I'm mmapping the files, which makes the read path quite simple
> and is great for disk scans when my data set is stored in memory. When
> the data is not cached the performance becomes more unpredictable,
> esp. when I'm doing an indexed read (giant bitmap indexes). Here's what
> my IO path looks like:
>
> application <--> fscache (SSD) <--> cephfs <--> ceph cluster
>
> Ultimately what I'd like is a way to do non-blocking scatter gather IO
> from disk or page cache into my application. I'd like to be
> non-blocking because it often happens that I can do something useful
> while waiting on IO, like uncompressing indexes for another request that
> is waiting, processing network IO, etc.
>
> With mmap my blocking is unpredictable, and mlock() blocks and only
> lets me lock a single range, not a vector of page ranges.
Maybe the API you are looking for is madvise(MADV_WILLNEED)? That forces
asynchronous readahead for the specified range.
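Since madvise() takes one range per call, a vector of ranges would need a
loop. Below is a minimal sketch of how that could look, assuming Linux and
a base pointer that came from mmap(). The struct prefetch_range type and
the aligned_span() and prefetch_ranges() helpers are hypothetical names
made up for this illustration, not an existing API. Keep in mind that the
hint is advisory: a later access to a page can still block if its
readahead has not finished yet.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* expose madvise() under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper type: one byte range inside an existing mapping. */
struct prefetch_range {
    size_t offset;   /* byte offset from the start of the mapping */
    size_t length;   /* number of bytes wanted */
};

/*
 * Compute the page-aligned span that covers r, clamped to the mapping.
 * Returns 0 if the range is empty or starts past the end of the mapping,
 * 1 otherwise. page must be a power of two.
 */
static int aligned_span(const struct prefetch_range *r, size_t map_len,
                        size_t page, size_t *start, size_t *len)
{
    if (r->length == 0 || r->offset >= map_len)
        return 0;

    /* Clamp the end to the mapping without overflowing offset + length. */
    size_t end = (r->length > map_len - r->offset) ? map_len
                                                   : r->offset + r->length;

    *start = r->offset & ~(page - 1);   /* madvise() needs an aligned start */
    *len = end - *start;
    return 1;
}

/*
 * Issue MADV_WILLNEED for every range in r[0..n). Returns the number of
 * ranges for which madvise() reported success.
 */
static size_t prefetch_ranges(void *base, size_t map_len,
                              const struct prefetch_range *r, size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t ok = 0;

    for (size_t i = 0; i < n; i++) {
        size_t start, len;

        if (!aligned_span(&r[i], map_len, page, &start, &len))
            continue;   /* nothing to prefetch for this entry */
        if (madvise((char *)base + start, len, MADV_WILLNEED) == 0)
            ok++;
    }
    return ok;
}
```

A caller could issue prefetch_ranges() for the index pages of one request,
go do other work in the meantime, and only touch the mapping afterwards.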
Honza
--
Jan Kara <jack@...e.cz>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/