Message-ID: <470E603A.2080203@clusterfs.com>
Date: Thu, 11 Oct 2007 20:41:14 +0300
From: "Vladimir V. Saveliev" <vs@...sterfs.com>
To: Andreas Dilger <adilger@...sterfs.com>
CC: Valerie Henson <val.henson@...il.com>,
Theodore Ts'o <tytso@....edu>, Ric Wheeler <ric@....com>,
linux-ext4 <linux-ext4@...r.kernel.org>
Subject: Re: Threaded readahead strawman
Hello
Andreas Dilger wrote:
> On Oct 10, 2007 20:09 -0700, Valerie Henson wrote:
>> I need to get started on a mergeable version of the threaded readahead
>> patch for e2fsck. I intend for it to be compatible with Andreas'
>> sys_readahead() for block devices that support it. Here's a first
>> draft proposal - your thoughts? Note that it's not really that
>> anything is being read *ahead* per se, but that it's being read
>> simultaneously. Single-threaded readahead doesn't go any faster.
>
> We've been fiddling with this as well. I'd attach some patches but
> bugzilla is down as I write this :(. I also asked Vladimir (working on
> these patches) to forward them to you and the linux-ext4 mailing list.
>
The patch is attached.
If an application can foresee what it is going to read in the future, it
can call io_channel_readahead for that data beforehand. Even if
io_channel_readahead is called right before the data are actually needed,
it may still help on multi-disk devices, because the disks can then be read
in parallel.
For example, using io_channel_readahead to read ahead the upcoming inode
tables in the done_group callback of ext2_inode_scan cut the inode table
scan in my quick local test from 34 seconds to 26 seconds (on a RAID0 of
two IDE disks).
> We added a "readahead" method to the io_manager interface (no-op for
> Win/DOS) that can be used generically. This is currently done via
> posix_fadvise(POSIX_FADV_WILLNEED). We haven't done any multi-threading
> yet, but there is some hope that the block layer could sort it out?
> It would still be beneficial to have multiple user-space threads do
> the reading of the data, to get parallel memcpy() into userspace.
>
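Andreas's point about getting parallel memcpy() into userspace can be illustrated with a small worker pool: several threads claim (offset, length) requests from a shared list and read each one with pread(), so the copies into user buffers, and on a multi-disk device the underlying I/O, can overlap. This is a generic pthreads sketch, not code from either patch; struct ra_req, struct ra_pool and read_parallel() are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request: read len bytes at offset into buf. */
struct ra_req {
    off_t offset;
    size_t len;
    char *buf;
    ssize_t done;            /* bytes read (short at EOF), or -1 on error */
};

struct ra_pool {
    int fd;
    struct ra_req *reqs;
    size_t nreqs;
    size_t next;             /* index of the next unclaimed request */
    pthread_mutex_t lock;
};

static void *ra_worker(void *arg)
{
    struct ra_pool *pool = arg;

    for (;;) {
        /* Claim the next unclaimed request, if any. */
        pthread_mutex_lock(&pool->lock);
        size_t i = pool->next;
        if (i < pool->nreqs)
            pool->next++;
        pthread_mutex_unlock(&pool->lock);
        if (i >= pool->nreqs)
            return NULL;

        struct ra_req *r = &pool->reqs[i];
        ssize_t total = 0;
        while ((size_t)total < r->len) {
            ssize_t n = pread(pool->fd, r->buf + total,
                              r->len - (size_t)total, r->offset + total);
            if (n < 0 && errno == EINTR)
                continue;            /* interrupted: retry */
            if (n < 0) {
                total = -1;          /* read error */
                break;
            }
            if (n == 0)
                break;               /* end of file: short read */
            total += n;
        }
        r->done = total;
    }
}

/* Reads all requests with up to nthreads workers. Returns 0 if at least
 * one worker ran (every request is then processed), -1 otherwise. */
int read_parallel(int fd, struct ra_req *reqs, size_t nreqs, int nthreads)
{
    struct ra_pool pool = { .fd = fd, .reqs = reqs, .nreqs = nreqs, .next = 0 };
    int started = 0;

    assert(nthreads > 0);
    pthread_t *tids = malloc(sizeof(*tids) * (size_t)nthreads);
    if (!tids)
        return -1;
    pthread_mutex_init(&pool.lock, NULL);
    while (started < nthreads &&
           pthread_create(&tids[started], NULL, ra_worker, &pool) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&pool.lock);
    free(tids);
    return started > 0 ? 0 : -1;
}
```

The thread count would presumably come from the "optimal number of concurrent requests" parameter in Valerie's list; a single thread degenerates to an ordinary sequential read loop.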
>> The major global parameters to the system are:
>>
>> 1. Optimal number of concurrent requests: the number of underlying read
>> heads times some N, the best number of outstanding requests per head.
>> Defaults to one.
>>
>> 2. Stripe size, or more generally which areas can be read concurrently
>> and which cannot.
>
> There are new parameters in the superblock (s_raid_stride and
> s_raid_stripe_width) but as yet only s_raid_stride is initialized by
> mke2fs. There is a library in xfstools (libdisk or somesuch) that
> can get a lot more disk geometry info and it would be good to leverage
> that for mke2fs also.
>
>> 3. Maximum memory to use. We have to keep the readahead from
>> outrunning the actual processing (though so far, that hasn't been a
>> problem) and having bits of our buffer cache kicked out before they
>> are used. This can be set to some percentage of available memory by
>> default.
>
> Agreed. I'd proposed in the past that fsck could call fsck.{fstype}
> with a parameter like --expected-memory to determine the expected memory
> usage of fsck.{fstype} based on the filesystem geometry, and it could
> also supply --max-memory so we don't have parallel fscks stomping on
> each other.
>
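The three parameters above can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than existing e2fsprogs code: the struct and function names are hypothetical, the array is modelled as plain RAID0 striping (chunk i of stripe_bytes lives on disk i % disks, so stripe_bytes could presumably be derived from s_raid_stride times the block size), and the memory limit is enforced simply by refusing to queue a request that would push outstanding readahead past max_memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical readahead tuning parameters (cf. ext2fs_readahead_init()). */
struct ra_params {
    unsigned disks;          /* independent read heads, e.g. RAID0 members */
    unsigned per_disk_depth; /* "some N" outstanding requests per head */
    uint64_t stripe_bytes;   /* RAID0 chunk size in bytes */
    uint64_t max_memory;     /* cap on bytes read ahead but not yet used */
};

/* Parameter 1: total concurrent requests, defaulting to one. */
unsigned ra_concurrency(const struct ra_params *p)
{
    unsigned n = p->disks * p->per_disk_depth;
    return n ? n : 1;
}

/* Parameter 2: which member disk serves a byte offset under RAID0.
 * Requests mapping to different disks can proceed concurrently. */
unsigned ra_disk_for_offset(const struct ra_params *p, uint64_t offset)
{
    assert(p->stripe_bytes > 0 && p->disks > 0);
    return (unsigned)((offset / p->stripe_bytes) % p->disks);
}

/* Parameter 3: admit a request of len bytes only if outstanding readahead
 * stays within max_memory. Returns 1 and updates *outstanding if admitted,
 * 0 if the caller should wait for processing to catch up. A request larger
 * than max_memory is never admitted. */
int ra_admit(const struct ra_params *p, uint64_t *outstanding, uint64_t len)
{
    if (len > p->max_memory || *outstanding > p->max_memory - len)
        return 0;
    *outstanding += len;
    return 1;
}
```

For example, with two disks, a queue depth of 4 and 64 KiB chunks, ra_concurrency() gives 8 and offsets 0 and 65536 land on different disks; the caller would subtract from *outstanding as processed data is consumed.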
>> I see two main ways to do this: One is a straightforward offset plus
>> size, telling it what to read. The other is to make libext2 do all
>> the interpretation of ondisk format, and design the interface in terms
>> of kinds of metadata to read. Since libext2 functions like
>> ext2fs_get_next_inode_full() should be aware of what's going on in
>> readahead, this argues for a metadata-aware, in-library
>> implementation. Something like:
>>
>> /* Creates the threads, sets some variables. Returns a handle. */
>> handle = ext2fs_readahead_init(concurrent_requests, stripe_size, max_memory);
>>
>> /* Readahead inode tables and inode indirect blocks - can't really be
>> separated */
>> ext2fs_readahead_inodes(handle, fs);
>
> Well, there's something to be said for allowing the inode tables and
> corresponding bitmaps to be read in a single shot. Also, not all users
> require the indirect blocks, so I would make that an option.
>
>> /* Read the directory block list (pass 2) */
>> ext2fs_readahead_dblist(handle, fs);
>
> We're working on this as part of e2scan (in bug 13108 above); I'm not sure
> whether a patch is available yet.
>
>> /* Read bitmaps (pass 5) */
>> ext2fs_readahead_bitmaps(handle, fs);
>
> This is a big one, because of the many seeks for small data reads. Using
> the FLEX_BG feature (which is really a tiny kernel patch) could improve
> this many times over.
>
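Reading an inode table together with its bitmaps "in a single shot", and the FLEX_BG gain, both come down to issuing fewer, larger requests when metadata is contiguous on disk. A minimal sketch, assuming the caller has already collected metadata locations as (start block, count) extents sorted by start block; struct extent and coalesce_extents() are hypothetical helpers, not e2fsprogs functions. Each extent left afterwards corresponds to one read request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical run of blocks to read. */
struct extent {
    uint64_t start;   /* first block */
    uint64_t count;   /* number of blocks */
};

/*
 * Merge extents (which must be sorted by start) in place when they overlap,
 * touch, or are separated by a hole of at most max_gap blocks; reading
 * through a small hole is usually cheaper than an extra seek. Returns the
 * number of extents left, i.e. the number of separate read requests.
 */
size_t coalesce_extents(struct extent *ext, size_t n, uint64_t max_gap)
{
    size_t out = 0;

    assert(ext != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (out > 0) {
            struct extent *last = &ext[out - 1];
            uint64_t last_end = last->start + last->count; /* one past end */
            if (ext[i].start <= last_end ||
                ext[i].start - last_end <= max_gap) {
                uint64_t end = ext[i].start + ext[i].count;
                if (end > last_end)
                    last->count = end - last->start;
                continue;
            }
        }
        ext[out++] = ext[i];
    }
    return out;
}
```

For example, four one-block bitmaps at blocks 1000 to 1003 (the kind of packing FLEX_BG allows) coalesce into a single 4-block request, while the same bitmaps scattered one per block group remain four separate reads.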
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
>
>
Attachment: "e2fsprogs-add-io_channel_readahead.patch" (text/x-patch, 5137 bytes)