Message-Id: <CA2BF8AB-6F61-4856-8B0E-9D954BDEB243@dilger.ca>
Date: Fri, 25 Jul 2014 23:27:19 -0600
From: Andreas Dilger <adilger@...ger.ca>
To: Abhi Das <adas@...hat.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"cluster-devel@...hat.com" <cluster-devel@...hat.com>
Subject: Re: [RFC PATCH 0/2] dirreadahead system call
Is there a time when this doesn't get called to prefetch entries in
readdir() order? It isn't clear to me what benefit there is in returning
the entries to userspace instead of just doing the statahead implicitly
in the kernel.
The Lustre client has had what we call "statahead" for a while,
and similar to regular file readahead it detects the sequential access
pattern for readdir() + stat() in readdir() order (taking into account if ".*"
entries are being processed or not) and starts fetching the inode
attributes asynchronously with a worker thread.
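To make the detection idea concrete, here is a minimal userspace sketch of
spotting a stat()-in-readdir()-order pattern. It is not the Lustre code: the
struct, the function name and the threshold of 3 are all made up for
illustration, and the real statahead logic tracks more state than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold: consecutive in-order stat() calls needed
 * before prefetching would start. Not a Lustre tunable. */
#define SEQ_HITS_TO_TRIGGER 3

struct seq_detector {
    size_t next_expected;   /* dirent index we expect stat() on next */
    unsigned int hits;      /* consecutive in-order stat() calls seen */
};

/* Record a stat() on the dirent at position idx (in readdir() order).
 * Returns true once the access pattern looks sequential, i.e. the
 * point at which a worker thread could start fetching attributes. */
static bool seq_detector_note_stat(struct seq_detector *d, size_t idx)
{
    assert(d != NULL);

    if (idx == d->next_expected)
        d->hits++;
    else
        d->hits = 0;        /* out-of-order access resets the streak */

    d->next_expected = idx + 1;
    return d->hits >= SEQ_HITS_TO_TRIGGER;
}
```

Starting from a zeroed detector, stat() calls on entries 0, 1, 2 trip the
trigger on the third call, and a jump to an unrelated entry resets it.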
This syscall might be more useful if userspace called readdir() to get
the dirents and then passed the kernel the list of inode numbers
to prefetch before starting on the stat() calls. That way, userspace
could generate an arbitrary list of inodes (e.g. names matching a
regexp) and the kernel doesn't need to guess if every inode is needed.
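Roughly what I have in mind on the userspace side, as a sketch only. The
readdir() and fnmatch() parts are ordinary POSIX; inode_prefetch_stub() is a
hypothetical placeholder for a kernel interface that does not exist, and I
use a shell pattern here instead of a regexp just to keep it short.

```c
#include <assert.h>
#include <dirent.h>
#include <fnmatch.h>
#include <stdlib.h>
#include <sys/types.h>

/* Collect d_ino for every entry in `path` whose name matches the shell
 * pattern `pat`. On success returns the number of inodes and stores a
 * malloc'ed array in *out (caller frees; NULL if nothing matched).
 * Returns -1 on error. */
static long collect_matching_inodes(const char *path, const char *pat,
                                    ino_t **out)
{
    DIR *dir = opendir(path);
    ino_t *list = NULL;
    size_t n = 0, cap = 0;
    struct dirent *de;

    if (!dir)
        return -1;

    while ((de = readdir(dir)) != NULL) {
        if (fnmatch(pat, de->d_name, 0) != 0)
            continue;                   /* name not wanted, skip it */
        if (n == cap) {
            size_t ncap = cap ? cap * 2 : 64;
            ino_t *tmp = realloc(list, ncap * sizeof(*tmp));
            if (!tmp) {
                free(list);
                closedir(dir);
                return -1;
            }
            list = tmp;
            cap = ncap;
        }
        list[n++] = de->d_ino;
    }
    closedir(dir);
    *out = list;
    return (long)n;
}

/* HYPOTHETICAL: stands in for a syscall that would take a directory fd
 * plus a list of inode numbers to prefetch. This stub prefetches
 * nothing and only reports how many inodes it was handed. */
static long inode_prefetch_stub(int dirfd, const ino_t *inodes, size_t count)
{
    (void)dirfd;
    (void)inodes;
    return (long)count;
}
```

The application would call collect_matching_inodes(), hand the array to the
prefetch call, and only then start its stat() loop over the same names.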
As it stands, this syscall doesn't help in anything other than readdir
order (or if the directory is small enough to be handled in one
syscall), which could be handled by the kernel internally already,
and it may fetch a considerable number of extra inodes from
disk if not every inode needs to be touched.
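For reference, the calling pattern implied by the cover letter quoted below
is a plain loop in directory order. The sketch here models it in userspace
only: dirreadahead_stub() and struct fake_dir are hypothetical stand-ins, the
"directory" is just an entry count, and the argument types are my guess
since the cover letter does not spell them out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a directory: only the number of entries. */
struct fake_dir {
    long nentries;
};

/* HYPOTHETICAL stand-in for the proposed syscall, following the
 * described semantics: read ahead up to `count` entries starting at
 * *offset, advance *offset, return how many were issued, 0 when no
 * entries are left, negative on error. */
static long dirreadahead_stub(const struct fake_dir *dir, long *offset,
                              unsigned int count)
{
    long left, n;

    assert(dir != NULL && offset != NULL);
    if (*offset < 0)
        return -1;          /* stands in for a negative error code */

    left = dir->nentries - *offset;
    if (left <= 0)
        return 0;           /* end of directory */

    n = left < (long)count ? left : (long)count;
    *offset += n;
    return n;
}

/* Drive the call until it reports no entries left. Returns the total
 * number of readaheads issued, or the negative error code. */
static long readahead_whole_dir(const struct fake_dir *dir,
                                unsigned int batch)
{
    long offset = 0, total = 0, n;

    while ((n = dirreadahead_stub(dir, &offset, batch)) > 0)
        total += n;
    return n < 0 ? n : total;
}
```

Nothing in this loop lets the caller skip entries it does not care about,
which is the limitation described above.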
Cheers, Andreas
> On Jul 25, 2014, at 11:37, Abhi Das <adas@...hat.com> wrote:
>
> This system call takes 3 arguments:
> fd - file descriptor of the directory being readahead
> *offset - offset in dir from which to resume. This is updated
> as we move along in the directory
> count - The max number of entries to readahead
>
> The syscall is supposed to read up to 'count' entries starting at
> '*offset' and cache the inodes corresponding to those entries. It
> returns a negative error code or a positive number indicating
> the number of inodes it has issued readaheads for. It also
> updates the '*offset' value so that repeated calls to dirreadahead
> can resume at the right location. Returns 0 when there are no more
> entries left.
>
> Abhi Das (2):
> fs: Add dirreadahead syscall and VFS hooks
> gfs2: GFS2's implementation of the dir_readahead file operation
>
> arch/x86/syscalls/syscall_32.tbl | 1 +
> arch/x86/syscalls/syscall_64.tbl | 1 +
> fs/gfs2/Makefile | 3 +-
> fs/gfs2/dir.c | 49 ++++++---
> fs/gfs2/dir.h | 15 +++
> fs/gfs2/dir_readahead.c | 209 +++++++++++++++++++++++++++++++++++++++
> fs/gfs2/file.c | 2 +
> fs/gfs2/main.c | 10 +-
> fs/gfs2/super.c | 1 +
> fs/readdir.c | 49 +++++++++
> include/linux/fs.h | 3 +
> 11 files changed, 328 insertions(+), 15 deletions(-)
> create mode 100644 fs/gfs2/dir_readahead.c
>
> --
> 1.8.1.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html