linux-kernel - Re: readahead on directories

Message-ID: <20100421220612.GD27575@shareable.org>
Date:	Wed, 21 Apr 2010 23:06:12 +0100
From:	Jamie Lokier <jamie@...reable.org>
To:	Phillip Susi <psusi@....rr.com>
Cc:	linux-fsdevel@...r.kernel.org,
	Linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: readahead on directories

Phillip Susi wrote:
> > ...for those things where AIO is supported at all.  The problem with
> > more complicated fs operations (like, say, buffered file reads and
> > directory operations) is you can't just put a request in a queue.
> 
> Unfortunately there aren't async versions of the calls that make
> directory operations, but aio_read() performs a buffered file read
> asynchronously just fine.

Why am I reading all over the place that Linux AIO only works with O_DIRECT?
Is it out of date? :-)

I admit I haven't even _tried_ buffered files with Linux AIO due to
the evil propaganda.

> > The most promising direction for AIO at the moment is in fact spawning
> > kernel threads on demand to do the work that needs a context, and
> > swizzling some pointers so that it doesn't look like threads were used
> > to userspace.
> 
> NO!  This is how aio was implemented at first and it was terrible.
> Context is only required because it is easier to write the code linearly
> instead of as a state machine.  It would be better for example, to have
> readahead() register a callback function to be called when the read of
> the indirect block completes, and the callback needs zero context to
> queue reads of the data blocks referred to by the indirect block.

To read an indirect block, you have to allocate memory: another
callback after you've slept waiting for memory to be freed up.

Then you allocate a request: another callback while you wait for the
request queue to drain.

Then you submit the request: that's the callback you mentioned,
waiting for the result.

But then triple, double, single indirect blocks: each of the above
steps repeated.

In the case of writing, another group of steps for bitmap blocks,
inode updates, and heaven knows how fiddly it gets with ordered
updates to the journal, synchronised with other writes.

Plus every little mutex / rwlock is another place where you need those
callback functions.  We don't even _have_ an async mutex facility in
the kernel.  So every user of a mutex has to be changed to use
waitqueues or something.  No more lockdep checking, no more RT
priority inheritance.

There are a _lot_ of places that can sleep on the way to a trivial
file I/O, and quite a lot of state to be passed along the continuation
functions.
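
To make the shape of that concrete, here is a toy user-space sketch of
the callback chain for one indirect block and its data blocks.  It is
not kernel code: the fake disk (BLOCKS), the deferred-call queue and
every function name are hypothetical, invented only to show how each
point that could sleep becomes another continuation, and how the state
has to be carried along by hand.

```python
from collections import deque

# Hypothetical stand-ins: a queue of deferred calls plays the role of
# "call me back when the resource is available".
pending = deque()
trace = []

# Fake disk: block 10 is an indirect block pointing at two data blocks.
BLOCKS = {10: [11, 12], 11: "data-A", 12: "data-B"}


def defer(fn, *args):
    pending.append((fn, args))


def run():
    while pending:
        fn, args = pending.popleft()
        fn(*args)


def read_block_async(blockno, done, state):
    # Continuation 1: we would have slept waiting for memory here.
    trace.append(("alloc_mem", blockno))
    defer(_have_memory, blockno, done, state)


def _have_memory(blockno, done, state):
    # Continuation 2: we would have slept waiting for a free request.
    trace.append(("alloc_request", blockno))
    defer(_have_request, blockno, done, state)


def _have_request(blockno, done, state):
    # Continuation 3: the request is submitted; `done` runs on completion.
    trace.append(("submit", blockno))
    defer(done, BLOCKS[blockno], state)


def on_indirect(pointers, state):
    # The indirect block arrived: queue reads of the blocks it names.
    state["remaining"] = len(pointers)
    for p in pointers:
        read_block_async(p, on_data, state)


def on_data(contents, state):
    state["results"].append(contents)
    state["remaining"] -= 1
    if state["remaining"] == 0:
        state["finished"] = True


def read_file_async(indirect_blockno):
    state = {"results": [], "remaining": 0, "finished": False}
    read_block_async(indirect_blockno, on_indirect, state)
    run()
    return state


def read_file_linear(indirect_blockno):
    # The same logic when the caller is simply allowed to block.
    return [BLOCKS[p] for p in BLOCKS[indirect_blockno]]
```

Even in this toy, one level of indirection costs three continuations
per block plus an explicit state dictionary, against two lines for the
version that is allowed to block.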

It's possible but by no means obvious that it's better.

I think people have mostly given up on that approach because of how
much it complicates all the filesystem code, and how much goodness
there is in being able to call things which can sleep when you look at
all the different places.  It seemed like a good idea for a while.

And it's not _that_ certain that it would be faster at high
loads after all the work.

A compromise where just a few synchronisation points are made async is
ok.  But then it's a compromise... so you still need a multi-threaded
caller to keep the queues full in all situations.
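
As an illustration of what a multi-threaded caller can look like from
userspace, here is a minimal sketch that splits a file into chunks and
issues the reads from a pool of threads, so that several requests can
be outstanding at once.  The function names, chunk size and worker
count are arbitrary assumptions, and it assumes a regular file on a
Unix system where os.pread() is available.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_chunk(path, offset, length):
    # Each worker opens its own descriptor and does one positioned read.
    # Assumption: a regular file, where pread() returns the full chunk
    # except at end of file.
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)


def parallel_read(path, chunk_size=1 << 20, workers=8):
    # Several threads block in the kernel at once, which is what keeps
    # more than one request queued.
    size = os.path.getsize(path)
    offsets = range(0, size, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda off: read_chunk(path, off, chunk_size), offsets)
        return b"".join(parts)
```

The threads here do nothing clever; they only exist so that each
blocking call has a context to sleep in.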

> > Filesystem-independent readahead() on directories is out of the
> > question (except by using a kernel background thread, which is
> > pointless because you can do that yourself.)
> 
> No need for a thread.  readahead() does not need one for files, reading
> the contents of a directory should be no different.
>
> > Some filesystems have directories which aren't stored like a file's
> > data, and the process of reading the directory needs to work through
> > its logic, and needs a sleepable context to work in.  Generic page
> > reading won't work for all of them.
> 
> If the fs absolutely has to block that's ok, since that is no different
> from the way readahead() works on files, but most of the time it
> shouldn't have to and should be able to throw the read in the queue and
> return.

For specific filesystems, you could do it.  readahead() on directories
is not an unreasonable thing to add on.

Doing it generically is not likely.  It's not about blocking, it's
about the fact that directories don't always consist of data blocks on
the store organised similarly to a file.  For example NFS, CIFS, or
(I'm not sure) maybe even reiserfs/btrfs?
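
For what it's worth, the "do that yourself" option mentioned earlier
can be sketched in a few lines of userspace code: a background thread
that walks a directory tree through the ordinary directory-reading
interface, which works on any filesystem because it never looks at how
the directory is stored.  The function names are made up for the
example, and how much of what it reads stays cached is up to the
kernel and the filesystem.

```python
import os
import threading


def warm_directory(root):
    # Walk the tree with ordinary directory reads; returns the number
    # of entries seen.  Unreadable directories are skipped.
    count = 0
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as it:
                for entry in it:
                    count += 1
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            continue
    return count


def warm_in_background(root):
    # The thread supplies the sleepable context; the caller carries on
    # with other work and can join() later if it cares.
    t = threading.Thread(target=warm_directory, args=(root,), daemon=True)
    t.start()
    return t
```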

-- Jamie
