Message-ID: <YfiUaJ59A3px+DqP@casper.infradead.org>
Date: Tue, 1 Feb 2022 02:01:12 +0000
From: Matthew Wilcox <willy@...radead.org>
To: NeilBrown <neilb@...e.de>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Jeff Layton <jlayton@...nel.org>,
Ilya Dryomov <idryomov@...il.com>,
Miklos Szeredi <miklos@...redi.hu>,
Trond Myklebust <trond.myklebust@...merspace.com>,
Anna Schumaker <anna.schumaker@...app.com>, linux-mm@...ck.org,
linux-nfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
ceph-devel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] fuse: remove reliance on bdi congestion

On Tue, Feb 01, 2022 at 10:00:23AM +1100, NeilBrown wrote:
> On Tue, 01 Feb 2022, Matthew Wilcox wrote:
> > On Mon, Jan 31, 2022 at 03:47:41PM +1100, NeilBrown wrote:
> > > On Mon, 31 Jan 2022, Matthew Wilcox wrote:
> > > > > +++ b/fs/fuse/file.c
> > > > > @@ -958,6 +958,8 @@ static void fuse_readahead(struct readahead_control *rac)
> > > > >
> > > > > if (fuse_is_bad(inode))
> > > > > return;
> > > > > + if (fc->num_background >= fc->congestion_threshold)
> > > > > + return;
> > > >
> > > > This seems like a bad idea to me. If we don't even start reads on
> > > > readahead pages, they'll get ->readpage called on them one at a time
> > > > and the reading thread will block. It's going to lead to some nasty
> > > > performance problems, exactly when you don't want them. Better to
> > > > queue the reads internally and wait for congestion to ease before
> > > > submitting the read.
> > > >
> > >
> > > Isn't that exactly what happens now? page_cache_async_ra() sees that
> > > inode_read_congested() returns true, so it doesn't start readahead.
> > > ???
> >
> > It's rather different. Imagine the readahead window has expanded to
> > 256kB (64 pages). Today, we see congestion and don't do anything.
> > That means we miss the async readahead opportunity, find a missing
> > page and end up calling into page_cache_sync_ra(), by which time
> > we may or may not be congested.
> >
> > If the inode_read_congested() in page_cache_async_ra() is removed and
> > the patch above is added to replace it, we'll allocate those 64 pages and
> > add them to the page cache. But then we'll return without starting IO.
> > When we hit one of those !uptodate pages, we'll call ->readpage on it,
> > but we won't do anything to the other 63 pages. So we'll go through a
> > protracted slow period of sending 64 reads, one at a time, whether or
> > not congestion has eased. Then we'll hit a missing page and proceed
> > to the sync ra case as above.
>
> Hmmm... where is all this documented?
> The entry for readahead in vfs.rst says:
>
> If the filesystem decides to stop attempting I/O before reaching the
> end of the readahead window, it can simply return.
>
> but you are saying that if it simply returns, it'll most likely just get
> called again. So maybe it shouldn't say that?
That's not what I'm saying at all. I'm saying that if ->readahead fails
to read a page, ->readpage will be called for that page (if it's
actually accessed).
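Roughly, the shape of that fallback (a simplified sketch, not the
actual mm/filemap.c code):

/*
 * Sketch: the read path finds a page that ->readahead added to the
 * cache but never started I/O on, and reads it synchronously, one
 * page at a time.
 */
static int read_one_page(struct file *file, struct page *page)
{
	int error;

	if (PageUptodate(page))
		return 0;

	lock_page(page);
	if (PageUptodate(page)) {
		unlock_page(page);
		return 0;
	}
	/* ->readpage unlocks the page once the read completes. */
	error = file->f_mapping->a_ops->readpage(file, page);
	if (!error)
		wait_on_page_locked(page);
	return error;
}

Each of those calls blocks the reading thread for a single page's
worth of I/O.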
> What do other filesystems do?
> ext4 sets REQ_RAHEAD, but otherwise just pushes ahead and submits all
> requests. btrfs seems much the same.
> xfs uses iomap_readahead .. which again sets REQ_RAHEAD but otherwise
> just does a normal read.
>
> The effect of REQ_RAHEAD seems to be primarily to avoid retries on
> failure.
>
> So it seems that core read-ahead code is not set up to expect readahead
> to fail, though it is (begrudgingly) permitted.
Well, yes. The vast majority of reads don't fail.
> The current inode_read_congested() test in page_cache_async_ra() seems
> to be just delaying the inevitable (and in fairness, the comment does
> say "Defer...."). Maybe just blocking on the congestion is an equally
> good way to delay it...
I don't think we should _block_ for an async read request. We're in the
context of a process which has read a different page. Maybe what we
need is a readahead_abandon() call that removes the just-added pages
from the page cache, so we fall back to doing a sync readahead?
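Something like this, perhaps (readahead_abandon() doesn't exist; this
is only a sketch of the idea):

/*
 * Hypothetical helper: hand back the pages that readahead just added
 * but which the filesystem decided not to read, so a later miss takes
 * the sync readahead path instead of finding locked !uptodate pages.
 */
static void readahead_abandon(struct readahead_control *rac)
{
	struct page *page;

	while ((page = readahead_page(rac))) {
		delete_from_page_cache(page);
		unlock_page(page);
		put_page(page);
	}
}

Then fuse's congestion check could call it just before returning.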
> I note that ->readahead isn't told if the read-ahead is async or not, so
> my patch will drop sync read-ahead on congestion, which the current code
> doesn't do.
Now that we have a readahead_control, it's simple to add that
information to it.
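Hypothetically (the flag name is invented), if page_cache_async_ra()
set it, fuse could drop only the opportunistic case:

static void fuse_readahead(struct readahead_control *rac)
{
	struct inode *inode = rac->mapping->host;
	struct fuse_conn *fc = get_fuse_conn(inode);

	if (fuse_is_bad(inode))
		return;
	/* Hypothetical rac->async: only skip *async* readahead. */
	if (rac->async && fc->num_background >= fc->congestion_threshold)
		return;
	...
}

A sync readahead would still start I/O even when congested.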
> So maybe this congestion tracking really is useful, and we really want
> to keep it.
>
> I really would like to see that high-level documentation!!
I've done my best to add documentation. There's more than before
I started.