Message-ID: <620724.1612907726@warthog.procyon.org.uk>
Date: Tue, 09 Feb 2021 21:55:26 +0000
From: David Howells <dhowells@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: dhowells@...hat.com, Matthew Wilcox <willy@...radead.org>,
Jeff Layton <jlayton@...hat.com>,
David Wysochanski <dwysocha@...hat.com>,
Anna Schumaker <anna.schumaker@...app.com>,
Trond Myklebust <trondmy@...merspace.com>,
Steve French <sfrench@...ba.org>,
Dominique Martinet <asmadeus@...ewreck.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
ceph-devel@...r.kernel.org, linux-afs@...ts.infradead.org,
linux-cachefs@...hat.com, CIFS <linux-cifs@...r.kernel.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
"open list:NFS, SUNRPC, AND..." <linux-nfs@...r.kernel.org>,
v9fs-developer@...ts.sourceforge.net,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [GIT PULL] fscache: I/O API modernisation and netfs helper library

Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> > Yeah, I have trouble with the private2 vs fscache bit too. I've been
> > trying to persuade David that he doesn't actually need an fscache
> > bit at all; he can just increment the page's refcount to prevent it
> > from being freed while he writes data to the cache.
>
> Does the code not hold a refcount already?

AIUI, Willy wanted me to drop the refcount and rely on PG_locked alone during
I/O triggered by the new ->readahead() method. So when it comes to setting
PG_fscache after a successful read from the server, I don't hold any page refs;
the assumption is that the waits in releasepage and invalidatepage suffice. If
that isn't sufficient, I can make it take refs on the pages to be written out,
which should be easy enough to do.
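A minimal user-space sketch of the refcounted alternative mentioned above, which is also the rule Linus asks for below: a page with the fscache bit set must have a reference. Everything here is hypothetical: `struct model_page`, its `fscache` flag (standing in for PG_fscache) and the `model_*` helpers are illustrative names, not kernel APIs. The bit takes its own reference when a cache write starts and drops it only on completion, so the page cannot be freed mid-write whatever releasepage or invalidatepage do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not the kernel's layout or API. */
struct model_page {
	int refcount;   /* the page is "freed" when this reaches zero */
	bool fscache;   /* stands in for PG_fscache */
	bool freed;     /* set once the allocator could reuse the page */
};

static void model_get_page(struct model_page *p)
{
	assert(!p->freed);
	p->refcount++;
}

static void model_put_page(struct model_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->freed = true;
}

/* Begin writing the page to the cache: the bit now owns a reference. */
static void model_begin_cache_write(struct model_page *p)
{
	model_get_page(p);
	p->fscache = true;
}

/* Cache write done: clear the bit, then drop the reference it owned. */
static void model_end_cache_write(struct model_page *p)
{
	p->fscache = false;
	model_put_page(p);
}
```

For example, starting from the page cache's single reference, a write begins (refcount 2), truncation then drops the page cache's ref (refcount 1, page still live), and only `model_end_cache_write()` takes it to zero.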
> Honestly, the fact that writeback doesn't take a refcount, and then
> has magic "if writeback is set, don't free" code in other parts of the
> VM layer has been a problem already, when the wakeup ended up
> "leaking" from a previous page to a new allocation.
>
> I very much hope the fscache bit does not make similar mistakes,
> because the rest of the VM will _not_ have special "if fscache is set,
> then we won't do X" the way we do for writeback.

The VM can't do that because PG_private_2 is not necessarily being used as
PG_fscache. It does, however, treat PG_private_2 like PG_private when
deciding whether to call releasepage and invalidatepage.

> So I think the fscache code needs to hold a refcount regardless, and
> that if the fscache bit is set, the page has to have a reference.
>
> So what are the current lifetime rules for the fscache bit?

It depends on which 'current' you're referring to.

The old fscache I/O API (i.e. what's upstream), in which PG_fscache is set on
a page to note that fscache knows about the page, does not keep a separate
ref on such pages.

The new fscache I/O API simplifies things. With it, pages are only tracked
for the duration of a write to the cache. I've tried to make this behave
analogously to PG_writeback[*], including waiting for it in places like
invalidation, releasepage and page_mkwrite (though in the netfs, not the core
VM), as it may indicate DMA in progress.
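The PG_writeback-style waiting described above can be modelled in user space with a mutex and a condition variable. All names are hypothetical (`struct waitable_page` and the `wp_*` functions are not kernel functions; the kernel uses its own page wait queues). The completion path clears the bit and wakes waiters; invalidation, releasepage or page_mkwrite would call the wait function before touching the page.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical page whose fscache bit is waited on like PG_writeback. */
struct waitable_page {
	pthread_mutex_t lock;
	pthread_cond_t bit_cleared;
	bool fscache;   /* stands in for PG_fscache: a cache write is in flight */
};

static void wp_init(struct waitable_page *p)
{
	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->bit_cleared, NULL);
	p->fscache = false;
}

static void wp_set_fscache(struct waitable_page *p)
{
	pthread_mutex_lock(&p->lock);
	p->fscache = true;
	pthread_mutex_unlock(&p->lock);
}

/* Cache-write completion: clear the bit and wake every waiter. */
static void wp_end_fscache(struct waitable_page *p)
{
	pthread_mutex_lock(&p->lock);
	assert(p->fscache);   /* must only end a write that was started */
	p->fscache = false;
	pthread_cond_broadcast(&p->bit_cleared);
	pthread_mutex_unlock(&p->lock);
}

/* What invalidation/releasepage/page_mkwrite would do first: block until
 * the in-flight cache write has finished. */
static void wp_wait_on_fscache(struct waitable_page *p)
{
	pthread_mutex_lock(&p->lock);
	while (p->fscache)
		pthread_cond_wait(&p->bit_cleared, &p->lock);
	pthread_mutex_unlock(&p->lock);
}

/* Simulated cache backend completing the write from another thread. */
static void *wp_complete_write_thread(void *arg)
{
	wp_end_fscache(arg);
	return NULL;
}
```

The `while` loop around `pthread_cond_wait()` re-checks the bit after every wakeup, which also covers spurious wakeups.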

Note that with the new I/O API, fscache and cachefiles know nothing about the
PG_fscache bit or netfs pages; they just deal with an iov_iter and a
completion function. Dealing with PG_fscache is done by the netfs and the new
netfs helper lib.
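A sketch of that division of labour, heavily simplified: `struct model_iter` stands in for an iov_iter (a single flat buffer), and `model_cache_write()` is a hypothetical cache backend, not the real fscache interface. The backend sees only the iterator and a completion callback; clearing the PG_fscache stand-in happens in the netfs's callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified stand-in for an iov_iter: one buffer. */
struct model_iter {
	const char *base;
	size_t len;
};

typedef void (*model_term_fn)(void *priv, long result);

/* Hypothetical cache backend: it sees only the iterator and the completion
 * callback, never the netfs's pages or their flag bits. */
static char model_cache_file[64];

static void model_cache_write(const struct model_iter *iter,
			      model_term_fn done, void *priv)
{
	long ret = -1;   /* stand-in for an error code */

	assert(done != NULL);
	if (iter->len <= sizeof(model_cache_file)) {
		memcpy(model_cache_file, iter->base, iter->len);
		ret = (long)iter->len;
	}
	done(priv, ret);   /* completes synchronously in this model */
}

/* Netfs side: owns the page and its PG_fscache stand-in. */
struct model_netfs_page {
	char data[16];
	bool fscache;
	long last_result;
};

static void model_netfs_write_done(void *priv, long result)
{
	struct model_netfs_page *page = priv;

	page->last_result = result;
	page->fscache = false;   /* only the netfs touches this bit */
}

static void model_netfs_copy_to_cache(struct model_netfs_page *page)
{
	struct model_iter iter = { page->data, sizeof(page->data) };

	page->fscache = true;
	model_cache_write(&iter, model_netfs_write_done, page);
}
```

In the kernel the completion would normally run asynchronously when the cache I/O finishes; this model calls it synchronously to stay self-contained.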

[*] Though I see that 073861ed77b6b made a change to end_page_writeback() for
an issue that probably affects unlock_page_fscache() too[**].

[**] This may mean that both PG_fscache and PG_writeback need to hold a ref on
the page.
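The wakeup "leak" Linus describes, and why a reference may be needed, can be shown with a single-threaded model that injects the bad interleaving by hand: another CPU sees the bit already clear and drops the last reference before the wakeup is issued. The names are hypothetical, and the `pin` path assumes that an equivalent fix for unlock_page_fscache() would hold a reference across the clear and the wakeup; it is not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page model; not the kernel's struct page. */
struct race_page {
	int refcount;
	bool fscache;             /* stands in for PG_fscache */
	bool freed;               /* set once the allocator could reuse the page */
	int wakeups_after_free;   /* wakeups that would hit a reused page */
};

static void race_get_page(struct race_page *p)
{
	assert(!p->freed);
	p->refcount++;
}

static void race_put_page(struct race_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->freed = true;
}

/* Another CPU sees the bit already clear, so it does not sleep; it finishes
 * (say) truncation and drops what it thinks is the last reference. */
static void race_other_cpu(struct race_page *p)
{
	if (!p->fscache)
		race_put_page(p);
}

static void race_wake_up_page(struct race_page *p)
{
	if (p->freed)
		p->wakeups_after_free++;   /* would "leak" to the page's next user */
}

/* Hypothetical unlock_page_fscache(); `pin` holds a ref across clear + wake. */
static void race_unlock_page_fscache(struct race_page *p, bool pin)
{
	if (pin)
		race_get_page(p);
	p->fscache = false;
	race_other_cpu(p);        /* worst-case interleaving, injected by hand */
	race_wake_up_page(p);
	if (pin)
		race_put_page(p);
}
```

Without the pin, the wakeup is delivered after the page has been freed; with it, the final `race_put_page()` frees the page only once the wakeup has been issued.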

David