[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231126184141.GF38156@ZenIV>
Date: Sun, 26 Nov 2023 18:41:41 +0000
From: Al Viro <viro@...iv.linux.org.uk>
To: Gabriel Krisman Bertazi <gabriel@...sman.be>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Christian Brauner <brauner@...nel.org>, tytso@....edu,
linux-f2fs-devel@...ts.sourceforge.net, ebiggers@...nel.org,
linux-fsdevel@...r.kernel.org, jaegeuk@...nel.org,
linux-ext4@...r.kernel.org,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Miklos Szeredi <miklos@...redi.hu>
Subject: fun with d_invalidate() vs. d_splice_alias() was Re: [f2fs-dev]
[PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs
[folks involved into d_invalidate()/submount eviction stuff Cc'd]
On Sun, Nov 26, 2023 at 04:52:19AM +0000, Al Viro wrote:
> PS: as the matter of fact, it might be a good idea to pass the parent
> as explicit argument to ->d_revalidate(), now that we are passing the
> name as well. Look at the boilerplate in the instances; all that
> parent = READ_ONCE(dentry->d_parent);
> dir = d_inode_rcu(parent);
> if (!dir)
> return -ECHILD;
> ...
> on the RCU side combined with
> parent = dget_parent(dentry);
> dir = d_inode(parent);
> ...
> dput(dir);
> stuff.
>
> It's needed only because the caller had not told us which directory
> is that thing supposed to be in; in non-RCU mode the parent is
> explicitly pinned down, no need to play those games. All we need
> is
> dir = d_inode_rcu(parent);
> if (!dir) // could happen only in RCU mode
> return -ECHILD;
> assuming we need the parent inode, that is.
>
> So... how about
> int (*d_revalidate)(struct dentry *dentry, struct dentry *parent,
> const struct qstr *name, unsigned int flags);
> since we are touching all instances anyway?
OK, it's definitely a good idea for simplifying ->d_revalidate() instances
and I think we should go for it on thes grounds alone. I'll do that.
d_invalidate() situation is more subtle - we need to sort out its interplay
with d_splice_alias().
More concise variant of the scenario in question:
* we have /mnt/foo/bar and a lot of its descendents in dcache on client
* server does a rename, after which what used to be /mnt/foo/bar is /mnt/foo/baz
* somebody on the client does a lookup of /mnt/foo/bar and gets told by
the server that there's no directory with that name anymore.
* that somebody hits d_invalidate(), unhashes /mnt/foo/bar and starts
evicting its descendents
* We try to mount something on /mnt/foo/baz/blah. We look up baz, get
an fhandle and notice that there's a directory inode for it (/mnt/foo/bar).
d_splice_alias() picks the bugger and moves it to /mnt/foo/baz, rehashing
it in process, as it ought to. Then we find /mnt/foo/baz/blah in dcache and
mount on top of it.
* d_invalidate() finishes shrink_dcache_parent() and starts hunting for
submounts to dissolve. And finds the mount we'd done. Which mount quietly
disappears.
Note that from the server POV the thing had been moved quite a while ago.
No server-side races involved - all it seeem is a couple of LOOKUP in the
same directory, one for the old name, one for the new.
On the client on the mounter side we have an uneventful mount on /mnt/foo/baz,
which had been there on server at the time we started and which remains in
place after mount we'd created suddenly disappears.
For the thread that ended up calling d_invalidate(), they'd been doing e.g.
stat on a pathname that used to be there a while ago, but currently isn't.
They get -ENOENT and no indication that something odd might have happened.
>From ->d_revalidate() point of view there's also nothing odd happening -
dentry is not a mountpoint, it stays in place until we return and there's
no directory entry with that name on in its parent. It's as clear-cut
as it gets - dentry is stale.
The only overlap happening there is d_splice_alias() hitting in the middle
of already started d_invalidate().
For a while I thought that ff17fa561a04 "d_invalidate(): unhash immediately"
and 3a8e3611e0ba "d_walk(): kill 'finish' callback" might have something
to do with it, but the same problem existed prior to that.
FWIW, I suspect that the right answer would be along the lines of
* if d_splice_alias() does move an exsiting (attached) alias in
place, it ought to dissolve all mountpoints in subtree being moved.
There might be subtleties, but in case when that __d_unalias() happens
due to rename on server this is definitely the right thing to do.
* d_invalidate() should *NOT* do anything with dentry that
got moved (including moved by d_splice_alias()) from the place we'd
found it in dcache. At least d_invalidate() done due to having
->d_revalidate() return 0.
* d_invalidate() should dissolve all mountpoints in the
subtree that existed when it got started (and found the victim
still unmoved, that is). It should (as it does) prevent any
new mountpoints added in that subtree, unless the mountpoint
to be had been moved (spliced) out. What it really shouldn't
do is touch the mountpoints that are currently outside of it
due to moves.
I'm going to look around and see if we have any weird cases where
d_splice_alias() is used for things like "correct the case of
dentry name on a case-mangled filesystem" - that would presumably
not want to dissolve any submounts. I seem to recall seeing
some shite of that sort, but that was a long time ago.
Eric, Miklos - it might be a good idea if you at least took a
look at whatever comes out of that (sub)thread; I'm trying to
reconstruct the picture, but the last round of serious reworking
of that area had been almost 10 years ago and your recollections
of the considerations back then might help. I realize that they
are probably rather fragmentary (mine definitely are) and any
analysis will need to be redone on the current tree, but...
Powered by blists - more mailing lists