lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sun, 26 Nov 2023 04:52:19 +0000
From: Al Viro <>
To: Gabriel Krisman Bertazi <>
Cc: Linus Torvalds <>,
	Christian Brauner <>,,,,,,
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on
 case-insensitive ext4 and f2fs

On Sat, Nov 25, 2023 at 10:01:36PM +0000, Al Viro wrote:
> On Fri, Nov 24, 2023 at 10:22:49AM -0500, Gabriel Krisman Bertazi wrote:
> > ack. I'll base the other changes we discussed on top of your branch.
> Rebased to v6.7-rc1, fixed up (ceph calls fscrypt_d_revalidate() directly,
> and D/f/porting entry had been missing), pushed out as #no-rebase-d_revalidate

FWIW, ->d_revalidate() has an old unpleasant problem we might try to solve

In non-RCU mode We treat 0 as "invalidate that sucker and do a fresh lookup".
Fine, except that there's a possibility of race here - we'd hit ->d_revalidate()
while another thread was renaming object in question.  Or has just found it
by doing lookup in a place where it had been moved on server.

->d_revalidate() decides that it needs to be looked up on server and forms
a request before rename succeeds.  So NFS (e.g.) request goes out with the
old parent and name.  By the time server sees it, RENAME has been processed
and succeeded.  There's no such file in the old place anymore.

So ->d_revalidate() returns 0... and we proceed to invalidate the dentry.
Which had been moved to *new* place by now.  In that place it's perfectly
valid and does not deserve invalidation.

Scenario when rename had been done not from this client is even worse:

server:/srv/nfs/foo is mounted on /mnt/foo
we state /mnt/foo/bar
/mnt/foo/bar is in dcache
somebody on server renames /srv/nfs/foo/bar to /srv/nfs/foo/barf
process A: stat /mnt/foo/bar/baz.
process B: mount something on /mnt/foo/barf/
process B: no /mnt/foo/barf in dcache, let's look it up
	   found fhandle of /mnt/foo
	   sent LOOKUP "barf" in it
	   got an fhandle and found it matching the inode of /mnt/foo/bar
process A: has reached /mnt/foo/bar and decided to revalidate it.
	   found fhandle of /mnt/foo
	   sent a LOOKUP "bar" in that
	   got "nothing with that name there"
	   ->d_revalidate() returns 0
	   loses CPU
process B: splice the dentry of /mnt/foo/bar to /mnt/foo/barf
	   proceed to mount on top of it
process A: gets CPU back
	   calls d_invalidate() on the dentry that now is /mnt/foo/barf
	   dissolves the mount created by process B

Note that server:/srv/nfs/foo/barf has been there and perfectly valid
since before B has started doing anything.  It has no idea that the
damn thing used to be in a different place and something on the same
client had seen it at the old place once upon a time.  As far as it is
concerned, mount has succeeded and then quietly disappeared.  The mountpoint
is still there - with freshly looked up dentry, since the old one had been
invalidated, but userland doesn't see that, so... WTF?

It's not easy to hit, but I'd expect it to be feasible on SMP KVM, where instead
of A losing CPU we might've had the virtual CPU losing the timeslice on host.

IMO we should only do d_invalidate() if
	* ->d_revalidate() has returned 0
	* dentry is still hashed, still has the same parent and still matches
the name from ->d_compare() POV.
If it doesn't, we should just leave it whereever it has been moved to and
act as if we hadn't seen it in the first place.

In other words, have
d_revalidate(dentry, parent, name, flags) doing the following:
	if no ->d_revalidate
		return 1
	ret = ->d_revalidate(...)
	if (unlikely(ret == 0) && !(flags & LOOKUP_RCU)) {
		if (!d_same_name(dentry, parent, name))
	return ret

where d_invalidate_locked() would be d_invalidate() sans the initial

That would solve that problem, AFAICS.  Objections, anyone?  I'm too
sleepy to put together a patch at the moment, will post after I get
some sleep...

PS: as the matter of fact, it might be a good idea to pass the parent
as explicit argument to ->d_revalidate(), now that we are passing the
name as well.  Look at the boilerplate in the instances; all that
        parent = READ_ONCE(dentry->d_parent);
	dir = d_inode_rcu(parent);
	if (!dir)
		return -ECHILD;
on the RCU side combined with
	parent = dget_parent(dentry);
	dir = d_inode(parent);

It's needed only because the caller had not told us which directory
is that thing supposed to be in; in non-RCU mode the parent is
explicitly pinned down, no need to play those games.  All we need
	dir = d_inode_rcu(parent);
	if (!dir) // could happen only in RCU mode
		return -ECHILD;
assuming we need the parent inode, that is.

So... how about
	int (*d_revalidate)(struct dentry *dentry, struct dentry *parent,
			  const struct qstr *name, unsigned int flags);
since we are touching all instances anyway?

Powered by blists - more mailing lists