linux-ext4 - Re: fun with d_invalidate() vs. d_splice_alias() was Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <87v89kio36.fsf@email.froward.int.ebiederm.org>
Date: Wed, 29 Nov 2023 09:19:41 -0600
From: "Eric W. Biederman" <ebiederm@...ssion.com>
To: Al Viro <viro@...iv.linux.org.uk>
Cc: linux-fsdevel@...r.kernel.org,  Linus Torvalds
 <torvalds@...ux-foundation.org>,  Christian Brauner <brauner@...nel.org>,
  tytso@....edu,  linux-f2fs-devel@...ts.sourceforge.net,
  ebiggers@...nel.org,  jaegeuk@...nel.org,  linux-ext4@...r.kernel.org,
  Miklos Szeredi <miklos@...redi.hu>,  Gabriel Krisman Bertazi
 <gabriel@...sman.be>
Subject: Re: fun with d_invalidate() vs. d_splice_alias() was Re: [f2fs-dev]
 [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs

Al Viro <viro@...iv.linux.org.uk> writes:

> On Mon, Nov 27, 2023 at 06:38:43AM +0000, Al Viro wrote:
>
>> > FWIW, I suspect that the right answer would be along the lines of
>> > 	* if d_splice_alias() does move an exsiting (attached) alias in
>> > place, it ought to dissolve all mountpoints in subtree being moved.
>> > There might be subtleties,
>
> Are there ever...  Starting with the "our test for loop creation
> (alias is a direct ancestor, need to fail with -ELOOP) is dependent
> upon rename_lock being held all along".
>
> Folks, what semantics do we want for dissolving mounts on splice?
> The situation when it happens is when we have a subtree on e.g. NFS
> and have some mounts (on client) inside that.  Then somebody on
> server moves the root of that subtree somewhere else and we try
> to do a lookup in new place.  Options:
>
> 1) our dentry for directory that got moved on server is moved into
> new place, along with the entire subtree *and* everything mounted
> on it.  Very dubious semantics, especially since if we look the
> old location up before looking for new one, the mounts will be
> dissolved; no way around that.
>
> 2) lookup fails.  It's already possible; e.g. if server has
> /srv/nfs/1/2/3 moved to /srv/nfs/x, then /srv/nfs/1/2 moved
> to /srv/nfs/x/y and client has a process with cwd in /mnt/nfs/1/2/3
> doing a lookup for "y", there's no way in hell to handle that -
> the lookup will return the fhandle of /srv/nfs/x, which is the
> same thing the client has for /mnt/nfs/1/2; we *can't* move that
> dentry to /mnt/nfs/1/2/3/y - not without creating a detached loop.
> We can also run into -ESTALE if one of the trylocks in
> __d_unalias() fails.  Having the same happen if there are mounts
> in the subtree we are trying to splice would be unpleasant, but
> not fatal.  The trouble is, that won't be a transient failure -
> not until somebody tries to look the old location up.
>
> 3) dissolve the mounts.  Doable, but it's not easy; especially
> since we end up having to redo the loop-prevention check after
> the mounts had been dissolved.  And that check may be failing
> by that time, with no way to undo that dissolving...

To be clear this is a change in current semantics and has a minuscule
change of resulting in a regression.  That should be called out in the
change log.

If we choose to change the semantics I would suggest that the new
semantics be:

If a different name for a directory already exists:
	* Detach the mounts unconditionally (leaving dentry descendants alone).
	* Attempt the current splice.
          - If the splice succeeds ( return the new dentry )
	  - If the splice fails ( fail the lookup, and d_invalidate the existing name )

Unconditionally dissolving the mounts before attempting the rename
should simplify everything.

In the worst case a race between d_invalidate and d_splice_alias will
now become a race to see who can detach the mounts first.

Eric