linux-kernel - Re: [PATCH 11/19] VFS: Add ability to exclusively lock a dentry and use for create/remove operations.

Message-ID: <20250209064027.GV1977892@ZenIV>
Date: Sun, 9 Feb 2025 06:40:27 +0000
From: Al Viro <viro@...iv.linux.org.uk>
To: NeilBrown <neilb@...e.de>
Cc: Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jeff Layton <jlayton@...nel.org>,
	Dave Chinner <david@...morbit.com>, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 11/19] VFS: Add ability to exclusively lock a dentry and
 use for create/remove  operations.

On Thu, Feb 06, 2025 at 04:42:48PM +1100, NeilBrown wrote:

> +bool d_update_lock(struct dentry *dentry,
> +		   struct dentry *base, const struct qstr *last,
> +		   unsigned int subclass)
> +{
> +	lock_acquire_exclusive(&dentry->d_update_map, subclass, 0, NULL, _THIS_IP_);
> +again:
> +	spin_lock(&dentry->d_lock);
> +	wait_var_event_spinlock(&dentry->d_flags,
> +				!check_dentry_locked(dentry),
> +				&dentry->d_lock);
> +	if (d_is_positive(dentry)) {
> +		rcu_read_lock(); /* needed for d_same_name() */

It isn't.  You are holding ->d_lock there.

> +		if (
> +			/* Was unlinked while we waited ?*/
> +			d_unhashed(dentry) ||
> +			/* Or was dentry renamed ?? */
> +			dentry->d_parent != base ||
> +			dentry->d_name.hash != last->hash ||
> +			!d_same_name(dentry, base, last)

Negatives can't be moved, but they bloody well can be unhashed.  So skipping
the d_unhashed() part for negatives is wrong.
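
To make the point about negatives concrete, here is a toy userspace model of
the validity check.  It is only a sketch: struct toy_dentry and both helper
names are made up for illustration, plain strcmp() stands in for
d_same_name(), and none of this is the kernel's code or a proposed patch.
The first helper mirrors the shape of the quoted logic, where every check
sits under the "positive" test; the second hoists the unhashed test so that
it applies to negatives as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the few dentry properties the check looks at. */
struct toy_dentry {
	bool positive;                   /* has an inode attached */
	bool unhashed;                   /* no longer reachable by lookup */
	const struct toy_dentry *parent;
	const char *name;
};

/* Shape of the quoted check: nothing at all is tested for negatives. */
static bool valid_as_posted(const struct toy_dentry *d,
			    const struct toy_dentry *base, const char *last)
{
	if (d->positive) {
		if (d->unhashed || d->parent != base ||
		    strcmp(d->name, last) != 0)
			return false;
	}
	return true;
}

/* Variant: unhashed is tested for everything, rename only for positives. */
static bool valid_unhashed_always(const struct toy_dentry *d,
				  const struct toy_dentry *base,
				  const char *last)
{
	if (d->unhashed)
		return false;
	if (d->positive &&
	    (d->parent != base || strcmp(d->name, last) != 0))
		return false;
	return true;
}
```

With a negative dentry that has been unhashed while the caller slept, the
first helper still reports it as valid and the second rejects it.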

> +		) {
> +			rcu_read_unlock();
> +			spin_unlock(&dentry->d_lock);
> +			lock_map_release(&dentry->d_update_map);
> +			return false;
> +		}
> +		rcu_read_unlock();
> +	}
> +	/* Must ensure DCACHE_PAR_UPDATE in child is visible before reading
> +	 * from parent
> +	 */
> +	smp_store_mb(dentry->d_flags, dentry->d_flags | DCACHE_PAR_UPDATE);

... paired with?

> +	if (base->d_flags & DCACHE_PAR_UPDATE) {
> +		/* We cannot grant DCACHE_PAR_UPDATE on a dentry while
> +		 * it is held on the parent
> +		 */
> +		dentry->d_flags &= ~DCACHE_PAR_UPDATE;
> +		spin_unlock(&dentry->d_lock);
> +		spin_lock(&base->d_lock);
> +		wait_var_event_spinlock(&base->d_flags,
> +					!check_dentry_locked(base),
> +					&base->d_lock);

Oh?  So you might also be waiting on the parent?  That's deadlock fodder right
there - the caller might be holding ->i_rwsem on the same parent, so you have waiting
on _->d_flags nested both outside and inside _->d_inode->i_rwsem.

Just in case anyone goes "->i_rwsem will only be held shared" - that wouldn't help.
Throw fchmod() into the mix and enjoy your deadlock -
	A: holds ->i_rwsem shared, waits for C to clear DCACHE_PAR_UPDATE.
	B: blocked trying to grab ->i_rwsem exclusive
	C: has DCACHE_PAR_UPDATE set, is blocked trying to grab ->i_rwsem shared
and there you go...
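
The A/B/C scenario above can be written down as a wait-for graph.  The
following is a minimal userspace sketch, not kernel code: the struct, the
enum and both function names are invented for illustration, and the C -> B
edge assumes that a new shared request queues behind an already pending
exclusive one, which is what the scenario relies on.

```c
#include <assert.h>
#include <stdbool.h>

enum { TASK_A, TASK_B, TASK_C, NTASKS };

/* waits_on[x] is the task x cannot proceed without, or -1 if x can run. */
struct wait_graph {
	int waits_on[NTASKS];
};

/* Follow the wait-for chain from 'start'; coming back to 'start' is a cycle. */
static bool task_is_deadlocked(const struct wait_graph *g, int start)
{
	int cur = start;

	for (int steps = 0; steps < NTASKS; steps++) {
		cur = g->waits_on[cur];
		if (cur < 0)
			return false;   /* chain ends in a runnable task */
		assert(cur < NTASKS);
		if (cur == start)
			return true;
	}
	return false;                   /* 'start' is not itself on a cycle */
}

/* The three-task scenario, one edge per line of the description. */
static struct wait_graph fchmod_scenario(void)
{
	struct wait_graph g;

	g.waits_on[TASK_A] = TASK_C; /* A waits for C to clear DCACHE_PAR_UPDATE */
	g.waits_on[TASK_B] = TASK_A; /* B wants ->i_rwsem exclusive, A holds it shared */
	g.waits_on[TASK_C] = TASK_B; /* C's shared request sits behind pending B */
	return g;
}
```

Every task in fchmod_scenario() is on the cycle.  Drop B (set
waits_on[TASK_C] to -1) and the chain from A ends in a runnable C, which is
why the exclusive waiter is what turns two shared holders into a deadlock.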

> +		spin_unlock(&base->d_lock);
> +		goto again;
> +	}
> +	spin_unlock(&dentry->d_lock);
> +	return true;
> +}

The entire thing is refcount-neutral for both dentry and base.  Which makes this

> @@ -1759,8 +1863,9 @@ static struct dentry *lookup_and_lock_nested(const struct qstr *last,
>  
>  	if (!(lookup_flags & LOOKUP_PARENT_LOCKED))
>  		inode_lock_nested(base->d_inode, subclass);
> -
> -	dentry = lookup_one_qstr(last, base, lookup_flags);
> +	do {
> +		dentry = lookup_one_qstr(last, base, lookup_flags);
> +	} while (!IS_ERR(dentry) && !d_update_lock(dentry, base, last, subclass));

... a refcount leak waiting to happen.
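
A toy userspace model of that loop shows where the references go.  This is a
sketch under stated assumptions, not the kernel's implementation: struct
toy_obj and every toy_* name are hypothetical, the lookup is assumed to hand
back a fresh reference on each call, and the lock attempt is refcount-neutral
as noted above.  The "balanced" variant is just one possible way to square
the accounts, not necessarily what the series should do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical object with a reference count and a scripted lock outcome. */
struct toy_obj {
	int refcount;
	int lock_failures_left;  /* lock attempts that will fail before success */
};

/* Like a lookup: returns the object with one extra reference taken. */
static struct toy_obj *toy_lookup(struct toy_obj *o)
{
	o->refcount++;
	return o;
}

/* Drops one reference. */
static void toy_put(struct toy_obj *o)
{
	assert(o->refcount > 0);
	o->refcount--;
}

/* Refcount-neutral lock attempt: never touches the count either way. */
static bool toy_update_lock(struct toy_obj *o)
{
	if (o->lock_failures_left > 0) {
		o->lock_failures_left--;
		return false;
	}
	return true;
}

/* Shape of the quoted loop: a failed lock goes straight back to lookup. */
static struct toy_obj *lookup_and_lock_leaky(struct toy_obj *o)
{
	struct toy_obj *found;

	do {
		found = toy_lookup(o);
	} while (!toy_update_lock(found));
	return found;
}

/* Variant that drops the reference from the failed attempt before retrying. */
static struct toy_obj *lookup_and_lock_balanced(struct toy_obj *o)
{
	for (;;) {
		struct toy_obj *found = toy_lookup(o);

		if (toy_update_lock(found))
			return found;
		toy_put(found);
	}
}
```

Starting from a count of 1 with two scripted lock failures, the leaky loop
ends at 4 (one legitimate reference plus two leaked ones), while the balanced
loop ends at 2.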