lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <871qqmgxeu.fsf@oracle.com>
Date:   Tue, 01 Nov 2022 14:47:05 -0700
From:   Stephen Brennan <stephen.s.brennan@...cle.com>
To:     Al Viro <viro@...iv.linux.org.uk>
Cc:     Jan Kara <jack@...e.cz>, Amir Goldstein <amir73il@...il.com>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC] fsnotify: allow sleepable child dentry flag update

Al Viro <viro@...iv.linux.org.uk> writes:
> On Thu, Oct 13, 2022 at 03:27:19PM -0700, Stephen Brennan wrote:
>
>> So I have two more narrowly scoped strategies to improve the situation. Both are
>> included in the hacky, awful patch below.
>> 
>> First, is to let __fsnotify_update_child_dentry_flags() sleep. This means nobody
>> is holding the spinlock for several seconds at a time. We can actually achieve
>> this via a cursor, the same way that simple_readdir() is implemented. I think
>> this might require moving the declaration of d_alloc_cursor() and maybe
>> exporting it. I had to #include fs/internal.h which is not ok.
>
> Er...  Won't that expose every filesystem to having to deal with cursors?
> Currently it's entirely up to the filesystem in question and I wouldn't
> be a dime on everyone being ready to cope with such surprises...

Hi Al,

I wanted to follow-up on this. Yeah, I didn't realize that this could
cause issues for some filesystems when I wrote it, since I hadn't
considered that rather few filesystems use dcache_readdir(), and so most
aren't prepared to encounter a cursor. Thanks for that catch.

I think I came up with a better solution, which you can see in context
in v3 [1]. The only change I have from the posting there is to eliminate
the unnecssary "child->d_parent != parent" check. So the new idea is to
take a reference to the current child and then go to sleep. That alone
should be enough to prevent dentry_kill() from removing the child from
its parent's list. However, it wouldn't prevent d_move() from moving it
to a new list, nor would it prevent it from moving along the list if it
were a cursor. However, in this situation, we also hold the parent
i_rwsem in write mode, which means that no concurrent rename or readdir
can happen. You can see my full analysis here [2]. So here's the new
approach, if you can call out anything I've missod or confirm that it's
sound, I'd really appreciate it!


/* Must be called with inode->i_rwsem in held in write mode */
static bool __fsnotify_update_children_dentry_flags(struct inode *inode)
{
	struct dentry *child, *alias, *last_ref = NULL;
	alias = d_find_any_alias(inode);

	/*
	 * These lists can get very long, so we may need to sleep during
	 * iteration. Normally this would be impossible without a cursor,
	 * but since we have the inode locked exclusive, we're guaranteed
	 * that the directory won't be modified, so whichever dentry we
	 * pick to sleep on won't get moved.
	 */
	spin_lock(&alias->d_lock);
	list_for_each_entry(child, &alias->d_subdirs, d_child) {
		if (need_resched()) {
			/*
			 * We need to hold a reference while we sleep. But when
			 * we wake, dput() could free the dentry, invalidating
			 * the list pointers. We can't look at the list pointers
			 * until we re-lock the parent, and we can't dput() once
			 * we have the parent locked. So the solution is to hold
			 * onto our reference and free it the *next* time we drop
			 * alias->d_lock: either at the end of the function, or
			 * at the time of the next sleep.
			 */
			dget(child);
			spin_unlock(&alias->d_lock);
			dput(last_ref);
			last_ref = child;
			cond_resched();
			spin_lock(&alias->d_lock);
		}

		if (!child->d_inode)
			continue;

		spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
		if (watched)
			child->d_flags |= DCACHE_FSNOTIFY_PARENT_WATCHED;
		else
			child->d_flags &= ~DCACHE_FSNOTIFY_PARENT_WATCHED;
		spin_unlock(&child->d_lock);
	}
	spin_unlock(&alias->d_lock);
	dput(last_ref);
	dput(alias);
	return watched;
}

Thanks,
Stephen

[1]: https://lore.kernel.org/linux-fsdevel/20221028001016.332663-4-stephen.s.brennan@oracle.com/
[2]: https://lore.kernel.org/linux-fsdevel/874jvigye9.fsf@oracle.com/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ