lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 12 Nov 2022 12:00:16 +0200
From:   Amir Goldstein <amir73il@...il.com>
To:     Stephen Brennan <stephen.s.brennan@...cle.com>
Cc:     Jan Kara <jack@...e.cz>, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, Al Viro <viro@...iv.linux.org.uk>
Subject: Re: [PATCH v4 4/5] fsnotify: allow sleepable child flag update

On Sat, Nov 12, 2022 at 12:06 AM Stephen Brennan
<stephen.s.brennan@...cle.com> wrote:
>
> With very large d_subdirs lists, iteration can take a long time. Since
> iteration needs to hold alias->d_lock, this can trigger soft lockups.
> It would be best to make this iteration sleepable. We can drop
> alias->d_lock and sleep, by taking a reference to the current child.
> However, we need to be careful, since it's possible for the child's
> list pointers to be modified once we drop alias->d_lock. The following
> are the only cases where the list pointers are modified:
>
> 1. dentry_unlist() in fs/dcache.c
>
>    This is a helper of dentry_kill(). This function is quite careful to
>    check the reference count of the dentry once it has taken the
>    requisite locks, and bail out if a new reference was taken. So we
>    can be safe that, assuming we found the dentry and took a reference
>    before dropping alias->d_lock, any concurrent dentry_kill() should
>    bail out and leave our list pointers untouched.
>
> 2. __d_move() in fs/dcache.c
>
>    If the child was moved to a new parent, then we can detect this by
>    testing d_parent and retrying the iteration.
>
> 3. Initialization code in d_alloc() family
>
>    We are safe from this code, since we cannot encounter a dentry until
>    it has been initialized.
>
> 4. Cursor code in fs/libfs.c for dcache_readdir()
>
>    Dentries with DCACHE_DENTRY_CURSOR should be skipped before sleeping,
>    since we could awaken to find they have skipped over a portion of the
>    child list.
>
> Given these considerations, we can carefully write a loop that iterates
> over d_subdirs and is capable of going to sleep periodically.
>
> Signed-off-by: Stephen Brennan <stephen.s.brennan@...cle.com>
> ---
>
> Notes:
>     v4:
>     - I've updated this patch so it should be safe even without the
>       inode locked, by handling cursors and d_move() races.
>     - Moved comments to lower indentation
>     - I didn't keep Amir's R-b since this was a fair bit of change.
>     v3:
>     - removed if statements around dput()
>     v2:
>     - added a check for child->d_parent != alias and retry logic
>
>  fs/notify/fsnotify.c | 46 ++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index 409d479cbbc6..0ba61211456c 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -105,7 +105,7 @@ void fsnotify_sb_delete(struct super_block *sb)
>   */
>  void fsnotify_update_children_dentry_flags(struct inode *inode, bool watched)
>  {
> -       struct dentry *alias, *child;
> +       struct dentry *alias, *child, *last_ref = NULL;
>
>         if (!S_ISDIR(inode->i_mode))
>                 return;
> @@ -116,12 +116,49 @@ void fsnotify_update_children_dentry_flags(struct inode *inode, bool watched)
>                 return;
>
>         /*
> -        * run all of the children of the original inode and fix their
> -        * d_flags to indicate parental interest (their parent is the
> -        * original inode)
> +        * These lists can get very long, so we may need to sleep during
> +        * iteration. Normally this would be impossible without a cursor,
> +        * but since we have the inode locked exclusive, we're guaranteed

Not exactly true for v4 patchset order, but I do prefer that we make it true.

> +        * that the directory won't be modified, so whichever dentry we
> +        * pick to sleep on won't get moved. So, start a manual iteration
> +        * over d_subdirs which will allow us to sleep.
>          */
>         spin_lock(&alias->d_lock);
> +retry:

Better if we can avoid this retry by inode lock.
Note that it is enough to take inode_lock_shared() to protect
this code.

It means that tasks doing remove+add parent watch may
iterate d_subdirs in parallel, but maybe that's even better
then having them iterate d_subdirs sequentially?

>         list_for_each_entry(child, &alias->d_subdirs, d_child) {
> +               /*
> +                * We need to hold a reference while we sleep. But we cannot
> +                * sleep holding a reference to a cursor, or we risk skipping
> +                * through the list.
> +                *
> +                * When we wake, dput() could free the dentry, invalidating the
> +                * list pointers.  We can't look at the list pointers until we
> +                * re-lock the parent, and we can't dput() once we have the
> +                * parent locked.  So the solution is to hold onto our reference
> +                * and free it the *next* time we drop alias->d_lock: either at
> +                * the end of the function, or at the time of the next sleep.
> +                */

My usual nit picking: you could concat this explanation to the
comment outside the loop. It won't make it any less clear, maybe
even more clear in the wider context of how the children are safely
iterated.

Thanks,
Amir.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ