lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 17 May 2017 14:58:11 -0700
From:   Khazhismel Kumykov <khazhy@...gle.com>
To:     linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        viro@...iv.linux.org.uk
Subject: Re: Hang/soft lockup in d_invalidate with simultaneous calls

On Mon, May 15, 2017 at 5:05 PM, Khazhismel Kumykov <khazhy@...gle.com> wrote:
> Hi,
>
> I'm seeing behavior in d_invalidate, if multiple threads call d_invalidate on
> the same tree at the same, behavior time blows up and all the calls hang with
> large enough trees/enough simultaneous callers. (e.g. a directory w/ 100k
> entries in d_subdir, and 5 or so threads calling d_invalidate was able to hang
> my test VMs)
>
> This seems to be due to thrashing on the dentry locks in multiple threads
> tightly looping calling d_walk waiting for a shrink_dentry_list call (also
> thrashing the locks) to complete. (d_invalidate loops calling d_walk so long as
> any dentry in the tree is in a dcache shrink list).
>
> There was a similar report recently "soft lockup in d_invalidate()" that
> proposed in the d_invalidate d_walk to ignore dentries marked as in a shrink
> list already, which does prevent this hang/lockup in this case as well, although
> I'm not sure it doesn't violate the contract for d_invalidate. (Although the
> entries in a shrink list should be dropped eventually, not necessarily by the
> time d_invalidate returns).
>
> If someone more familiar with the dcache could provide input I would appreciate.
>
> A reliable repro on mainline is:
>  - create a mountpoint with DCACHE_OP_REVALIDATE, e.g. fuse passthrough
>  - create a directory and populate with ~100k files with content to
> populate dcache
>  - create some background processes opening/reading files in this folder (5
>       while true; cat $file was enough to get an indefinite hang for me)
>  - cause the directory to need to be invalidated (e.g., make_bad_inode on the
>     directory)
>
> This results in the background processes all entering d_invalidate and hanging,
> while with just one process in d_invalidate (e.g., stat'ing a file in the dir)
> things go pretty quickly as expected.
>
>
> (The proposed patch from other thread)
>
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 7b8feb6..3a3b0f37 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1364,7 +1364,7 @@ static enum d_walk_ret select_collect(void *_data,
> struct dentry *dentry)
>                  goto out;
>
>          if (dentry->d_flags & DCACHE_SHRINK_LIST) {
> -               data->found++;
> +               goto out;
>          } else {
>                  if (dentry->d_flags & DCACHE_LRU_LIST)
>                          d_lru_del(dentry);
>
>
> khazhy

Would this change actually violate any guarantees? select_collect
looks like it used to ignore unused dentries that were part of a
shrink list before fe91522a7ba82ca1a51b07e19954b3825e4aaa22 (found
would not be incremented). Once the dentry is on a shrink list would
it be unreachable anyways, so returning from d_invalidate before all
the shrink lists finish processing would be ok?

khazhy

Download attachment "smime.p7s" of type "application/pkcs7-signature" (4843 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ