lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 25 May 2017 15:31:23 -0700
From:   Khazhismel Kumykov <>
        "" <>,
Subject: Re: Hang/soft lockup in d_invalidate with simultaneous calls

On Mon, May 22, 2017 at 11:18 AM, Khazhismel Kumykov <> wrote:
> On Wed, May 17, 2017 at 2:58 PM Khazhismel Kumykov <> wrote:
>> On Mon, May 15, 2017 at 5:05 PM, Khazhismel Kumykov <> wrote:
>> > Hi,
>> >
>> > I'm seeing behavior in d_invalidate, if multiple threads call d_invalidate on
>> > the same tree at the same, behavior time blows up and all the calls hang with
>> > large enough trees/enough simultaneous callers. (e.g. a directory w/ 100k
>> > entries in d_subdir, and 5 or so threads calling d_invalidate was able to hang
>> > my test VMs)
>> >
>> > This seems to be due to thrashing on the dentry locks in multiple threads
>> > tightly looping calling d_walk waiting for a shrink_dentry_list call (also
>> > thrashing the locks) to complete. (d_invalidate loops calling d_walk so long as
>> > any dentry in the tree is in a dcache shrink list).
>> >
>> > There was a similar report recently "soft lockup in d_invalidate()" that
>> > proposed in the d_invalidate d_walk to ignore dentries marked as in a shrink
>> > list already, which does prevent this hang/lockup in this case as well, although
>> > I'm not sure it doesn't violate the contract for d_invalidate. (Although the
>> > entries in a shrink list should be dropped eventually, not necessarily by the
>> > time d_invalidate returns).
>> >
>> > If someone more familiar with the dcache could provide input I would appreciate.
>> >
>> > A reliable repro on mainline is:
>> >  - create a mountpoint with DCACHE_OP_REVALIDATE, e.g. fuse passthrough
>> >  - create a directory and populate with ~100k files with content to
>> > populate dcache
>> >  - create some background processes opening/reading files in this folder (5
>> >       while true; cat $file was enough to get an indefinite hang for me)
>> >  - cause the directory to need to be invalidated (e.g., make_bad_inode on the
>> >     directory)
>> >
>> > This results in the background processes all entering d_invalidate and hanging,
>> > while with just one process in d_invalidate (e.g., stat'ing a file in the dir)
>> > things go pretty quickly as expected.
>> >
>> >
>> > (The proposed patch from other thread)
>> >
>> > diff --git a/fs/dcache.c b/fs/dcache.c
>> > index 7b8feb6..3a3b0f37 100644
>> > --- a/fs/dcache.c
>> > +++ b/fs/dcache.c
>> > @@ -1364,7 +1364,7 @@ static enum d_walk_ret select_collect(void *_data,
>> > struct dentry *dentry)
>> >                  goto out;
>> >
>> >          if (dentry->d_flags & DCACHE_SHRINK_LIST) {
>> > -               data->found++;
>> > +               goto out;
>> >          } else {
>> >                  if (dentry->d_flags & DCACHE_LRU_LIST)
>> >                          d_lru_del(dentry);
>> >
>> >
>> > khazhy
>> Would this change actually violate any guarantees? select_collect
>> looks like it used to ignore unused dentries that were part of a
>> shrink list before fe91522a7ba82ca1a51b07e19954b3825e4aaa22 (found
>> would not be incremented). Once the dentry is on a shrink list would
>> it be unreachable anyways, so returning from d_invalidate before all
>> the shrink lists finish processing would be ok?
>> khazhy
> Pinging in case this got missed, would appreciate thoughts from someone more
> familiar with the dcache.
> My previous email wasn't completely correct, while before fe91522a7ba8
> ("don't remove from shrink list in select_collect()") d_invalidate
> would not busy
> wait for other workers calling shrink_list to compete, it would return -EBUSY,
> rather than success, so a change to return success without waiting would not
> be equivalent behavior before. Currently, we will loop calling d_walk repeatedly
> causing the extreme slowdown observed.
> I still want to understand better, in d_invalidate if we can return
> without pruning
> in-use dentries safely, would returning before unused dentries are
> pruned, so long
> as we know they will be pruned by another task (are on a shrink list),
> be safe as well?
> If not, would it make more sense to have have a mutual exclusion on calling
> d_invalidate on the same dentries simultaneously?
> khazhy


Again to summarize:
With a directory with large number of children, which makes a dentry with a
large number dentries in d_subdir, simultaneous calls to d_invalidate on that
dentry take a very long time. As an example, a directory with 160k files,
where a single d_invalidate call took ~1.4s, having 6 tasks call d_invalidate
simultaneously took ~6.5 hours to resolve, about 10000x longer.


This seems to be due to contention between d_walk and shrink_dentry_list,
and particularly d_invalidate tightly looping d_walk so long as any dentry
it finds is in a shrink list. With this particular directory
structure, d_walk will
hold d_lock for "dir" while walking over d_subdir, meanwhile
shrink_dentry_list will release and regrab "dir"s d_lock every iteration,
also throwing it at the back of the queue.

One proposed solution is in select_collect, do not increment the
number of found unused descendants when we find a dentry in a
shrink list. This proposal will result in d_invalidate and shrink_dcache_parent
returning before necessarily all unused children are pruned, but they will
be pruned at some time soon by the other task currently shrinking, which
I am not sure if that is OK or not.

A quick and dirty fix would be adding a mutex around the shrink loops,
so simultaneous tasks don't thrash with each other.

There are probably also other solutions here, I would appreciate a look
from someone more familiar with this area.


Download attachment "smime.p7s" of type "application/pkcs7-signature" (4843 bytes)

Powered by blists - more mailing lists