[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CACGdZY+U54nfgZW1o+bgWXButsQpJhJOPTLRqM-JPU-y0bCBgg@mail.gmail.com>
Date: Mon, 16 Apr 2018 11:28:42 -0700
From: Khazhismel Kumykov <khazhy@...gle.com>
To: Al Viro <viro@...iv.linux.org.uk>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Nikolay Borisov <nborisov@...e.com>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
David Rientjes <rientjes@...gle.com>,
Goldwyn Rodrigues <rgoldwyn@...e.de>,
Jeff Mahoney <jeffm@...e.com>,
Davidlohr Bueso <dave@...olabs.net>
Subject: Re: [PATCH] fs/dcache.c: re-add cond_resched() in shrink_dcache_parent()
On Sun, Apr 15, 2018 at 3:34 PM, Al Viro <viro@...iv.linux.org.uk> wrote:
> On Sun, Apr 15, 2018 at 10:54:55PM +0100, Al Viro wrote:
>> On Sun, Apr 15, 2018 at 09:40:54PM +0100, Al Viro wrote:
>>
>> > BTW, the current placement of cond_resched() looks bogus; suppose we
>> > have collected a lot of victims and ran into need_resched(). We leave
>> > d_walk() and call shrink_dentry_list(). At that point there's a lot
>> > of stuff on our shrink list and anybody else running into them will
>> > have to keep scanning. Giving up the timeslice before we take care
>> > of any of those looks like a bad idea, to put it mildly, and that's
>> > precisely what will happen.
>> >
>> > What about doing that in the end of __dentry_kill() instead? And to
>> > hell with both existing call sites - dput() one (before going to
>> > the parent) is obviously covered by that (dentry_kill() only returns
>> > non-NULL after having called __dentry_kill()) and in shrink_dentry_list()
>> > we'll get to it as soon as we go through all dentries that can be
>> > immediately kicked off the shrink list. Which, AFAICS, improves the
>> > situation, now that shrink_lock_dentry() contains no trylock loops...
>> >
>> > Comments?
>>
>> What I mean is something like this (cumulative diff, it'll obviously need
>> to be carved up into 3--4 commits):
>
> ... and carved-up version is in vfs.git#work.dcache. Could syzbot folks
> hit it with their reproducers?
To me it seems like shrink_dcache_parent could still spin without
scheduling similar to before - found > 0 since *someone* is shrinking,
but we have 0 entries to shrink - we never call __dentry_kill or
schedule, and just spin.
And indeed, the syzbot reproducer @vfs#work.dcache hits a softlockup
in shrink_dcache_parent.
I'd think re-adding cond_resched to shrink_dcache_parent does make the
softlockup go way. It seems to fix the reproducer.
Although did we want to terminate the loop in shrink_dcache_parent earlier?
Download attachment "smime.p7s" of type "application/pkcs7-signature" (4843 bytes)
Powered by blists - more mailing lists