[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ed839995-9f49-19b6-a46a-5f777cd8f52b@virtuozzo.com>
Date: Thu, 19 Dec 2019 10:35:57 +0000
From: Pavel Tikhomirov <ptikhomirov@...tuozzo.com>
To: Dave Chinner <david@...morbit.com>,
Shakeel Butt <shakeelb@...gle.com>
CC: Andrey Ryabinin <aryabinin@...tuozzo.com>,
Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>,
Cgroups <cgroups@...r.kernel.org>, Linux MM <linux-mm@...ck.org>,
Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Vladimir Davydov <vdavydov.dev@...il.com>,
Roman Gushchin <guro@...com>,
Chris Down <chris@...isdown.name>,
Yang Shi <yang.shi@...ux.alibaba.com>,
Tejun Heo <tj@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Konstantin Khorenko <khorenko@...tuozzo.com>,
Kirill Tkhai <ktkhai@...tuozzo.com>,
Trond Myklebust <trond.myklebust@...merspace.com>,
Anna Schumaker <anna.schumaker@...app.com>,
"J. Bruce Fields" <bfields@...ldses.org>,
Chuck Lever <chuck.lever@...cle.com>,
"linux-nfs@...r.kernel.org" <linux-nfs@...r.kernel.org>,
Alexander Viro <viro@...iv.linux.org.uk>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [PATCH] mm: fix hanging shrinker management on long
do_shrink_slab
On 12/10/19 4:20 AM, Dave Chinner wrote:
> On Fri, Dec 06, 2019 at 09:11:25AM -0800, Shakeel Butt wrote:
>> On Thu, Dec 5, 2019 at 6:10 PM Dave Chinner <david@...morbit.com> wrote:
>>> If a shrinker is blocking for a long time, then we need to
>>> work to fix the shrinker implementation because blocking is a much
>>> bigger problem than just register/unregister.
>>>
>>
>> Yes, we should be fixing the implementations of all shrinkers and yes
>> it is bigger issue but we can also fix register/unregister isolation
>> issue in parallel. Fixing all shrinkers would a tedious and long task
>> and we should not block fixing isolation issue on it.
>
> "fixing all shrinkers" is a bit of hyperbole - you've identified
> only one instance where blocking is causing you problems. Indeed,
> most shrinkers are already non-blocking and won't cause you any
> problems at all.
>
>>> IOWs, we already know that cycling a global rwsem on every
>>> individual shrinker invocation is going to cause noticable
>>> scalability problems. Hence I don't think that this sort of "cycle
>>> the global rwsem faster to reduce [un]register latency" solution is
>>> going to fly because of the runtime performance regressions it will
>>> introduce....
>>>
>>
>> I agree with your scalability concern (though others would argue to
>> first demonstrate the issue before adding more sophisticated scalable
>> code).
>
> Look at the git history. We *know* this is a problem, so anyone
> arguing that we have to prove it can go take a long walk of a short
> plank....
>
>> Most memory reclaim code is written without the performance or
>> scalability concern, maybe we should switch our thinking.
>
> I think there's a lot of core mm and other developers that would
> disagree with you there. With respect to shrinkers, we've been
> directly concerned about performance and scalability of the
> individual instances as well as the infrastructure for at least the
> last decade....
>
> Cheers,
>
> Dave.
>
Thanks a lot for your replies, now I see that the core of the problem is
in nfs hang, before that I was unsure if it's OK to have such a hang in
do_shrink_slab.
I have a possibly bad idea on how my patch can still work. What if we
use unlock/refcount way only for nfs-shrinkers? It will still give a
performance penalty if one has many nfs mounts, but for those who has
little number of nfs mounts the penalty would be less. And this would be
a small isolation improvement for nfs users.
--
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.
Powered by blists - more mailing lists