Message-ID: <20171115132836.GA6524@cmpxchg.org>
Date: Wed, 15 Nov 2017 08:28:36 -0500
From: Johannes Weiner <hannes@...xchg.org>
To: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
Cc: mhocko@...nel.org, minchan@...nel.org, ying.huang@...el.com,
mgorman@...hsingularity.net, vdavydov.dev@...il.com,
akpm@...ux-foundation.org, shakeelb@...gle.com, gthelen@...gle.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] mm,vmscan: Kill global shrinker lock.
On Wed, Nov 15, 2017 at 07:58:09PM +0900, Tetsuo Handa wrote:
> I think that Minchan's approach depends on how
>
> In our production, we have observed that the job loader gets stuck for
> 10s of seconds while doing a mount operation. It turns out that it was
> stuck in register_shrinker() while some unrelated job was under memory
> pressure and spending time in shrink_slab(). Our machines have a lot
> of shrinkers registered, and jobs under memory pressure have to traverse
> all of those memcg-aware shrinkers, which affects unrelated jobs that
> want to register their own shrinkers.
>
> is interpreted. If there were 100000 shrinkers and each do_shrink_slab()
> call took 1 millisecond, aborting the iteration as soon as
> rwsem_is_contended() returns true would help a lot. But if there were 10
> shrinkers and each do_shrink_slab() call took 10 seconds, aborting the
> iteration that way would help much less. And if some specific shrinker's
> do_shrink_slab() call takes 100 seconds, checking rwsem_is_contended()
> only between calls is too lazy to help at all.
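For reference, my understanding of Minchan's lockbreak is roughly the
following fragment in the shrink_slab() walk (a sketch only, assuming
the usual shrinker_rwsem / shrinker_list names, not his actual diff):

	if (!down_read_trylock(&shrinker_rwsem))
		goto out;

	list_for_each_entry(shrinker, &shrinker_list, list) {
		...
		freed += do_shrink_slab(&sc, shrinker, priority);
		/*
		 * Bail out if somebody started waiting on shrinker_rwsem,
		 * i.e. register_shrinker()/unregister_shrinker(); whatever
		 * we skip here is picked up on the next reclaim pass.
		 */
		if (rwsem_is_contended(&shrinker_rwsem)) {
			freed = freed ? : 1;
			break;
		}
	}

	up_read(&shrinker_rwsem);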
In your patch, unregister() waits on shrinker->nr_active instead of on
the lock, and the count is decremented in the same location where
Minchan drops the lock. How does that behave differently for
long-running shrinkers?
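If I'm reading your patch right, the difference is only in the
bookkeeping object, roughly (again a sketch of how I read it, not your
actual diff; the atomic type, the RCU list handling and the wait loop
are my guesses):

	/* shrink_slab() side, per shrinker: */
	atomic_inc(&shrinker->nr_active);
	freed += do_shrink_slab(&sc, shrinker, priority);
	atomic_dec(&shrinker->nr_active);	/* where Minchan drops the rwsem */

	/* unregister_shrinker() side: */
	list_del_rcu(&shrinker->list);
	synchronize_rcu();
	while (atomic_read(&shrinker->nr_active))
		schedule_timeout_uninterruptible(1);

Either way, a do_shrink_slab() call that runs for 100 seconds holds up
the registrant for those 100 seconds, hence the question.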
Anyway, I suspect it's many shrinkers and many concurrent invocations,
so the lockbreak granularity you both chose should be fine.