linux-kernel - Re: [PATCH v3 0/8] make slab shrink lockless

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <04953598-accc-7eac-1960-94a4fed580fd@bytedance.com>
Date:   Wed, 1 Mar 2023 10:27:27 +0800
From:   Qi Zheng <zhengqi.arch@...edance.com>
To:     Michal Hocko <mhocko@...e.com>, Mike Rapoport <rppt@...nel.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>, tkhai@...ru,
        hannes@...xchg.org, shakeelb@...gle.com, roman.gushchin@...ux.dev,
        muchun.song@...ux.dev, david@...hat.com, shy828301@...il.com,
        sultan@...neltoast.com, dave@...olabs.net,
        penguin-kernel@...ove.sakura.ne.jp, paulmck@...nel.org,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 0/8] make slab shrink lockless



On 2023/3/1 02:40, Michal Hocko wrote:
> On Mon 27-02-23 17:08:30, Mike Rapoport wrote:
> [...]
>> The results you present do show improvement in IPC for an artificial test
>> script. But more interesting would be to see how a real world workloads
>> benefit from your changes.
> 
> It's been quite some time ago (2018ish) when we have seen bug report
> where mount got stalled when racing with memory reclaim. This was
> nasty because the said mount was a part of login chain and users simply
> had to wait for a long time to get loged in in that particular
> deployment.
> 
> The mount was blocked on a shrinker registration and the reclaim was
> stalled in a slab shrinker IIRC. I do not remember all the details but
> the underlying problem was that a shrinker callback took a long time
> because there were too many objects to scan or it had to sync with other
> fs operation. I believe we ended up using Minchan's break out from slab
> shrinking if the shrinker semaphore was contended and that helped to
> some degree but there were still some corner cases where a single slab
> shrinker could take a noticeable amount of time.
> 
> In general using a "big" lock like shrinker_rwsem from the reclaim and
> potentially block many unrelated subsystems that just want to register
> or unregister shrinkers is a potential source of hard to predict
> problems. So this is a very welcome change.

Totally agree. :)

Thanks,
Qi