Message-ID: <04953598-accc-7eac-1960-94a4fed580fd@bytedance.com>
Date:   Wed, 1 Mar 2023 10:27:27 +0800
From:   Qi Zheng <zhengqi.arch@...edance.com>
To:     Michal Hocko <mhocko@...e.com>, Mike Rapoport <rppt@...nel.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>, tkhai@...ru,
        hannes@...xchg.org, shakeelb@...gle.com, roman.gushchin@...ux.dev,
        muchun.song@...ux.dev, david@...hat.com, shy828301@...il.com,
        sultan@...neltoast.com, dave@...olabs.net,
        penguin-kernel@...ove.sakura.ne.jp, paulmck@...nel.org,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 0/8] make slab shrink lockless



On 2023/3/1 02:40, Michal Hocko wrote:
> On Mon 27-02-23 17:08:30, Mike Rapoport wrote:
> [...]
>> The results you present do show improvement in IPC for an artificial test
>> script. But it would be more interesting to see how real-world workloads
>> benefit from your changes.
> 
> Quite some time ago (2018ish) we saw a bug report
> where mount got stalled when racing with memory reclaim. This was
> nasty because the said mount was a part of the login chain and users simply
> had to wait for a long time to get logged in in that particular
> deployment.
> 
> The mount was blocked on a shrinker registration and the reclaim was
> stalled in a slab shrinker IIRC. I do not remember all the details but
> the underlying problem was that a shrinker callback took a long time
> because there were too many objects to scan or it had to sync with other
> fs operations. I believe we ended up using Minchan's break out from slab
> shrinking if the shrinker semaphore was contended and that helped to
> some degree but there were still some corner cases where a single slab
> shrinker could take a noticeable amount of time.
> 
> In general, using a "big" lock like shrinker_rwsem from reclaim, and
> thereby potentially blocking many unrelated subsystems that just want to
> register or unregister shrinkers, is a source of hard-to-predict
> problems. So this is a very welcome change.

Totally agree. :)

Thanks,
Qi
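The break-out from slab shrinking that Michal mentions can be illustrated with a small userspace C model. This is a sketch under stated assumptions, not the kernel code: as far as I know, mm/vmscan.c performs the equivalent check with rwsem_is_contended(&shrinker_rwsem) between shrinker callbacks, but struct shrinker_model, waiting_writers, lock_is_contended(), shrink_slab_model() and demo() below are hypothetical names invented for illustration. The check runs only between callbacks, so a single slow shrinker still delays a waiting register/unregister, which is the remaining corner case described in the quoted text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a shrinker: scan() returns the number of objects freed. */
typedef unsigned long (*scan_fn)(void);

struct shrinker_model {
    scan_fn scan;
};

/* Hypothetical count of register/unregister callers waiting for the
 * write side of the shrinker lock (a stand-in for rwsem contention). */
static int waiting_writers;

static bool lock_is_contended(void)
{
    return waiting_writers > 0;
}

/* Walk the shrinkers (under the read lock in the real design) and bail
 * out between callbacks as soon as a writer is waiting. */
static unsigned long shrink_slab_model(const struct shrinker_model *list,
                                       size_t n, size_t *visited)
{
    unsigned long freed = 0;

    *visited = 0;
    for (size_t i = 0; i < n; i++) {
        freed += list[i].scan();   /* a slow callback cannot be interrupted */
        (*visited)++;
        if (lock_is_contended()) {
            /* Report some progress so an early exit is not read as
             * "nothing could be reclaimed". */
            if (freed == 0)
                freed = 1;
            break;
        }
    }
    return freed;
}

static unsigned long scan_ten(void)
{
    return 10;
}

/* Usage: three shrinkers, with `writers` callers already waiting. */
static unsigned long demo(int writers, size_t *visited)
{
    const struct shrinker_model list[] = { { scan_ten }, { scan_ten }, { scan_ten } };

    waiting_writers = writers;
    return shrink_slab_model(list, 3, visited);
}
```

With no waiting writer, demo(0, &v) visits all three shrinkers and returns 30; with one waiting writer, demo(1, &v) stops after the first shrinker and returns 10. The lockless approach in this series aims to remove the need for such a bail-out rather than refine it.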
