linux-kernel - Re: [PATCH v3 0/4] mm/slub: Fix count_partial() problem

Message-ID: <9ea6829a-bf10-4c24-bc8c-492862a76b54@linux.alibaba.com>
Date:   Tue, 16 Mar 2021 19:49:33 +0800
From:   Xunlei Pang <xlpang@...ux.alibaba.com>
To:     Vlastimil Babka <vbabka@...e.cz>, xlpang@...ux.alibaba.com,
        Christoph Lameter <cl@...ux.com>,
        Pekka Enberg <penberg@...nel.org>,
        Roman Gushchin <guro@...com>,
        Konstantin Khlebnikov <khlebnikov@...dex-team.ru>,
        David Rientjes <rientjes@...gle.com>,
        Matthew Wilcox <willy@...radead.org>,
        Shu Ming <sming56@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Christoph Lameter <cl@...two.de>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Wen Yang <wenyang@...ux.alibaba.com>,
        James Wang <jnwang@...ux.alibaba.com>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH v3 0/4] mm/slub: Fix count_partial() problem

On 3/16/21 7:02 PM, Vlastimil Babka wrote:
> On 3/16/21 11:42 AM, Xunlei Pang wrote:
>> On 3/16/21 2:49 AM, Vlastimil Babka wrote:
>>> On 3/9/21 4:25 PM, Xunlei Pang wrote:
>>>> count_partial() can hold the n->list_lock spinlock for a long time, which
>>>> causes significant trouble for the system. This series eliminates this problem.
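To make the lock-hold concern concrete, here is a minimal userspace sketch under a simplified model of one node's partial list. The struct and function names are hypothetical and this is not kernel code; a C11 atomic_flag stands in for the spinlock. Counting free objects walks every partial slab while the lock is held, so the hold time grows linearly with the length of the partial list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of a slab on a node's partial list. The fields echo
 * the idea of "objects" and "inuse" per slab, but this is not kernel code. */
struct model_slab {
    unsigned int objects;        /* total object slots in this slab */
    unsigned int inuse;          /* slots currently allocated */
    struct model_slab *next;     /* next slab on the partial list */
};

/* Hypothetical per-node state: a spinlock guarding the partial list. */
struct model_node {
    atomic_flag list_lock;
    struct model_slab *partial;
};

static void model_lock(struct model_node *n)
{
    /* Spin until this caller is the one that set the flag. */
    while (atomic_flag_test_and_set_explicit(&n->list_lock, memory_order_acquire))
        ;
}

static void model_unlock(struct model_node *n)
{
    atomic_flag_clear_explicit(&n->list_lock, memory_order_release);
}

/* Sum free objects over the whole partial list. The lock is held for the
 * entire walk, so hold time is proportional to the number of partial slabs;
 * anyone else needing list_lock spins for that long. */
static unsigned long model_count_partial_free(struct model_node *n)
{
    unsigned long free_objs = 0;

    model_lock(n);
    for (struct model_slab *s = n->partial; s != NULL; s = s->next) {
        assert(s->inuse <= s->objects);
        free_objs += s->objects - s->inuse;
    }
    model_unlock(n);
    return free_objs;
}
```

In this model, a node with thousands of partial slabs makes every concurrent taker of the same lock wait for the full walk.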
>>>
>>> Before I check the details, I have two high-level comments:
>>>
>>> - patch 1 introduces a counting scheme that patch 4 then changes; could we do
>>> this in one step to avoid the churn?
>>>
>>> - the series addresses the concern that spinlock is being held, but doesn't
>>> address the fact that counting partial per-node slabs is not nearly enough if we
>>> want accurate <active_objs> in /proc/slabinfo because there are also percpu
>>> slabs and per-cpu partial slabs, where we don't track the free objects at all.
>>> So after this series, while readers of /proc/slabinfo won't hold the
>>> spinlock for long, they will get the same garbage data as before. So Christoph is not
>>> wrong to say that we can just report active_objs == num_objs and it won't
>>> actually break any ABI.
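The gap described above can be shown with a small arithmetic sketch. It assumes a simplified model in which active_objs is derived as num_objs minus the free objects counted on the per-node partial lists; the struct and field names are hypothetical. Free objects sitting in per-cpu slabs and on per-cpu partial lists are never subtracted, so the model reports them as active.

```c
#include <assert.h>

/* Hypothetical breakdown of where a cache's free object slots sit. */
struct model_cache_counts {
    unsigned long num_objs;            /* slots in all slabs */
    unsigned long free_node_partial;   /* free slots on per-node partial lists */
    unsigned long free_cpu_slab;       /* free slots in per-cpu active slabs */
    unsigned long free_cpu_partial;    /* free slots on per-cpu partial lists */
};

/* What the model reports: only per-node partial free slots are subtracted,
 * so per-cpu free slots are counted as "active". */
static unsigned long model_reported_active(const struct model_cache_counts *c)
{
    assert(c->free_node_partial <= c->num_objs);
    return c->num_objs - c->free_node_partial;
}

/* The exact figure, if every free slot were tracked. */
static unsigned long model_true_active(const struct model_cache_counts *c)
{
    unsigned long all_free = c->free_node_partial + c->free_cpu_slab +
                             c->free_cpu_partial;

    assert(all_free <= c->num_objs);
    return c->num_objs - all_free;
}
```

For example, with 1000 slots, 200 free on node partial lists, 150 free in per-cpu slabs and 100 free on per-cpu partial lists, the model reports 800 active objects while only 550 are in use.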
>>
>> If maintainers don't mind this inaccuracy, whose importance I also
>> doubt, then it becomes easy. In case some people really care,
>> introducing an extra config option (default off) for it would be a good
>> choice.
> 
> Great.
> 
>>> At the same time somebody might actually want accurate object statistics at the
>>> expense of peak performance, and it would be nice to give them such an option in
>>> SLUB. Right now we don't provide this accuracy even with CONFIG_SLUB_STATS,
>>> although that option provides many additional tuning stats, with additional
>>> overhead.
>>> So my proposal would be a new config for "accurate active objects" (or just tie
>>> it to CONFIG_SLUB_DEBUG?) that would extend the approach of percpu counters in
>>> patch 4 to all alloc/free, so that it includes percpu slabs. Without this config
>>> enabled, let's just report active_objs == num_objs.
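The proposal can be sketched as per-CPU in-use counters updated on every allocation and free, folded together only when the statistic is read. This is a single-threaded userspace model: the names, the fixed CPU count and the runtime `accurate` flag (standing in for a compile-time config option) are all hypothetical, and real kernel code would use per-CPU variables rather than a plain array.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4   /* hypothetical CPU count for the model */

/* Hypothetical per-cache state. Each CPU only touches its own slot, so the
 * alloc/free fast path needs no shared lock. Slots are signed because an
 * object allocated on one CPU may be freed on another. */
struct model_cache {
    unsigned long num_objs;
    long inuse_delta[MODEL_NR_CPUS];
};

static void model_on_alloc(struct model_cache *c, int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    c->inuse_delta[cpu]++;
}

static void model_on_free(struct model_cache *c, int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    c->inuse_delta[cpu]--;
}

/* Reader side: fold the per-CPU slots into one number. With the option
 * off, fall back to reporting active_objs == num_objs. */
static unsigned long model_active_objs(const struct model_cache *c, int accurate)
{
    if (!accurate)
        return c->num_objs;

    long sum = 0;
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        sum += c->inuse_delta[cpu];
    return sum < 0 ? 0 : (unsigned long)sum;
}
```

Keeping the slots signed matters: an object allocated on CPU 0 and freed on CPU 1 leaves CPU 1's slot negative, and only the sum is meaningful.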
>> For percpu slabs, the numbers can be retrieved from the existing
>> slub_percpu_partial()->pobjects, so it looks like no extra work is needed.
> 
> Hm, unfortunately it's not that simple: the number there is a snapshot that can
> become wildly inaccurate afterwards.
> 

It's hard to make it absolutely accurate using percpu counters: the data can
change while you are iterating over all the CPUs and total_objects. I can't
imagine a real-world use for it, not to mention the percpu freelist cache.
I think the sysfs slabs_cpu_partial file should be good enough for common
debugging purposes.
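Why a lockless sum over CPUs is only a snapshot can be shown by replaying one fixed interleaving by hand. In this hypothetical model (names invented for illustration), the reader samples CPU 0, then CPU 0 frees an object and CPU 1 allocates one, then the reader samples CPU 1. The true total is 10, 9 and 10 along the way, yet the reader computes 11, a value that never existed.

```c
#include <assert.h>

#define MODEL_RACE_NR_CPUS 2

/* Hypothetical per-CPU in-use counters for the replay. */
static long model_race_inuse[MODEL_RACE_NR_CPUS];

static long model_race_true_total(void)
{
    long t = 0;
    for (int cpu = 0; cpu < MODEL_RACE_NR_CPUS; cpu++)
        t += model_race_inuse[cpu];
    return t;
}

/* Replays one fixed interleaving of a lockless reader with two updates and
 * returns what the reader computes. Each true total observed along the way
 * is stored in seen[] (caller provides room for 3 entries). */
static long model_replay_racy_sum(long seen[3])
{
    long sum;

    model_race_inuse[0] = 5;               /* 10 objects in use overall */
    model_race_inuse[1] = 5;
    seen[0] = model_race_true_total();     /* 10 */

    sum = model_race_inuse[0];             /* reader samples CPU 0: 5 */

    model_race_inuse[0]--;                 /* CPU 0 frees an object */
    seen[1] = model_race_true_total();     /* 9 */
    model_race_inuse[1]++;                 /* CPU 1 allocates an object */
    seen[2] = model_race_true_total();     /* 10 */

    sum += model_race_inuse[1];            /* reader samples CPU 1: 6 */
    return sum;                            /* 11: never a real total */
}
```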