linux-kernel - Re: [RFC PATCH] mm, slub: change percpu partial accounting from objects to pages

Message-ID: <ba5b7957-52fc-d8be-ed51-a2d21a233b4b@suse.cz>
Date:   Wed, 15 Sep 2021 10:42:06 +0200
From:   Vlastimil Babka <vbabka@...e.cz>
To:     David Rientjes <rientjes@...gle.com>
Cc:     linux-mm@...ck.org, Christoph Lameter <cl@...ux.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Pekka Enberg <penberg@...nel.org>,
        Jann Horn <jannh@...gle.com>, linux-kernel@...r.kernel.org,
        Roman Gushchin <guro@...com>
Subject: Re: [RFC PATCH] mm, slub: change percpu partial accounting from
 objects to pages

On 9/15/21 07:32, David Rientjes wrote:
> On Mon, 13 Sep 2021, Vlastimil Babka wrote:
> 
>> While this is no longer a problem in kmemcg context thanks to the accounting
>> rewrite in 5.9, the memory waste is still not ideal and it's questionable
>> whether it makes sense to perform free object count based control when object
>> counts can easily become so inaccurate. So this patch converts the
>> accounting to be based on number of pages only (which is precise) and removes
>> the page->pobjects field completely. This is also ultimately simpler.
>> 
> 
> Thanks for the very detailed explanation, this is very timely for us.
> 
> I'm wondering if we should be concerned about the memory waste even being 
> possible, though, now that we have the kmemcg accounting change?
> 
> IIUC, because we're accounting objects and not pages, then it *seems* like 
> we could have a high number of pages but very few objects charged per 
> page so this memory waste could go unconstrained from any kmemcg 
> limitation.

So the main problem before 5.9 was that each memcg had its own separate
kmem caches, each with its own percpu partial lists, so the memory used
scaled as caches x cpus x memcgs. Now the caches are shared, so it's just
caches x cpus. What you're saying would also be true, but it's a relatively
much smaller issue than what it was before 5.9.
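
To make the scaling concrete, here is a minimal sketch of the arithmetic.
The helper name and the example numbers are made up for illustration and
are not taken from the kernel; it only counts how many percpu partial lists
can exist, not how much memory each one holds.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel function): number of percpu partial
 * lists that can exist at once.
 *
 * Before 5.9, every memcg had its own copy of each cache, so the count
 * scaled with the number of memcgs. With shared caches, pass memcgs = 1.
 */
static unsigned long percpu_partial_lists(unsigned long caches,
                                          unsigned long cpus,
                                          unsigned long memcgs)
{
    assert(memcgs >= 1);
    return caches * cpus * memcgs;
}
```

With illustrative values of 100 caches, 64 cpus and 50 memcgs, that is
320000 lists under the old per-memcg scheme versus 6400 with shared caches.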

>> To retain the existing set_cpu_partial() heuristic, first calculate the target
>> number of objects as previously, but then convert it to target number of pages
>> by assuming the pages will be half-filled on average. This assumption might
>> obviously also be inaccurate in practice, but cannot degrade to actual number of
>> pages being equal to the target number of objects.
>> 
> 
> I think that's a fair heuristic.
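
For reference, the objects-to-pages conversion described above can be
sketched roughly as follows. This is only an illustration of the
"half-filled on average" assumption, not the code from the patch: the
function name is hypothetical, and rounding up is my assumption about how
a fractional result would be handled.

```c
#include <assert.h>

/*
 * Hypothetical sketch (not the patch itself): convert a target number of
 * free objects into a target number of pages, assuming each page on the
 * percpu partial list is half-filled on average.
 *
 * A half-filled page provides objs_per_page / 2 free objects, so
 *   pages = nr_objects / (objs_per_page / 2) = 2 * nr_objects / objs_per_page
 * Rounding up is an assumption made for this sketch.
 */
static unsigned int target_partial_pages(unsigned int nr_objects,
                                         unsigned int objs_per_page)
{
    assert(objs_per_page > 0);
    return (2 * nr_objects + objs_per_page - 1) / objs_per_page;
}
```

For example, with made-up values of 30 target objects and 32 objects per
page, this gives a target of 2 pages; the page limit can never grow to the
object target itself unless a page holds at most two objects.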
> 
>> We could also skip the intermediate step with target number of objects and
>> rewrite the heuristic in terms of pages. However we still have the sysfs file
>> cpu_partial which uses number of objects and could break existing users if it
>> suddenly becomes number of pages, so this patch doesn't do that.
>> 
>> In practice, after this patch the heuristics limit the size of percpu partial
>> list up to 2 pages. In case of a reported regression (which would mean some
>> workload has benefited from the previous imprecise object based counting), we
>> can tune the heuristics to get a better compromise within the new scheme, while
>> still avoiding the unexpectedly long percpu partial lists.
>> 
> 
> Curious if you've tried netperf TCP_RR with this change?  This benchmark 
> was the most significantly improved benchmark that I recall with the 
> introduction of per-cpu partial slabs for SLUB.  If there are any 
> regressions to be introduced by such an approach, I'm willing to bet that 
> it would be surfaced with that benchmark.

I'll try, thanks for the tip.