Message-ID: <ba5b7957-52fc-d8be-ed51-a2d21a233b4b@suse.cz>
Date: Wed, 15 Sep 2021 10:42:06 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: David Rientjes <rientjes@...gle.com>
Cc: linux-mm@...ck.org, Christoph Lameter <cl@...ux.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Pekka Enberg <penberg@...nel.org>,
Jann Horn <jannh@...gle.com>, linux-kernel@...r.kernel.org,
Roman Gushchin <guro@...com>
Subject: Re: [RFC PATCH] mm, slub: change percpu partial accounting from
objects to pages
On 9/15/21 07:32, David Rientjes wrote:
> On Mon, 13 Sep 2021, Vlastimil Babka wrote:
>
>> While this is no longer a problem in kmemcg context thanks to the accounting
>> rewrite in 5.9, the memory waste is still not ideal and it's questionable
>> whether it makes sense to perform free object count based control when object
>> counts can easily become so inaccurate. So this patch converts the
>> accounting to be based on number of pages only (which is precise) and removes
>> the page->pobjects field completely. This is also ultimately simpler.
>>
>
> Thanks for the very detailed explanation, this is very timely for us.
>
> I'm wondering if we should be concerned about the memory waste even being
> possible, though, now that we have the kmemcg accounting change?
>
> IIUC, because we're accounting objects and not pages, then it *seems* like
> we could have a high number of pages but very few objects charged per
> page so this memory waste could go unconstrained from any kmemcg
> limitation.
So the main problem before 5.9 was that each memcg had its own separate kmem
caches with their own percpu partial lists, so the memory used scaled with
caches x cpus x memcgs. Now the caches are shared, so it's just caches x cpus.
What you're saying is also true, but it's a relatively much smaller issue
than it was before 5.9.
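
As a rough illustration of that scaling, here is a small sketch that counts
percpu partial lists under the two schemes. The function names are
hypothetical and the result is an upper bound (it assumes every memcg has
used every cache), not a measurement.

```c
/*
 * Hypothetical illustration, not kernel code: upper bound on the number
 * of percpu partial lists when each memcg has its own copy of each cache
 * (the situation before 5.9).
 */
static unsigned long partial_lists_per_memcg_caches(unsigned long caches,
						    unsigned long cpus,
						    unsigned long memcgs)
{
	return caches * cpus * memcgs;
}

/* Same count when the caches are shared between memcgs (5.9 and later). */
static unsigned long partial_lists_shared_caches(unsigned long caches,
						 unsigned long cpus)
{
	return caches * cpus;
}
```

With made-up numbers of 100 caches, 64 cpus and 50 memcgs, that is up to
320000 lists in the old scheme versus 6400 with shared caches.
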
>> To retain the existing set_cpu_partial() heuristic, first calculate the target
>> number of objects as previously, but then convert it to target number of pages
>> by assuming the pages will be half-filled on average. This assumption might
>> obviously also be inaccurate in practice, but it cannot degrade to the actual
>> number of pages being equal to the target number of objects.
>>
>
> I think that's a fair heuristic.
>
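
To make the conversion discussed above concrete, here is a minimal userspace
sketch, not the patch code itself. The helper name partial_pages_target() and
its parameters are hypothetical. The only thing taken from the description is
the assumption that pages on the percpu partial list are half full on average,
so each page is expected to contribute objects_per_page / 2 free objects.

```c
#include <assert.h>

/*
 * Hypothetical helper, not taken from the patch: convert a target number
 * of free objects into a target number of pages, assuming each page on
 * the percpu partial list is half full on average.
 */
static unsigned int partial_pages_target(unsigned int target_objects,
					 unsigned int objects_per_page)
{
	assert(objects_per_page > 0);

	/* ceil(target_objects / (objects_per_page / 2)) in integer math */
	return (2 * target_objects + objects_per_page - 1) / objects_per_page;
}
```

For example, with a made-up target of 30 objects and 32 objects per page this
gives 2 pages. If the half-full assumption is off, the page target does not
change, whereas a limit counted in objects could in the worst case let the
list grow to as many pages as the object target.
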
>> We could also skip the intermediate step with target number of objects and
>> rewrite the heuristic in terms of pages. However we still have the sysfs file
>> cpu_partial which uses number of objects and could break existing users if it
>> suddenly becomes number of pages, so this patch doesn't do that.
>>
>> In practice, after this patch the heuristics limit the size of the percpu
>> partial list to at most 2 pages. In case of a reported regression (which would
>> mean some workload has benefited from the previous imprecise object based
>> counting), we can tune the heuristics to get a better compromise within the
>> new scheme, while still avoiding the unexpectedly long percpu partial lists.
>>
>
> Curious if you've tried netperf TCP_RR with this change? This benchmark
> was the most significantly improved benchmark that I recall with the
> introduction of per-cpu partial slabs for SLUB. If there are any
> regressions to be introduced by such an approach, I'm willing to bet that
> it would be surfaced with that benchmark.
I'll try, thanks for the tip.