Message-ID: <62d61572-830b-a660-8049-3826128343c5@suse.cz>
Date: Wed, 27 Jan 2021 14:38:29 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Michal Hocko <mhocko@...e.com>,
Vincent Guittot <vincent.guittot@...aro.org>
Cc: Christoph Lameter <cl@...ux.com>,
Bharata B Rao <bharata@...ux.ibm.com>,
linux-kernel <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
David Rientjes <rientjes@...gle.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Andrew Morton <akpm@...ux-foundation.org>, guro@...com,
Shakeel Butt <shakeelb@...gle.com>,
Johannes Weiner <hannes@...xchg.org>,
aneesh.kumar@...ux.ibm.com, Jann Horn <jannh@...gle.com>
Subject: Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the
slub page order
On 1/26/21 2:59 PM, Michal Hocko wrote:
>>
>> On 8 CPUs, I run hackbench with up to 16 groups which means 16*40
>> threads. But I raise up to 256 groups, which means 256*40 threads, on
>> the 224 CPUs system. In fact, hackbench -g 1 (with 1 group) doesn't
>> regress on the 224 CPUs system. The next test with 4 groups starts
>> to regress by -7%. But the next one: hackbench -g 16 regresses by 187%
>> (duration is almost 3 times longer). It seems reasonable to assume
>> that the number of running threads and resources scale with the number
>> of CPUs because we want to run more stuff.
>
> OK, I do understand that more jobs scale with the number of CPUs but I
> would also expect that higher order pages are generally more expensive
> to get so this is not really a clear cut especially under some more
> demand on the memory where allocations are smooth. So the question
> really is whether this is not just optimizing for artificial conditions.
FWIW, I enabled CONFIG_SLUB_STATS and ran "hackbench -l 16000 -g 16" in a
(small) VM, and checked tools/vm/slabinfo -DA as per the config option's help,
and it seems to be these 2 caches that are stressed:
Name                   Objects      Alloc       Free   %Fast Fallb O CmpX   UL
kmalloc-512                812   25655535   25654908  71   1     0 0 20082    0
skbuff_head_cache          304   25602632   25602632  84   1     0 0 11241    0
I guess larger pages mean more batched per-cpu allocations without going to the
shared structures or even page allocator. But 3 times duration is still surprising
to me. I'll dig more.
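To put rough numbers on the batching guess, here is a minimal sketch of how many 512-byte objects fit in one slab at each page order. It assumes 4 KiB base pages and ignores per-slab metadata, alignment and debug padding, so real counts may differ; the helper name objects_per_slab is made up for illustration and is not a kernel function.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def objects_per_slab(object_size: int, order: int, page_size: int = PAGE_SIZE) -> int:
    """Objects that fit in one slab of 2**order contiguous pages.

    Simplified: ignores slab metadata, alignment and debug padding.
    """
    return (page_size << order) // object_size


if __name__ == "__main__":
    # kmalloc-512 is one of the two stressed caches in the stats above.
    for order in range(4):
        count = objects_per_slab(512, order)
        print(f"order {order}: {count} objects per slab")
```

Under these assumptions each increment of the order doubles the number of objects a CPU can take from a single slab before it has to go back to the shared structures or the page allocator.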