Message-ID: <20210123051607.GC2587010@in.ibm.com>
Date: Sat, 23 Jan 2021 10:46:07 +0530
From: Bharata B Rao <bharata@...ux.ibm.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Vincent Guittot <vincent.guittot@...aro.org>,
Christoph Lameter <cl@...ux.com>,
linux-kernel <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
David Rientjes <rientjes@...gle.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Andrew Morton <akpm@...ux-foundation.org>, guro@...com,
Shakeel Butt <shakeelb@...gle.com>,
Johannes Weiner <hannes@...xchg.org>,
aneesh.kumar@...ux.ibm.com, Jann Horn <jannh@...gle.com>,
Michal Hocko <mhocko@...nel.org>
Subject: Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
On Fri, Jan 22, 2021 at 01:03:57PM +0100, Vlastimil Babka wrote:
> On 1/22/21 9:03 AM, Vincent Guittot wrote:
> > On Thu, 21 Jan 2021 at 19:19, Vlastimil Babka <vbabka@...e.cz> wrote:
> >>
> >> On 1/21/21 11:01 AM, Christoph Lameter wrote:
> >> > On Thu, 21 Jan 2021, Bharata B Rao wrote:
> >> >
> >> >> > The problem is that calculate_order() is called a number of times
> >> >> > before secondary CPUs are booted, and it returns 1 instead of 224.
> >> >> > This makes the use of num_online_cpus() irrelevant for those cases.
> >> >> >
> >> >> > After adding "slub_min_objects=36" to my command line, which equals
> >> >> > 4 * (fls(num_online_cpus()) + 1) with a correct num_online_cpus == 224,
> >> >> > the regression disappears:
> >> >> >
> >> >> > 9 iterations of hackbench -l 16000 -g 16: 3.201sec (+/- 0.90%)
> >>
> >> I'm surprised that hackbench is that sensitive to slab performance, anyway. It's
> >> supposed to be a scheduler benchmark? What exactly is going on?
> >>
> >
> > From the hackbench description:
> > Hackbench is both a benchmark and a stress test for the Linux kernel
> > scheduler. Its main job is to create a specified number of pairs of
> > schedulable entities (either threads or traditional processes) which
> > communicate via either sockets or pipes, and to time how long it takes
> > for each pair to send data back and forth.
>
> Yep, so I wonder which slab entities this is stressing that much.
>
> >> Things would be easier if we could trust *on all arches* either
> >>
> >> - num_present_cpus() to count what the hardware really physically has during
> >> boot, even if not yet onlined, at the time we init slab. This would still not
> >> handle later hotplug (probably mostly in a VM scenario; it's not as if somebody
> >> would bring a bunch of actual new cpu boards to a running bare metal system?).
> >>
> >> - num_possible_cpus()/nr_cpu_ids not to be excessive (broken BIOS?) on systems
> >> where it's not really possible to plug in more CPUs. In a VM scenario we could
> >> still have the opposite problem, where theoretically "anything is possible" but
> >> the virtual cpus are never added later.
> >
> > On all the systems that I have tested, num_possible_cpus()/nr_cpu_ids
> > were correctly initialized:
> >
> > - large arm64 ACPI system
> > - small arm64 DT-based system
> > - VM on x86 system
>
> So it's just powerpc that has this issue with too large nr_cpu_ids? Is it caused
> by the BIOS or the hypervisor? How does num_present_cpus() look there?
PowerPC PowerNV host (160 CPUs):
num_online_cpus 1  num_present_cpus 160  num_possible_cpus 160  nr_cpu_ids 160

PowerPC pseries KVM guest (-smp 16,maxcpus=160):
num_online_cpus 1  num_present_cpus 16   num_possible_cpus 160  nr_cpu_ids 160

That's what I see on powerpc, hence I thought num_present_cpus() could
be the correct one to use in the slub page order calculation.
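To make that concrete, here is a minimal sketch (not the actual patch;
the helper name is made up for illustration) of where the choice of CPU
count feeds the min_objects formula that Vincent quoted above:

/*
 * Illustrative only: mirrors the 4 * (fls(ncpus) + 1) formula used
 * by calculate_order(), but with num_present_cpus() instead of a
 * count that reads as 1 during early boot (num_online_cpus()) or as
 * the hotplug maximum (nr_cpu_ids).
 */
static unsigned int min_objects_guess(void)
{
	return 4 * (fls(num_present_cpus()) + 1);
}

On the pseries guest above this would give fls(16) == 5, i.e. 24
minimum objects, rather than sizing for all 160 possible CPUs.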
>
> What about a heuristic:
> - num_online_cpus() > 1 - we trust that and use it
> - otherwise nr_cpu_ids
> Would that work? Too arbitrary?
Looking at the following snippet from include/linux/cpumask.h, it
appears that num_present_cpus() should be a reasonable compromise
between online and possible/nr_cpu_ids to use here.
/*
* The following particular system cpumasks and operations manage
* possible, present, active and online cpus.
*
* cpu_possible_mask- has bit 'cpu' set iff cpu is populatable
* cpu_present_mask - has bit 'cpu' set iff cpu is populated
* cpu_online_mask - has bit 'cpu' set iff cpu available to scheduler
* cpu_active_mask - has bit 'cpu' set iff cpu available to migration
*
* If !CONFIG_HOTPLUG_CPU, present == possible, and active == online.
*
* The cpu_possible_mask is fixed at boot time, as the set of CPU id's
* that it is possible might ever be plugged in at anytime during the
* life of that system boot. The cpu_present_mask is dynamic(*),
* representing which CPUs are currently plugged in. And
* cpu_online_mask is the dynamic subset of cpu_present_mask,
* indicating those CPUs available for scheduling.
*
* If HOTPLUG is enabled, then cpu_possible_mask is forced to have
* all NR_CPUS bits set, otherwise it is just the set of CPUs that
* ACPI reports present at boot.
*
* If HOTPLUG is enabled, then cpu_present_mask varies dynamically,
* depending on what ACPI reports as currently plugged in, otherwise
* cpu_present_mask is just a copy of cpu_possible_mask.
*
* (*) Well, cpu_present_mask is dynamic in the hotplug case. If not
* hotplug, it's a copy of cpu_possible_mask, hence fixed at boot.
*/
So for host systems, present is (usually) equal to possible, and for
guest systems, present should indicate the CPUs found to be present
at boot time. The intention of my original patch was to use this
metric in the slub page order calculation rather than nr_cpu_ids
or num_possible_cpus(), which can be high on guest systems that
typically support CPU hotplug.
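For illustration, your heuristic above could also be combined with
this. A minimal sketch (hypothetical helper, not from the posted
patch), assuming we trust the online count only once secondaries
are up:

static unsigned int slub_nr_cpus_guess(void)
{
	/*
	 * Once secondary CPUs are booted, num_online_cpus() is
	 * meaningful; before that, fall back to num_present_cpus(),
	 * which on the systems shown above tracks what is actually
	 * plugged in rather than nr_cpu_ids / num_possible_cpus().
	 */
	if (num_online_cpus() > 1)
		return num_online_cpus();
	return num_present_cpus();
}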
Regards,
Bharata.