Message-ID: <20210126135918.GQ827@dhcp22.suse.cz>
Date: Tue, 26 Jan 2021 14:59:18 +0100
From: Michal Hocko <mhocko@...e.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Vlastimil Babka <vbabka@...e.cz>, Christoph Lameter <cl@...ux.com>,
Bharata B Rao <bharata@...ux.ibm.com>,
linux-kernel <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
David Rientjes <rientjes@...gle.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Andrew Morton <akpm@...ux-foundation.org>, guro@...com,
Shakeel Butt <shakeelb@...gle.com>,
Johannes Weiner <hannes@...xchg.org>,
aneesh.kumar@...ux.ibm.com, Jann Horn <jannh@...gle.com>
Subject: Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the
slub page order
On Tue 26-01-21 14:38:14, Vincent Guittot wrote:
> On Tue, 26 Jan 2021 at 09:52, Michal Hocko <mhocko@...e.com> wrote:
> >
> > On Thu 21-01-21 19:19:21, Vlastimil Babka wrote:
> > [...]
> > > We could also start questioning the very assumption that number of cpus should
> > > affect slab page size in the first place. Should it? After all, each CPU will
> > > have one or more slab pages privately cached, as we discuss in the other
> > > thread... So why make the slab pages also larger?
> >
> > I do agree. What is the actual justification for this scaling?
> > /*
> > * Attempt to find best configuration for a slab. This
> > * works by first attempting to generate a layout with
> > * the best configuration and backing off gradually.
> > *
> > * First we increase the acceptable waste in a slab. Then
> > * we reduce the minimum objects required in a slab.
> > */
> >
> > doesn't speak about CPUs. 9b2cd506e5f2 ("slub: Calculate min_objects
> > based on number of processors.") does talk about hackbench "This has
> > been shown to address the performance issues in hackbench on 16p etc."
> > but it doesn't give any more details to tell actually _why_ that works.
> >
> > This thread shows that this is still somehow related to performance but
> > the real reason is not clear. I believe we should be focusing on the
> > actual reasons for the performance impact than playing with some fancy
> > math and tuning for a benchmark on a particular machine which doesn't
> > work for others due to subtle initialization timing issues.
> >
> > Fundamentally, why should a higher number of CPUs dictate the slab
> > page size in the first place?
>
> A first answer is that the activity and the number of threads involved
> scale with the number of CPUs. Taking the hackbench benchmark as an
> example, the number of groups/threads rises to a much higher level on
> the server than on the small system, which doesn't seem unreasonable.
>
> On 8 CPUs, I run hackbench with up to 16 groups, which means 16*40
> threads. But I go up to 256 groups, which means 256*40 threads, on
> the 224-CPU system. In fact, hackbench -g 1 (with 1 group) doesn't
> regress on the 224-CPU system. The next test, with 4 groups, starts
> to regress by -7%. But the next one, hackbench -g 16, regresses by 187%
> (the duration is almost 3 times longer). It seems reasonable to assume
> that the number of running threads and resources scales with the number
> of CPUs, because we want to run more stuff.
OK, I do understand that more jobs scale with the number of CPUs, but I
would also expect that higher-order pages are generally more expensive
to get, so this is not really a clear cut, especially under increased
memory demand, where allocations stop being smooth. So the question
really is whether this is not just optimizing for artificial conditions.
--
Michal Hocko
SUSE Labs