linux-kernel - Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.DEB.2.22.394.2102040851170.76039@www.lameter.com>
Date:   Thu, 4 Feb 2021 09:07:27 +0000 (UTC)
From:   Christoph Lameter <cl@...ux.com>
To:     Vincent Guittot <vincent.guittot@...aro.org>
cc:     Bharata B Rao <bharata@...ux.ibm.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Will Deacon <will@...nel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>, linux-mm@...ck.org,
        David Rientjes <rientjes@...gle.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Andrew Morton <akpm@...ux-foundation.org>, guro@...com,
        Shakeel Butt <shakeelb@...gle.com>,
        Johannes Weiner <hannes@...xchg.org>,
        aneesh.kumar@...ux.ibm.com, Jann Horn <jannh@...gle.com>,
        Michal Hocko <mhocko@...nel.org>,
        Catalin Marinas <Catalin.Marinas@....com>
Subject: Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the
 slub page order

On Thu, 4 Feb 2021, Vincent Guittot wrote:

> > So what is preferrable here now? Above or other quick fix or reverting
> > the original commit?
>
> I'm fine with whatever the solution as long as we can use keep using
> nr_cpu_ids when other values like num_present_cpus, don't reflect
> correctly the system

AFAICT they are correctly reflecting the current state of the system.

The problem here is the bringup of the system and the tuning therefor.

One additional thing that may help: The slab caches can work in a degraded
mode where no fastpath allocations can occur. That mode is used primarily
for debugging but maybe that mode can also help during bootstrap to avoid
having to deal with the per cpu data and so on.

In degraded mode SLUB will take a lock for each operation on an object.

In this mode the following is true

  kmem_cache_cpu->page == NULL
  kmem_cache_cpu->freelist == NULL

   kmem_cache_debug(s) == true

So if you define a new debug mode and include it in SLAB_DEBUG_FLAGS then
you can force SLUB to fallback to operations where a lock is taken and
where slab allocation can be stopped. This may be ok for bring up.

The debug flags are also tied to some wizardry that can patch the code at
runtime to optimize for debubgging or fast operations. You would tie into
that one as well. Start in debug mode by default and switch to fast
operations after all processors are up.