netdev - Re: [PATCH 0/1] mm: Remove the SLAB allocator

Message-ID: <20190417133852.GL5878@dhcp22.suse.cz>
Date:   Wed, 17 Apr 2019 15:38:52 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Jesper Dangaard Brouer <netdev@...uer.com>
Cc:     Pekka Enberg <penberg@....fi>, "Tobin C. Harding" <me@...in.cc>,
        Vlastimil Babka <vbabka@...e.cz>,
        "Tobin C. Harding" <tobin@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Christoph Lameter <cl@...ux.com>,
        Pekka Enberg <penberg@...nel.org>,
        David Rientjes <rientjes@...gle.com>,
        Joonsoo Kim <iamjoonsoo.kim@....com>,
        Tejun Heo <tj@...nel.org>, Qian Cai <cai@....pw>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Mel Gorman <mgorman@...hsingularity.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Alexander Duyck <alexander.duyck@...il.com>
Subject: Re: [PATCH 0/1] mm: Remove the SLAB allocator

On Wed 17-04-19 10:50:18, Jesper Dangaard Brouer wrote:
> On Thu, 11 Apr 2019 11:27:26 +0300
> Pekka Enberg <penberg@....fi> wrote:
> 
> > Hi,
> > 
> > On 4/11/19 10:55 AM, Michal Hocko wrote:
> > > Please please have it more rigorous than what happened when SLUB was
> > > forced to become a default
> > 
> > This is the hard part.
> > 
> > Even if you are able to show that SLUB is as fast as SLAB for all the 
> > benchmarks you run, there's bound to be that one workload where SLUB 
> > regresses. You will then have people complaining about that (rightly so) 
> > and you're again stuck with two allocators.
> > 
> > To move forward, I think we should look at possible *pathological* cases
> > where we think SLAB might have an advantage. For example, SLUB had much
> > more difficulty with remote CPU frees than SLAB. Now I don't know if
> > this is still the case, but it should be easy to construct a synthetic
> > benchmark to measure this.
> 
> I do think SLUB has a number of pathological cases where SLAB is
> faster.  It was significantly more difficult to get good bulk-free
> performance for SLUB.  SLUB is only fast as long as objects belong to
> the same page.  To get good bulk-free performance if objects are
> "mixed", I coded this[1] way-too-complex fast-path code to
> counteract this (joint work with Alex Duyck).
> 
> [1] https://github.com/torvalds/linux/blob/v5.1-rc5/mm/slub.c#L3033-L3113
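
The "same page" versus "mixed" distinction quoted above can be pictured
with a toy userspace model. This is only a sketch under stated
assumptions, not the kernel code at [1]: the 4096-byte page size, the
addresses, and the helper name batches_by_page are all hypothetical. It
counts how many per-page groups a bulk free would have to touch.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size for this toy model


def batches_by_page(addresses, page_size=PAGE_SIZE):
    """Group object addresses by the page they fall into.

    Hypothetical helper: one group per page stands in for one
    per-page batch that a bulk free would have to process.
    """
    pages = defaultdict(list)
    for addr in addresses:
        pages[addr // page_size].append(addr)
    return dict(pages)


# Eight 64-byte objects that all sit in one page.
same_page = [0x1000 + 64 * i for i in range(8)]
# Eight objects that each sit in a different page ("mixed").
mixed = [0x1000 * (i + 1) + 64 * i for i in range(8)]

print(len(batches_by_page(same_page)), "page(s) for the same-page case")
print(len(batches_by_page(mixed)), "page(s) for the mixed case")
```

In this model the same-page array collapses into a single group, while
the mixed array yields one group per object, which is the situation the
quoted fast-path code is said to work around.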

How often is this a real problem for real workloads?

> > For example, have a userspace process that does networking, which is 
> > often memory allocation intensive, so that we know that SKBs traverse 
> > between CPUs. You can do this by making sure that the NIC queues are 
> > mapped to CPU N (so that network softirqs have to run on that CPU) but 
> > the process is pinned to CPU M.
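
The "process is pinned to CPU M" half of the setup quoted above could be
sketched as follows, assuming Linux and Python 3. The choice of the
highest allowed CPU as "CPU M" and the helper names are hypothetical.
Steering the NIC queue interrupts to CPU N is a separate, privileged
step (typically done through /proc/irq/<irq>/smp_affinity) and is not
shown.

```python
import os


def pin_to_cpu(cpu):
    """Pin the calling process to a single CPU and return the new mask."""
    os.sched_setaffinity(0, {cpu})  # pid 0 means the calling process
    return os.sched_getaffinity(0)


def pick_and_pin():
    """Pick the highest CPU we may run on (hypothetical 'CPU M') and pin to it."""
    allowed = sorted(os.sched_getaffinity(0))
    target = allowed[-1]
    return target, pin_to_cpu(target)


if __name__ == "__main__":
    cpu, mask = pick_and_pin()
    print("pinned to CPU", cpu, "affinity mask is now", mask)
```

The network workload would then run in this pinned process, so that
objects allocated in softirq context on CPU N are freed on CPU M.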
> 
> If someone wants to test this with SKBs then be aware that we netdev guys
> have a number of optimizations where we try to counteract this. (As a
> minimum, disable TSO and GRO.)
> 
> It might also be possible for people to get inspired by and adapt the
> microbenchmarking[2] kernel modules that I wrote when developing the
> SLUB and SLAB optimizations:
> 
> [2] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm

While microbenchmarks are good for exposing pathological behavior, I would
be really interested to see some numbers for real-world use cases.
 
> > It's, of course, worth thinking about other pathological cases too.
> > Workloads that cause large allocations are one. Workloads that cause lots
> > of slab cache shrinking are another.
> 
> I also worry about long uptimes when SLUB objects/pages get too
> fragmented... as I said, SLUB is only efficient when objects are
> returned to the same page, which is not the case for SLAB.

Is this something that has been actually measured in a real deployment?
-- 
Michal Hocko
SUSE Labs