Message-ID: <20190417133852.GL5878@dhcp22.suse.cz>
Date: Wed, 17 Apr 2019 15:38:52 +0200
From: Michal Hocko <mhocko@...nel.org>
To: Jesper Dangaard Brouer <netdev@...uer.com>
Cc: Pekka Enberg <penberg@....fi>, "Tobin C. Harding" <me@...in.cc>,
Vlastimil Babka <vbabka@...e.cz>,
"Tobin C. Harding" <tobin@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Christoph Lameter <cl@...ux.com>,
Pekka Enberg <penberg@...nel.org>,
David Rientjes <rientjes@...gle.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Tejun Heo <tj@...nel.org>, Qian Cai <cai@....pw>,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Mel Gorman <mgorman@...hsingularity.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Alexander Duyck <alexander.duyck@...il.com>
Subject: Re: [PATCH 0/1] mm: Remove the SLAB allocator
On Wed 17-04-19 10:50:18, Jesper Dangaard Brouer wrote:
> On Thu, 11 Apr 2019 11:27:26 +0300
> Pekka Enberg <penberg@....fi> wrote:
>
> > Hi,
> >
> > On 4/11/19 10:55 AM, Michal Hocko wrote:
> > > Please, please make it more rigorous than what happened when SLUB
> > > was forced to become the default
> >
> > This is the hard part.
> >
> > Even if you are able to show that SLUB is as fast as SLAB for all the
> > benchmarks you run, there's bound to be that one workload where SLUB
> > regresses. You will then have people complaining about that (rightly so)
> > and you're again stuck with two allocators.
> >
> > To move forward, I think we should look at possible *pathological* cases
> > where we think SLAB might have an advantage. For example, SLUB had much
> > more difficulties with remote CPU frees than SLAB. Now I don't know if
> > this is the case, but it should be easy to construct a synthetic
> > benchmark to measure this.
>
> I do think SLUB has a number of pathological cases where SLAB is
> faster. It was significantly more difficult to get good bulk-free
> performance for SLUB. SLUB is only fast as long as objects belong to
> the same page. To get good bulk-free performance when objects are
> "mixed", I coded this[1] way-too-complex fast-path code to counteract
> that (joint work with Alex Duyck).
>
> [1] https://github.com/torvalds/linux/blob/v5.1-rc5/mm/slub.c#L3033-L3113
How often is this a real problem for real workloads?
> > For example, have a userspace process that does networking, which is
> > often memory allocation intensive, so that we know that SKBs traverse
> > between CPUs. You can do this by making sure that the NIC queues are
> > mapped to CPU N (so that network softirqs have to run on that CPU) but
> > the process is pinned to CPU M.
>
> If someone wants to test this with SKBs, then be aware that we netdev
> guys have a number of optimizations that try to counteract this. (At a
> minimum, disable TSO and GRO.)
>
> People might also get inspired by, and adapt, the micro-benchmarking[2]
> kernel modules that I wrote when developing the SLUB and SLAB
> optimizations:
>
> [2] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm
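The NIC-queues-on-CPU-N / process-on-CPU-M setup Pekka describes can be sketched roughly as follows. This is a hedged sketch, not a recipe from the thread: the interface name `eth0`, the CPU numbers, and the benchmark binary `./netapp` are placeholders, and it assumes root and that irqbalance is stopped so the affinity settings stick:

```shell
# Stop irqbalance so it does not rewrite our IRQ affinities.
systemctl stop irqbalance

# Disable the offloads Jesper mentions, so per-SKB allocator costs show up.
ethtool -K eth0 tso off gro off

# Route every eth0 queue interrupt to CPU 0 (network softirqs run there).
for irq in $(awk -F: '/eth0/ {gsub(/ /, "", $1); print $1}' /proc/interrupts); do
    echo 0 > /proc/irq/$irq/smp_affinity_list
done

# Pin the allocation-heavy userspace process to a different CPU.
taskset -c 2 ./netapp
```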
While microbenchmarks are good for exposing pathological behavior, I would
be really interested to see some numbers for real-world use cases.
> > It's, of course, worth thinking about other pathological cases too.
> > Workloads that cause large allocations are one; workloads that cause
> > lots of slab cache shrinking are another.
>
> I also worry about long uptimes when SLUB objects/pages get too
> fragmented... As I said, SLUB is only efficient when objects are
> returned to the same page, while SLAB does not have this constraint.
Is this something that has been actually measured in a real deployment?
--
Michal Hocko
SUSE Labs