Message-ID: <20200620005718.GF237539@carbon.dhcp.thefacebook.com>
Date: Fri, 19 Jun 2020 17:57:18 -0700
From: Roman Gushchin <guro@...com>
To: Mel Gorman <mgorman@...hsingularity.net>
CC: Vlastimil Babka <vbabka@...e.cz>,
Shakeel Butt <shakeelb@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Christoph Lameter <cl@...ux.com>,
Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Linux MM <linux-mm@...ck.org>,
Kernel Team <kernel-team@...com>,
LKML <linux-kernel@...r.kernel.org>,
Jesper Dangaard Brouer <brouer@...hat.com>
Subject: Re: [PATCH v6 00/19] The new cgroup slab memory controller

On Wed, Jun 17, 2020 at 03:31:10PM +0100, Mel Gorman wrote:
> On Wed, Jun 17, 2020 at 01:24:21PM +0200, Vlastimil Babka wrote:
> > > Not really.
> > >
> > > Sharing a single set of caches adds some overhead to root- and non-accounted
> > > allocations, which is something I tried hard to avoid in my original version.
> > > But I have to admit, it allows us to simplify and remove a lot of code, and here
> > > it's hard to argue with Johannes, who pushed for this design.
> > >
> > > With performance testing it's not that easy, because it's not obvious what
> > > we wanna test. Obviously, per-object accounting is more expensive, and
> > > measuring something like 1000000 allocations and deallocations in a row from
> > > a single kmem_cache will show a regression. But in the real world the relative
> > > cost of allocations is usually low, and we can get some benefits from a smaller
> > > working set and from having shared kmem_cache objects cache hot.
> > > Not to mention some extra memory savings and the reduced fragmentation.
> > >
> > > We've done extensive testing of the original version in Facebook production,
> > > and we haven't noticed any regressions so far. But I have to admit, we were
> > > using the original version with two sets of kmem_caches.
> > >
> > > If you have any specific tests in mind, I can definitely run them. Or if you
> > > can help with the performance evaluation, I'd appreciate it a lot.
> >
> > Jesper provided some pointers here [1]; it would be really great if you could
> > run at least those microbenchmarks. With mmtests, the major question is
> > which subset/profiles to run. Maybe the referenced commits provide some hints,
> > or maybe Mel could suggest what he used to evaluate SLAB vs SLUB not so long ago.
> >
>
> Last time, the mmtests configurations I used for a basic
> comparison were
>
> db-pgbench-timed-ro-small-ext4
> db-pgbench-timed-ro-small-xfs
> io-dbench4-async-ext4
> io-dbench4-async-xfs
> io-bonnie-dir-async-ext4
> io-bonnie-dir-async-xfs
> io-bonnie-file-async-ext4
> io-bonnie-file-async-xfs
> io-fsmark-xfsrepair-xfs
> io-metadata-xfs
> network-netperf-unbound
> network-netperf-cross-node
> network-netperf-cross-socket
> network-sockperf-unbound
> network-netperf-unix-unbound
> network-netpipe
> network-tbench
> pagereclaim-shrinker-ext4
> scheduler-unbound
> scheduler-forkintensive
> workload-kerndevel-xfs
> workload-thpscale-madvhugepage-xfs
> workload-thpscale-xfs
>
> Some were more valid than others in terms of doing an evaluation. I
> followed up later with a more comprehensive comparison but that was
> overkill.
>
> Each time I did a slab/slub comparison in the past, I had to reverify
> the rate at which kmem_cache_* functions were actually being called, as the
> pattern can change over time even for the same workload. A comparison
> gets more complicated when comparing cgroups, as ideally there would be
> workloads running in multiple groups, but that gets complex and I think
> it's reasonable to just test the "basic" case without cgroups.

Thank you, Mel, for the suggestion!

I'll try to come up with some numbers soon. I guess the networking tests
will be the most interesting in this case.

Thanks!

Roman