Message-ID: <20211017135708.GA8442@kvm.asia-northeast3-a.c.our-ratio-313919.internal>
Date: Sun, 17 Oct 2021 13:57:08 +0000
From: Hyeonggon Yoo <42.hyeyoo@...il.com>
To: linux-mm@...ck.org
Cc: linux-kernel@...r.kernel.org, Christoph Lameter <cl@...ux.com>,
Pekka Enberg <penberg@...nel.org>,
David Rientjes <rientjes@...gle.com>,
Joonsoo Kim <iamjoonsoo.kim@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Vlastimil Babka <vbabka@...e.cz>,
Hyeonggon Yoo <42.hyeyoo@...il.com>
Subject: Do we really need SLOB nowdays?
On Sun, Oct 17, 2021 at 01:36:18PM +0000, Hyeonggon Yoo wrote:
> On Sun, Oct 17, 2021 at 04:28:52AM +0000, Hyeonggon Yoo wrote:
> > I've been reading the SLUB/SLOB code for a while. SLUB recently became
> > real-time compatible by reducing its locking area.
> >
> > For now, SLUB is the only slab allocator for PREEMPT_RT, because
> > it works better than SLAB on RT and SLOB uses a non-deterministic method,
> > sequential fit.
> >
> > But the memory usage of SLUB is too high for systems with low memory.
> > So in my local repository I made SLOB use the segregated free list
> > method, which is more deterministic, to provide bounded latency.
> >
> > This can be done by managing a list of partial pages globally
> > for every power-of-two size (8, 16, 32, ..., PAGE_SIZE) per NUMA node.
> > The minimum allocation size is the size of a pointer, so that a free object
> > can hold the pointer to the next free object, like SLUB.
> >
> > By making all objects in the same page have the same size, there's no
> > need to iterate over free blocks in a page. (Iterating over pages isn't
> > needed either.)
> >
> > Some cleanups and more tests (especially with NUMA/RT configs) are needed,
> > but I want to hear your opinion about the idea. I did not test on RT yet.
> >
> > Below are the benchmark results and memory usage (on !RT).
> > With a 13% increase in memory usage, it's nine times faster, has
> > bounded fragmentation, and importantly provides predictable execution time.
> >
>
> Hello linux-mm, I improved it: it now uses less memory
> and is 9x~13x faster than the original SLOB. It shows much less fragmentation
> after hackbench.
>
> Rather than managing a global freelist for each power-of-two size,
> I made each kmem_cache manage its own freelist (one per NUMA node) and
> added support for slab merging. So it looks quite like a lightweight SLUB now.
>
> I'll send an RFC patch after some testing and code cleanup.
>
> I think it is more RT-friendly because it uses a more deterministic
> algorithm (but the lock is still shared among CPUs). Any opinions for RT?
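
For readers who have not seen the scheme, the segregated free list idea
quoted above can be sketched in userspace C roughly as follows. This is only
an illustration under stated assumptions, not the code from my local
repository: every name (sketch_alloc, sketch_free, SKETCH_PAGE_SIZE, ...) is
hypothetical, the page size is assumed to be 4096, pages come from
aligned_alloc() instead of the page allocator, and there is no locking, no
per-NUMA-node list, and no tracking of partial pages. What it does show is
the power-of-two size classes, the pointer-sized minimum object, and why
allocation needs no iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE  4096u          /* assumed page size */
#define SKETCH_MIN_SIZE   sizeof(void *) /* a free object must hold a next pointer */
#define SKETCH_NR_CLASSES 16             /* more than enough for MIN_SIZE..PAGE_SIZE */

/* One free list per power-of-two size class (hypothetical layout). */
struct sketch_class {
    void *free; /* head of a singly linked list threaded through free objects */
};

struct sketch_class sketch_classes[SKETCH_NR_CLASSES];

/* Object size served by class idx: MIN_SIZE, 2*MIN_SIZE, 4*MIN_SIZE, ... */
size_t sketch_class_size(unsigned idx)
{
    return SKETCH_MIN_SIZE << idx;
}

/* Index of the smallest class whose object size is >= size. */
unsigned sketch_class_index(size_t size)
{
    size_t s = SKETCH_MIN_SIZE;
    unsigned idx = 0;

    assert(size > 0 && size <= SKETCH_PAGE_SIZE);
    while (s < size) {
        s <<= 1;
        idx++;
    }
    return idx;
}

/* Carve one fresh page into equal-sized objects and push them all. */
int sketch_refill(unsigned idx)
{
    size_t size = sketch_class_size(idx);
    char *page = aligned_alloc(SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);

    if (!page)
        return -1;
    for (size_t off = 0; off + size <= SKETCH_PAGE_SIZE; off += size) {
        void *obj = page + off;

        *(void **)obj = sketch_classes[idx].free; /* next pointer lives in the object */
        sketch_classes[idx].free = obj;
    }
    return 0;
}

/* Pop the head of the class list: no walk over blocks or pages. */
void *sketch_alloc(size_t size)
{
    unsigned idx = sketch_class_index(size);
    struct sketch_class *c = &sketch_classes[idx];
    void *obj;

    if (!c->free && sketch_refill(idx) != 0)
        return NULL;
    obj = c->free;
    c->free = *(void **)obj;
    return obj;
}

/* Push the object back onto the list of its size class. */
void sketch_free(void *obj, size_t size)
{
    struct sketch_class *c = &sketch_classes[sketch_class_index(size)];

    *(void **)obj = c->free;
    c->free = obj;
}
```

Apart from the occasional refill, both paths are a single list operation,
which is where the predictable execution time is expected to come from. The
cost is internal fragmentation: in this sketch a 24-byte request is served
from the 32-byte class.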

Hi there. After some thinking, I have a new question:
if a lightweight SLUB is better than SLOB,
do we really need SLOB nowadays?

And one more question:
Christoph's presentation [1] says SLOB uses
300 KB of memory, but on my system it uses almost 8000 KB.
What explains the difference?

[1] https://events.static.linuxfound.org/sites/events/files/slides/slaballocators.pdf

SLUB without cpu partials:
memory usage:
after boot:
Slab: 8672 kB
after hackbench:
Slab: 9540 kB

Performance counter stats for 'hackbench -g 4 -l 10000':
48463.05 msec cpu-clock # 1.995 CPUs utilized
944154 context-switches # 19.482 K/sec
8161 cpu-migrations # 168.396 /sec
4117 page-faults # 84.951 /sec
52570808507 cycles # 1.085 GHz
65083778667 instructions # 1.24 insn per cycle
234990576 branch-misses
23628671709 cache-references # 487.561 M/sec
739599271 cache-misses # 3.130 % of all cache refs

24.287392120 seconds time elapsed

1.509198000 seconds user
46.942748000 seconds sys

> current SLOB:
> memory usage:
> after boot:
> Slab: 7908 kB
> after hackbench:
> Slab: 8544 kB
>
> Time: 189.947
> Performance counter stats for 'hackbench -g 4 -l 10000':
> 379413.20 msec cpu-clock # 1.997 CPUs utilized
> 8818226 context-switches # 23.242 K/sec
> 375186 cpu-migrations # 988.859 /sec
> 3954 page-faults # 10.421 /sec
> 269923095290 cycles # 0.711 GHz
> 212341582012 instructions # 0.79 insn per cycle
> 2361087153 branch-misses
> 58222839688 cache-references # 153.455 M/sec
> 6786521959 cache-misses # 11.656 % of all cache refs
>
> 190.002062273 seconds time elapsed
>
> 3.486150000 seconds user
> 375.599495000 seconds sys
>
> SLOB with segregated list + slab merging:
> memory usage:
> after boot:
> Slab: 7560 kB
> after hackbench:
> Slab: 7836 kB
>
> hackbench:
> Time: 20.780
> Performance counter stats for 'hackbench -g 4 -l 10000':
> 41509.79 msec cpu-clock # 1.996 CPUs utilized
> 630032 context-switches # 15.178 K/sec
> 8287 cpu-migrations # 199.640 /sec
> 4036 page-faults # 97.230 /sec
> 57477161020 cycles # 1.385 GHz
> 62775453932 instructions # 1.09 insn per cycle
> 164902523 branch-misses
> 22559952993 cache-references # 543.485 M/sec
> 832404011 cache-misses # 3.690 % of all cache refs
>
> 20.791893590 seconds time elapsed
>
> 1.423282000 seconds user
> 40.072449000 seconds sys
> -
> Thanks,
> Hyeonggon
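
As a cross-check of the figures above, the speedup and memory deltas can be
recomputed from the reported numbers. The helper names below (speedup,
percent_change, print_summary) are made up for this note; the inputs are the
elapsed times and Slab values from the runs in this mail.

```c
#include <assert.h>
#include <stdio.h>

/* How many times faster the fast run is, given elapsed times in seconds. */
double speedup(double slow_s, double fast_s)
{
    assert(fast_s > 0.0);
    return slow_s / fast_s;
}

/* Relative change from base to value, in percent (negative means smaller). */
double percent_change(double base, double value)
{
    assert(base != 0.0);
    return (value - base) / base * 100.0;
}

/* Print the comparisons using the numbers reported in this mail. */
void print_summary(void)
{
    /* elapsed seconds for 'hackbench -g 4 -l 10000' */
    const double slob_s = 190.002062273;
    const double seg_s = 20.791893590;
    const double slub_s = 24.287392120;

    printf("segregated SLOB vs current SLOB: %.2fx\n", speedup(slob_s, seg_s));
    printf("segregated SLOB vs SLUB:         %.2fx\n", speedup(slub_s, seg_s));
    /* Slab: values in kB, current SLOB -> segregated SLOB */
    printf("Slab after boot:      %+.1f%%\n", percent_change(7908.0, 7560.0));
    printf("Slab after hackbench: %+.1f%%\n", percent_change(8544.0, 7836.0));
}
```

By elapsed time this is a bit over 9x against current SLOB for this
particular run, with Slab usage a few percent lower both after boot and
after hackbench.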