Message-ID: <0dbbbe9d17ed489d4a7dbe12026fc6fd@beldev.am>
Date: Sun, 06 Apr 2025 11:53:10 +0400
From: Igor Belousov <igor.b@...dev.am>
To: vitaly.wool@...sulko.se
Cc: Johannes Weiner <hannes@...xchg.org>, linux-mm@...ck.org,
akpm@...ux-foundation.org, linux-kernel@...r.kernel.org, Nhat Pham
<nphamcs@...il.com>, Shakeel Butt <shakeel.butt@...ux.dev>
Subject: Re: [PATCH v2] mm: add zblock allocator
Hi Vitaly,
2025-04-05 03:56 Vitaly wrote:
>
>> Do you have zswap/zswapped meminfo metrics from these tests?
> Yep, and those look somewhat similar:
> - zblock:
> Zswap: 234128 kB
> Zswapped: 733216 kB
> - zsmalloc:
> Zswap: 286080 kB
> Zswapped: 774688 kB
I tested the kernel build on a 4-core virtual machine with 4 GB of RAM,
running on a Ryzen 9.
The results are the following:
1. make -j4:
1.1 zsmalloc:
real 10m59.689s
user 30m53.913s
sys 6m20.720s
zswpin 127811
zswpout 443914
zswpwb 764
Zswap: 292428 kB
Zswapped: 801536 kB
1.2 zblock:
real 11m1.971s
user 30m51.411s
sys 6m18.752s
zswpin 306020
zswpout 732145
zswpwb 2215
Zswap: 291016 kB
Zswapped: 741176 kB
2. make -j8:
2.1 zsmalloc:
real 11m40.640s
user 33m3.675s
sys 6m28.126s
zswpin 308624
zswpout 641576
zswpwb 2674
Zswap: 281336 kB
Zswapped: 785344 kB
2.2 zblock:
real 11m21.161s
user 32m21.012s
sys 5m53.864s
zswpin 207039
zswpout 621107
zswpwb 3391
Zswap: 326580 kB
Zswapped: 836088 kB
3. make -j16:
3.1 zsmalloc:
real 12m42.544s
user 36m3.171s
sys 6m46.036s
zswpin 193202
zswpout 778746
zswpwb 2611
Zswap: 249192 kB
Zswapped: 695664 kB
3.2 zblock:
real 12m12.276s
user 35m41.100s
sys 6m30.100s
zswpin 211179
zswpout 853612
zswpwb 2610
Zswap: 327544 kB
Zswapped: 721828 kB
We can observe that the gap between zsmalloc and zblock widens as the
number of threads increases.
>> My concern with this allocator, and the other alternatives to zsmalloc
>> before, is the following:
>> You might be faster at allocating objects. But if storage density is
>> worse, it means you have to zswap more pages to meet the same incoming
>> allocation demand. That means more object allocations and more
>> compression, and often a higher rate of refaults and decompressions.
I don't think we see a substantial difference in storage density, and
with 16K pages zblock seems to perform even better than zsmalloc. I
wasn't able to test 16K pages on x86_64 because the code has multiple
dependencies on PAGE_SIZE being 4K, but the testing on ARM64 does
suggest that.
Besides, now that we use an rbtree to search for the right block type,
we can enlarge the table to get better matches and better storage
density without a significant impact on performance.
Thanks,
Igor