Message-ID: <c612aff8-1b07-43aa-b909-f555da511da2@konsulko.se>
Date: Thu, 1 May 2025 14:41:29 +0200
From: Vitaly Wool <vitaly.wool@...sulko.se>
To: Yosry Ahmed <yosry.ahmed@...ux.dev>
Cc: linux-mm@...ck.org, akpm@...ux-foundation.org,
linux-kernel@...r.kernel.org, Nhat Pham <nphamcs@...il.com>,
Shakeel Butt <shakeel.butt@...ux.dev>, Johannes Weiner <hannes@...xchg.org>,
Igor Belousov <igor.b@...dev.am>, Minchan Kim <minchan@...nel.org>,
Sergey Senozhatsky <senozhatsky@...omium.org>
Subject: Re: [PATCH v4] mm: add zblock allocator
Hi Yosry,
On 4/30/25 14:27, Yosry Ahmed wrote:
> On Wed, Apr 23, 2025 at 09:53:48PM +0200, Vitaly Wool wrote:
>> On 4/22/25 12:46, Yosry Ahmed wrote:
>>> I didn't look too closely but I generally agree that we should improve
>>> zsmalloc where possible rather than add a new allocator. We are trying
>>> not to repeat the zbud/z3fold or slub/slob stories here. Zsmalloc is
>>> getting a lot of mileage from both zswap and zram, and is more-or-less
>>> battle-tested. Let's work toward building upon that instead of starting
>>> over.
>>
>> The thing here is, zblock is using a very different approach to small object
>> allocation. The idea is: we have an array of descriptors which correspond to
>> multi-page blocks divided into chunks of equal size (block_size[i]). For each
>> object of size x we find the descriptor n such that
>> block_size[n-1] < x <= block_size[n]
>> and then we store that object in an empty slot in one of the blocks. Thus,
>> the density is high, the search is fast (rbtree based) and there are no
>> objects spanning over 2 pages, so no extra memcpy involved.
>
> The block sizes seem to be similar in principle to class sizes in
> zsmalloc. It seems to me that there are two apparent differentiating
> properties to zblock:
>
> - Block lookup uses an rbtree, so it's faster than zsmalloc's list
> iteration. On the other hand, zsmalloc divides each class into
> fullness groups and tries to pack almost full groups first. Not sure
> if zblock's approach is strictly better.
If we free a slot in a fully packed block, we move that block to the
head of the list. Under zswap's normal operation pattern more slots in
that block are likely to be freed soon, so the packing behaviour ends
up roughly the same.
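To illustrate the mechanism (a simplified sketch only, not the actual
zblock code: the struct and function names are made up, and a linear
scan stands in for the rbtree-based search):

#include <linux/list.h>
#include <linux/types.h>

/*
 * Illustrative only: pick the smallest chunk size that fits an object,
 * and put a block that just gained a free slot at the head of its
 * descriptor's list so the next allocation reuses it first.
 */
struct zblock_desc {
	size_t chunk_size;		/* block_size[i] */
	struct list_head block_list;	/* blocks with free slots first */
};

static int find_desc(struct zblock_desc *descs, int ndescs, size_t x)
{
	int i;

	/* smallest i such that block_size[i-1] < x <= block_size[i] */
	for (i = 0; i < ndescs; i++)
		if (x <= descs[i].chunk_size)
			return i;
	return -1;	/* doesn't fit in the largest chunk size */
}

static void on_slot_freed(struct zblock_desc *desc, struct list_head *block)
{
	/*
	 * A block that just gained a free slot is likely to gain more,
	 * so make it the first candidate for the next allocation.
	 */
	list_move(block, &desc->block_list);
}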
> - Zblock uses higher order allocations vs. zsmalloc always using order-0
> allocations. I think this may be the main advantage and I remember
> asking if zsmalloc can support this. Always using order-0 pages is
> more reliable but may not always be the best choice.
There's a patch we'll be posting soon with "opportunistic" high-order
allocations (i.e. if try_alloc_pages fails, we allocate order-0 pages
instead). This will leverage the benefits of higher-order allocations
without putting too much stress on the system.
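Roughly along these lines (again just a sketch, not the actual patch:
the helper name is made up, plain alloc_pages() with no-retry flags
stands in for try_alloc_pages(), and the real fallback has to build the
block out of order-0 pages, which this glosses over):

#include <linux/gfp.h>
#include <linux/mm_types.h>

static struct page *zblock_alloc_block(gfp_t gfp, unsigned int order)
{
	struct page *page;

	if (order > 0) {
		/*
		 * Opportunistic: don't push the system into reclaim just
		 * to get a contiguous high-order block.
		 */
		page = alloc_pages(gfp | __GFP_NOWARN | __GFP_NORETRY, order);
		if (page)
			return page;
	}

	/* fall back to order-0 pages under memory pressure */
	return alloc_pages(gfp, 0);
}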
> On the other hand, zblock is lacking in other regards. For example:
> - The lack of compaction means that certain workloads will see a lot of
> fragmentation. It purely depends on the access patterns. We could end
> up with a lot of blocks each containing a single object and there is
> no way to recover AFAICT.
We have been running many variants of stress loads on the memory
subsystem, and the worst compression ratio *after* the stress load was
2.8x using zstd as the compressor (and about 4x under load). With
zsmalloc under the same conditions the ratio was 3.6x after and 4x
under load. With more normal (but still stressful) usage patterns the
numbers *after* the stress load were around 3.8x and 4.1x, respectively.
Bottom line, ending up with a lot of blocks each containing a single
object is not a real-life scenario. With that said, we have a fairly
simple solution in the making that will get zblock on par with zsmalloc
even in the cases described above.
> - Zblock will fail if a high order allocation cannot be satisfied, which
> is more likely to happen under memory pressure, and it's usually when
> zblock is needed in the first place.
See above; this issue will be addressed by the patch coming shortly.
> - There's probably more, I didn't check too closely, and I am hoping
> that Minchan and Sergey will chime in here.
>
>>
>> And with the latest zblock, we see that it has a clear advantage in
>> performance over zsmalloc, retaining roughly the same allocation density for
>> 4K pages and scoring better on 16K pages. E.g. on a kernel compilation:
>>
>> * zsmalloc/zstd/make -j32 bzImage
>> real 8m0.594s
>> user 39m37.783s
>> sys 8m24.262s
>> Zswap: 200600 kB <-- after build completion
>> Zswapped: 854072 kB <-- after build completion
>> zswpin 309774
>> zswpout 1538332
>>
>> * zblock/zstd/make -j32 bzImage
>> real 7m35.546s
>> user 38m03.475s
>> sys 7m47.407s
>> Zswap: 250940 kB <-- after build completion
>> Zswapped: 870660 kB <-- after build completion
>> zswpin 248606
>> zswpout 1277319
>>
>> So what we see here is that zblock is definitely faster and at least not
>> worse with regard to allocation density under heavy load. It has slightly
>> worse _idle_ allocation density but since it will quickly catch up under
>> load it is not really important. What is important is that its
>> characteristics don't deteriorate over time. Overall, zblock is simple and
>> efficient, and there is a /raison d'être/ for it.
>
> Zblock is performing better for this specific workload, but as I
> mentioned earlier there are other aspects that zblock is missing.
> Zsmalloc has seen a very large range of workloads of different types,
> and we cannot just dismiss this.
We've been running many different workloads with both allocators, but
posting all the results in the patch description would go well beyond
the purpose of a patch submission. If there are particular workloads
you are interested in, please let me know; odds are high we have
results for those too.
>> Now, it is indeed possible to partially rework zsmalloc using zblock's
>> algorithm, but that would be a rather substantial change, equal to or
>> bigger in effort than implementing the approach described above from
>> scratch (which is what we did). With such drastic changes most of the
>> testing that has been done with zsmalloc would be invalidated, and we'd
>> be out in the wild anyway. So even though I see your point, I don't
>> think it applies in this particular case.
>
>
> Well, we should start by breaking down the differences and finding out
> why zblock is performing better, as I mentioned above. If it's the
> faster lookups or higher order allocations, we can work to support that
> in zsmalloc. Similarly, if zsmalloc has unnecessary complexity it'd be
> great to get rid of it rather than starting over.
>
> Also, we don't have to do it all at once and invalidate the testing that
> zsmalloc has seen. These can be incremental changes that get spread over
> multiple releases, getting incremental exposure in the process.
I believe we are now a lot closer to having a zblock without the
initial drawbacks you have pointed out, while retaining its code
simplicity, than to having a faster zsmalloc.
~Vitaly