Open Source and information security mailing list archives
 
Message-ID: <aBIXJrbxCmYSoCuz@Asmaa.>
Date: Wed, 30 Apr 2025 05:27:18 -0700
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Vitaly Wool <vitaly.wool@...sulko.se>
Cc: linux-mm@...ck.org, akpm@...ux-foundation.org,
	linux-kernel@...r.kernel.org, Nhat Pham <nphamcs@...il.com>,
	Shakeel Butt <shakeel.butt@...ux.dev>,
	Johannes Weiner <hannes@...xchg.org>,
	Igor Belousov <igor.b@...dev.am>, Minchan Kim <minchan@...nel.org>,
	Sergey Senozhatsky <senozhatsky@...omium.org>
Subject: Re: [PATCH v4] mm: add zblock allocator

On Wed, Apr 23, 2025 at 09:53:48PM +0200, Vitaly Wool wrote:
> On 4/22/25 12:46, Yosry Ahmed wrote:
> > I didn't look too closely but I generally agree that we should improve
> > zsmalloc where possible rather than add a new allocator. We are trying
> > not to repeat the zbud/z3fold or slub/slob stories here. Zsmalloc is
> > getting a lot of mileage from both zswap and zram, and is more-or-less
> > battle-tested. Let's work toward building upon that instead of starting
> > over.
> 
> The thing here is, zblock is using a very different approach to small object
> allocation. The idea is: we have an array of descriptors which correspond to
> multi-page blocks divided into chunks of equal size (block_size[i]). For each
> object of size x we find the descriptor n such that:
> 	block_size[n-1] < x <= block_size[n]
> and then we store that object in an empty slot in one of the blocks. Thus,
> the density is high, the search is fast (rbtree-based), and no object spans
> two pages, so no extra memcpy is involved.
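
A minimal sketch of the size-class lookup described above, assuming a sorted table of slot sizes. The table values, NUM_CLASSES and find_class() are hypothetical, not zblock's actual code, and a binary search over an array stands in for whatever lookup structure zblock really uses (both are O(log n)). It returns the smallest n with block_size[n-1] < x <= block_size[n]:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot sizes in bytes, sorted ascending; not zblock's real table. */
static const size_t block_size[] = { 64, 128, 256, 512, 1024, 2048, 3072, 4096 };
#define NUM_CLASSES (sizeof(block_size) / sizeof(block_size[0]))

/*
 * Return the index n of the smallest slot size that fits an object of
 * x bytes, i.e. block_size[n-1] < x <= block_size[n], or -1 if x is 0
 * or larger than the biggest slot size. Binary search (lower bound).
 */
static int find_class(size_t x)
{
	size_t lo = 0, hi = NUM_CLASSES; /* search window is [lo, hi) */

	if (x == 0 || x > block_size[NUM_CLASSES - 1])
		return -1;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (block_size[mid] < x)
			lo = mid + 1; /* slot too small, look right */
		else
			hi = mid;     /* slot fits, try a smaller one */
	}
	/* lo is now the first index with block_size[lo] >= x */
	assert(lo < NUM_CLASSES && block_size[lo] >= x);
	assert(lo == 0 || block_size[lo - 1] < x);
	return (int)lo;
}
```

For example, find_class(700) returns 4, the 1024-byte class, since 512 < 700 <= 1024.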

The block sizes seem to be similar in principle to class sizes in
zsmalloc. It seems to me that there are two main properties that
differentiate zblock:

- Block lookup uses an rbtree, so it's faster than zsmalloc's list
  iteration. On the other hand, zsmalloc divides each class into
  fullness groups and tries to pack almost full groups first. Not sure
  if zblock's approach is strictly better.

- Zblock uses higher order allocations vs. zsmalloc always using order-0
  allocations. I think this may be the main advantage and I remember
  asking if zsmalloc can support this. Always using order-0 pages is
  more reliable but may not always be the best choice.
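
To see why higher-order blocks can help density, here is a back-of-the-envelope sketch, assuming 4K pages and equal-size slots packed back to back with no per-block header. SKETCH_PAGE_SIZE, struct block_fit and fit_slots() are hypothetical names, not taken from either allocator:

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4K pages. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/* How a block of 2^order pages is split into equal-size slots. */
struct block_fit {
	size_t slots; /* number of slot_size objects that fit in the block */
	size_t waste; /* bytes left over at the end of the block */
};

/*
 * Pack slots of slot_size bytes back to back into one block of
 * 2^order contiguous pages (no per-block header is assumed).
 */
static struct block_fit fit_slots(size_t slot_size, unsigned int order)
{
	size_t block_bytes = SKETCH_PAGE_SIZE << order;
	struct block_fit f;

	assert(slot_size > 0 && slot_size <= block_bytes);
	f.slots = block_bytes / slot_size;
	f.waste = block_bytes % slot_size;
	return f;
}
```

With 3000-byte slots, fit_slots(3000, 0) gives 1 slot and 1096 wasted bytes (about 27% of the page), while fit_slots(3000, 2) gives 5 slots and 1384 wasted bytes out of 16384 (about 8.4%). zsmalloc gets a similar effect with order-0 pages by chaining several of them into one zspage, and as a consequence its objects can straddle page boundaries.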

On the other hand, zblock is lacking in other regards. For example:

- The lack of compaction means that certain workloads will see a lot of
  fragmentation. It purely depends on the access patterns. We could end
  up with a lot of blocks each containing a single object and there is
  no way to recover AFAICT.

- Zblock will fail if a high-order allocation cannot be satisfied. Such
  failures are more likely under memory pressure, which is usually when
  zblock is needed in the first place.

- There's probably more, I didn't check too closely, and I am hoping
  that Minchan and Sergey will chime in here.
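
A rough way to quantify the fragmentation concern is slot utilization across all allocated blocks. The sketch below assumes hypothetical per-block counters (struct sketch_block and pool_utilization() are illustrative, not zblock's data structures) and that a block can be freed only once it is empty, since there is no compaction to move its objects elsewhere:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-block counters for a pool without compaction. */
struct sketch_block {
	size_t slots; /* capacity of the block */
	size_t used;  /* live objects currently stored in it */
};

/*
 * Fraction of allocated slots that hold live objects. Without
 * compaction a block can only go back to the page allocator once
 * used == 0, so one live object keeps the whole block allocated.
 * An empty pool is reported as 1.0 (nothing is wasted).
 */
static double pool_utilization(const struct sketch_block *b, size_t n)
{
	size_t used = 0, slots = 0;

	for (size_t i = 0; i < n; i++) {
		assert(b[i].used <= b[i].slots);
		used += b[i].used;
		slots += b[i].slots;
	}
	return slots ? (double)used / (double)slots : 1.0;
}
```

If four 5-slot blocks each hold a single live object, utilization is 4/20 = 20%, and all four blocks stay allocated until those last objects are freed.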

> 
> And with the latest zblock, we see that it has a clear advantage in
> performance over zsmalloc, retaining roughly the same allocation density for
> 4K pages and scoring better on 16K pages. E.g., on a kernel compilation:
> 
> * zsmalloc/zstd/make -j32 bzImage
> 	real	8m0.594s
> 	user	39m37.783s
> 	sys	8m24.262s
> 	Zswap:            200600 kB <-- after build completion
> 	Zswapped:         854072 kB <-- after build completion
> 	zswpin 309774
> 	zswpout 1538332
> 
> * zblock/zstd/make -j32 bzImage
> 	real	7m35.546s
> 	user	38m03.475s
> 	sys	7m47.407s
> 	Zswap:            250940 kB <-- after build completion
> 	Zswapped:         870660 kB <-- after build completion
> 	zswpin 248606
> 	zswpout 1277319
> 
> So what we see here is that zblock is definitely faster and at least not
> worse with regard to allocation density under heavy load. It has slightly
> worse _idle_ allocation density, but since it quickly catches up under
> load, that is not really important. What is important is that its
> characteristics don't deteriorate over time. Overall, zblock is simple and
> efficient, and there is a /raison d'etre/ for it.
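
The numbers above can be reduced to relative figures with a little arithmetic. This sketch (the helper names are hypothetical) assumes the standard /proc/meminfo meanings: Zswap is the memory consumed by the zswap pool and Zswapped is the original, uncompressed size of the data stored in it, so their ratio is an effective compression ratio that includes allocator overhead:

```c
#include <assert.h>

/* Convert a time(1)-style "XmY.YYYs" reading to seconds. */
static double to_seconds(int minutes, double seconds)
{
	assert(minutes >= 0 && seconds >= 0.0 && seconds < 60.0);
	return minutes * 60.0 + seconds;
}

/* Relative reduction of `after` compared with `before`, in percent. */
static double pct_reduction(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}

/*
 * Effective compression ratio from /proc/meminfo readings (both in kB):
 * Zswapped is the original size of the data stored in zswap and Zswap
 * is the memory the pool consumes, so allocator overhead is included.
 */
static double zswap_ratio(double zswapped_kb, double zswap_kb)
{
	assert(zswap_kb > 0.0);
	return zswapped_kb / zswap_kb;
}
```

Plugging in the readings above gives a real-time reduction of about 5.2% (480.594 s to 455.546 s) and a sys-time reduction of about 7.3% (504.262 s to 467.407 s) for zblock. The after-build effective ratio is about 4.26 (854072 / 200600) for zsmalloc and about 3.47 (870660 / 250940) for zblock; since these are idle readings, they reflect the idle density Vitaly mentions rather than density under load.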

Zblock is performing better for this specific workload, but as I
mentioned earlier there are other aspects that zblock is missing.
Zsmalloc has seen a very large range of workloads of different types,
and we cannot just dismiss this.

> 
> Now, it is indeed possible to partially rework zsmalloc using zblock's
> algorithm, but this would be a rather substantial change, equal to or bigger
> in effort than implementing the approach described above from scratch (which
> is what we did). With such drastic changes, most of the testing that has
> been done with zsmalloc would be invalidated, and we'd be out in the wild
> anyway. So even though I see your point, I don't think it applies in this
> particular case.


Well, we should start by breaking down the differences and finding out
why zblock is performing better, as I mentioned above. If it's the
faster lookups or higher order allocations, we can work to support that
in zsmalloc. Similarly, if zsmalloc has unnecessary complexity, it'd be
great to get rid of it rather than starting over.

Also, we don't have to do it all at once and invalidate the testing that
zsmalloc has seen. These can be incremental changes that get spread over
multiple releases, getting incremental exposure in the process.

> 
> ~Vitaly
