Message-ID: <yr7cekjrssjiwqlkrmreugl6fhywssutjzg3ll45mcvdjklnzy@5vkgju4wwtrg>
Date: Mon, 5 Jan 2026 16:01:54 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Sergey Senozhatsky <senozhatsky@...omium.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>, 
	Nhat Pham <nphamcs@...il.com>, Minchan Kim <minchan@...nel.org>, 
	Johannes Weiner <hannes@...xchg.org>, Brian Geffon <bgeffon@...gle.com>, linux-kernel@...r.kernel.org, 
	linux-mm@...ck.org
Subject: Re: [RFC PATCH 2/2] zsmalloc: chain-length configuration should
 consider other metrics

On Mon, Jan 05, 2026 at 04:23:39PM +0900, Sergey Senozhatsky wrote:
> On (26/01/05 10:42), Sergey Senozhatsky wrote:
> > On (26/01/02 18:29), Yosry Ahmed wrote:
> > > On Thu, Jan 01, 2026 at 10:38:14AM +0900, Sergey Senozhatsky wrote:
> > [..]
> > > 
> > > I worry that the heuristics are too hand-wavy
> > 
> > I don't disagree.  I'm not super excited about the heuristics either.
> > 
> > > and I wonder if the memcpy savings actually show up as perf improvements
> > > in any real-life workload. Do we have data about this?
> > 
> > I don't have real-life 16K PAGE_SIZE devices.  However, on 16K PAGE_SIZE
> > systems we have "normal" size-classes up to a very large size; a normal
> > class means chaining of 0-order physical pages, and chaining means spanning.
> > So on 16K the memcpy overhead is expected to be somewhat noticeable.
> 
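
(For context, spanning is what forces the copy: when off + size crosses
into the next chained 0-order page, zs_obj_read_begin() cannot simply
kmap the object in place and has to assemble it in the caller-supplied
local_copy buffer.  A simplified sketch of the two paths -- the slow-path
helper names here are from memory, not copied from the tree:

	if (off + class->size <= PAGE_SIZE) {
		/* fast path: object sits entirely in one 0-order page */
		addr = kmap_local_zpdesc(zpdesc);
		addr += off;
	} else {
		/*
		 * slow path: the object spans two chained 0-order pages,
		 * so both pieces get copied into the caller's buffer
		 */
		size_t first = PAGE_SIZE - off;

		memcpy_from_page(local_copy, zpdesc_page(zpdesc), off, first);
		zpdesc = get_next_zpdesc(zpdesc);
		memcpy_from_page(local_copy + first, zpdesc_page(zpdesc), 0,
				 class->size - first);
		addr = local_copy;
	}

On 16K the second branch covers many more size classes than on 4K, which
is where the extra memcpy traffic comes from.)
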
> By the way, while looking at this, I think we need to "fix" obj_read_begin().
> Currently it uses "off + class->size" to detect spanning objects, which is
> incorrect: size classes get merged, so a typical size class can hold a range
> of sizes, using padding for smaller objects.  So instead of class->size we
> need to use the actual compressed object's size, in case the written size
> was small enough to fit entirely into the first physical page (we already
> use the actual size in obj_write()).  I'll cook a patch.

We also need to update zs_obj_read_end() so that it does the kunmap()
call correctly for the same cases.
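
Roughly like this (a sketch only, with the lookup boilerplate elided and
reusing the mem_len parameter your patch below adds to zs_obj_read_begin()):

	void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
			     size_t mem_len, void *handle_mem)
	{
		...
		/*
		 * Must mirror the check in zs_obj_read_begin(): only the
		 * directly-mapped (non-spanning) case took a kmap_local()
		 * that needs to be undone here.
		 */
		if (off + mem_len <= PAGE_SIZE) {
			handle_mem -= off;
			kunmap_local(handle_mem);
		}
		...
	}

Otherwise read_begin can map the object directly (its actual size fits in
the first page) while read_end, still checking class->size, skips the
kunmap.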

> 
> Something like this:
> 
> ---
> 
>  drivers/block/zram/zram_drv.c | 8 +++++---
>  include/linux/zsmalloc.h      | 2 +-
>  mm/zsmalloc.c                 | 4 ++--
>  mm/zswap.c                    | 3 ++-
>  4 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index a6587bed6a03..b371ba6bfec2 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -2065,7 +2065,7 @@ static int read_incompressible_page(struct zram *zram, struct page *page,
>  	void *src, *dst;
>  
>  	handle = get_slot_handle(zram, index);
> -	src = zs_obj_read_begin(zram->mem_pool, handle, NULL);
> +	src = zs_obj_read_begin(zram->mem_pool, handle, PAGE_SIZE, NULL);
>  	dst = kmap_local_page(page);
>  	copy_page(dst, src);
>  	kunmap_local(dst);
> @@ -2087,7 +2087,8 @@ static int read_compressed_page(struct zram *zram, struct page *page, u32 index)
>  	prio = get_slot_comp_priority(zram, index);
>  
>  	zstrm = zcomp_stream_get(zram->comps[prio]);
> -	src = zs_obj_read_begin(zram->mem_pool, handle, zstrm->local_copy);
> +	src = zs_obj_read_begin(zram->mem_pool, handle, size,
> +				zstrm->local_copy);
>  	dst = kmap_local_page(page);
>  	ret = zcomp_decompress(zram->comps[prio], zstrm, src, size, dst);
>  	kunmap_local(dst);
> @@ -2114,7 +2115,8 @@ static int read_from_zspool_raw(struct zram *zram, struct page *page, u32 index)
>  	 * takes place here, as we read raw compressed data.
>  	 */
>  	zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP]);
> -	src = zs_obj_read_begin(zram->mem_pool, handle, zstrm->local_copy);
> +	src = zs_obj_read_begin(zram->mem_pool, handle, size,
> +				zstrm->local_copy);
>  	memcpy_to_page(page, 0, src, size);
>  	zs_obj_read_end(zram->mem_pool, handle, src);
>  	zcomp_stream_put(zstrm);
> diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
> index f3ccff2d966c..64f65c1f14d6 100644
> --- a/include/linux/zsmalloc.h
> +++ b/include/linux/zsmalloc.h
> @@ -40,7 +40,7 @@ unsigned int zs_lookup_class_index(struct zs_pool *pool, unsigned int size);
>  void zs_pool_stats(struct zs_pool *pool, struct zs_pool_stats *stats);
>  
>  void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
> -			void *local_copy);
> +			size_t mem_len, void *local_copy);
>  void zs_obj_read_end(struct zs_pool *pool, unsigned long handle,
>  		     void *handle_mem);
>  void zs_obj_write(struct zs_pool *pool, unsigned long handle,
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index be385609ef8a..2da60c23cd18 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1070,7 +1070,7 @@ unsigned long zs_get_total_pages(struct zs_pool *pool)
>  EXPORT_SYMBOL_GPL(zs_get_total_pages);
>  
>  void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
> -			void *local_copy)
> +			size_t mem_len, void *local_copy)
>  {
>  	struct zspage *zspage;
>  	struct zpdesc *zpdesc;
> @@ -1092,7 +1092,7 @@ void *zs_obj_read_begin(struct zs_pool *pool, unsigned long handle,
>  	class = zspage_class(pool, zspage);
>  	off = offset_in_page(class->size * obj_idx);
>  
> -	if (off + class->size <= PAGE_SIZE) {
> +	if (off + mem_len <= PAGE_SIZE) {
>  		/* this object is contained entirely within a page */
>  		addr = kmap_local_zpdesc(zpdesc);
>  		addr += off;
> diff --git a/mm/zswap.c b/mm/zswap.c
> index de8858ff1521..291352629616 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -937,7 +937,8 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
>  	u8 *src, *obj;
>  
>  	acomp_ctx = acomp_ctx_get_cpu_lock(pool);
> -	obj = zs_obj_read_begin(pool->zs_pool, entry->handle, acomp_ctx->buffer);
> +	obj = zs_obj_read_begin(pool->zs_pool, entry->handle, entry->length,
> +				acomp_ctx->buffer);
>  
>  	/* zswap entries of length PAGE_SIZE are not compressed. */
>  	if (entry->length == PAGE_SIZE) {
> -- 
> 2.52.0.351.gbe84eed79e-goog
