[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGsJ_4xy3mksbwj61qnNrSpcFgkanEK0tCzJcjQgVF-oAyXe8A@mail.gmail.com>
Date: Fri, 29 Aug 2025 11:28:33 +1200
From: Barry Song <21cnbao@...il.com>
To: "Sridhar, Kanchana P" <kanchana.p.sridhar@...el.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>,
"hannes@...xchg.org" <hannes@...xchg.org>, "yosry.ahmed@...ux.dev" <yosry.ahmed@...ux.dev>,
"nphamcs@...il.com" <nphamcs@...il.com>, "chengming.zhou@...ux.dev" <chengming.zhou@...ux.dev>,
"usamaarif642@...il.com" <usamaarif642@...il.com>, "ryan.roberts@....com" <ryan.roberts@....com>,
"ying.huang@...ux.alibaba.com" <ying.huang@...ux.alibaba.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"senozhatsky@...omium.org" <senozhatsky@...omium.org>,
"linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>,
"herbert@...dor.apana.org.au" <herbert@...dor.apana.org.au>, "davem@...emloft.net" <davem@...emloft.net>,
"clabbe@...libre.com" <clabbe@...libre.com>, "ardb@...nel.org" <ardb@...nel.org>,
"ebiggers@...gle.com" <ebiggers@...gle.com>, "surenb@...gle.com" <surenb@...gle.com>,
"Accardi, Kristen C" <kristen.c.accardi@...el.com>, "Gomes, Vinicius" <vinicius.gomes@...el.com>,
"Feghali, Wajdi K" <wajdi.k.feghali@...el.com>, "Gopal, Vinodh" <vinodh.gopal@...el.com>
Subject: Re: [PATCH v11 22/24] mm: zswap: Allocate pool batching resources if
the compressor supports batching.
> >
> > If ZSWAP_MAX_BATCH_SIZE is set to 8 and there is no hardware batching,
> > compression is done with a step size of 1. If the hardware step size is 4,
> > compression occurs in two steps. If the hardware step size is 6, the first
> > compression uses a step size of 6, and the second uses a step size of 2.
> > Do you think this will work?
>
> Hi Barry,
>
> This would be non-optimal from code simplicity and latency perspectives.
> One of the benefits of using the hardware accelerator's "batch parallelism"
> is cost amortization across the batch. We might lose this benefit if we make
> multiple calls to zswap_compress() to ask the hardware accelerator to
> batch compress in smaller batches. Compression throughput would also
> be sub-optimal.
I guess it wouldn’t be an issue if both ZSWAP_MAX_BATCH_SIZE and
pool->compr_batch_size are powers of two. As you mentioned, we still
gain improvement with ZSWAP_MAX_BATCH_SIZE batching even when
pool->compr_batch_size == 1, by compressing pages one by one but
batching other work such as zswap_entries_cache_alloc_batch() ?
>
> In my patch-series, the rule is simple: if an algorithm has specified a
> batch-size, carve out the folio by that "batch-size" # of pages to be
> compressed as a batch in zswap_compress(). This custom batch-size
> is capped at ZSWAP_MAX_BATCH_SIZE.
>
> If an algorithm has not specified a batch-size, the default batch-size
> is 1. In this case, carve out the folio by ZSWAP_MAX_BATCH_SIZE
> # of pages to be compressed as a batch in zswap_compress().
Yes, I understand your rule. However, having two global variables is still
somewhat confusing. It might be clearer to use a single variable with a
comment, since one variable can clearly determine the value of the other.
Can we get the batch_size at runtime based on pool->compr_batch_size?
/*
* If hardware compression supports batching, we use the hardware step size.
* Otherwise, we use ZSWAP_MAX_BATCH_SIZE for batching, but still compress
* one page at a time.
*/
batch_size = pool->compr_batch_size > 1 ? pool->compr_batch_size :
ZSWAP_MAX_BATCH_SIZE;
We probably don’t need this if both pool->compr_batch_size and
ZSWAP_MAX_BATCH_SIZE are powers of two?
>
> >
> > I don’t quite understand why you want to save
> > ZSWAP_MAX_BATCH_SIZE - X resources, since even without hardware
> > batching
> > you are still allocating all ZSWAP_MAX_BATCH_SIZE resources. This is the
> > case for all platforms except yours.
>
> Not sure I understand.. Just to clarify, this is not done to save on resources,
> rather for the reasons stated above.
>
> We are already saving on resources by only allocating only
> "pool->compr_batch_size" number of resources
> (*not* ZSWAP_MAX_BATCH_SIZE resources):
>
> pool->compr_batch_size = min(ZSWAP_MAX_BATCH_SIZE,
> crypto_acomp_batch_size(acomp_ctx->acomp));
>
> For non-Intel platforms, this means only 1 dst buffer is allocated, as
> explained in the commit log for this patch.
you are right. I misunderstood your code :-)
>
> " A "u8 compr_batch_size" member is added to "struct zswap_pool", as per
> Yosry's suggestion. pool->compr_batch_size is set as the minimum of the
> compressor's max batch-size and ZSWAP_MAX_BATCH_SIZE. Accordingly, it
> proceeds to allocate the necessary compression dst buffers in the
> per-CPU acomp_ctx."
This is fine, but it still doesn’t provide a strong reason for having
two global variables when one can fully determine the value of the other.
Thanks
Barry
Powered by blists - more mailing lists