Message-ID: <CAGsJ_4x38iz1XmRp_j3jX-8fY8o_3RNXLx78wc3s_4-o+N0URQ@mail.gmail.com>
Date: Sat, 30 Aug 2025 16:40:57 +0800
From: Barry Song <21cnbao@...il.com>
To: "Sridhar, Kanchana P" <kanchana.p.sridhar@...el.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>,
"hannes@...xchg.org" <hannes@...xchg.org>, "yosry.ahmed@...ux.dev" <yosry.ahmed@...ux.dev>,
"nphamcs@...il.com" <nphamcs@...il.com>, "chengming.zhou@...ux.dev" <chengming.zhou@...ux.dev>,
"usamaarif642@...il.com" <usamaarif642@...il.com>, "ryan.roberts@....com" <ryan.roberts@....com>,
"ying.huang@...ux.alibaba.com" <ying.huang@...ux.alibaba.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"senozhatsky@...omium.org" <senozhatsky@...omium.org>,
"linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>,
"herbert@...dor.apana.org.au" <herbert@...dor.apana.org.au>, "davem@...emloft.net" <davem@...emloft.net>,
"clabbe@...libre.com" <clabbe@...libre.com>, "ardb@...nel.org" <ardb@...nel.org>,
"ebiggers@...gle.com" <ebiggers@...gle.com>, "surenb@...gle.com" <surenb@...gle.com>,
"Accardi, Kristen C" <kristen.c.accardi@...el.com>, "Gomes, Vinicius" <vinicius.gomes@...el.com>,
"Feghali, Wajdi K" <wajdi.k.feghali@...el.com>, "Gopal, Vinodh" <vinodh.gopal@...el.com>
Subject: Re: [PATCH v11 22/24] mm: zswap: Allocate pool batching resources if
the compressor supports batching.
> > >
> > > I am not sure I understand this rationale, but I do want to reiterate
> > > that the patch-set implements a simple set of rules/design choices
> > > to provide a batching framework for software and hardware compressors,
> > > that has shown good performance improvements with both, while
> > > unifying zswap_store()/zswap_compress() code paths for both.
> >
> > I’m really curious: if ZSWAP_MAX_BATCH_SIZE = 8 and
> > compr_batch_size = 4, why wouldn’t batch_size = 8 and
> > compr_batch_size = 4 perform better than batch_size = 4 and
> > compr_batch_size = 4?
> >
> > In short, I’d like the case of compr_batch_size == 1 to be treated the same
> > as compr_batch_size == 2, 4, etc., since you can still see performance
> > > improvements when ZSWAP_MAX_BATCH_SIZE = 8 and compr_batch_size == 1,
> > > as batching occurs even outside compression.
> >
> > Therefore, I would expect batch_size == 8 and compr_batch_size == 2 to
> > perform better than when both are 2.
> >
> > The only thing preventing this from happening is that compr_batch_size
> > might be 5, 6, or 7, which are not powers of two?
>
> It would be interesting to see if a generalization of pool->compr_batch_size
> being a factor "N" (where N > 1) of ZSWAP_MAX_BATCH_SIZE yields better
> performance than the current set of rules. However, we would still need to
> handle the case where it is not, as you mention, which might still necessitate
> the use of a distinct pool->batch_size to avoid re-calculating this dynamically,
> when this information doesn't change after pool creation.
>
> The current implementation gives preference to the algorithm to determine
> not just the batch compression step-size, but also the working-set size for
> other zswap processing for the batch, i.e., bulk allocation of entries,
> zpool writes, etc. The algorithm's batch-size is what zswap uses for the latter
> (the zswap_store_pages() in my patch-set). This has been shown to work
> well.
>
> To change this design to be driven instead by ZSWAP_MAX_BATCH_SIZE
> always (while handling non-factor pool->compr_batch_size) requires more
> data gathering. I am inclined to keep the existing implementation and
> we can continue to improve upon this if it's OK with you.

Right, I have no objection at this stage. I’m just curious whether
batch_size == compr_batch_size is always the best rule, since some hardware
now supports HW compression with only one queue and may grow to two or four
queues in the future, but not many overall.
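
To make the question concrete, here is a rough userspace sketch, not kernel
code. The struct and function names are made up for illustration and do not
exist in zswap. It only counts how many outer batches and how many compressor
calls a folio would need for a given batch_size and compr_batch_size,
including the case where compr_batch_size does not divide batch_size. It says
nothing about actual performance.

```c
/* Illustrative userspace model only; none of these names exist in zswap. */
struct batch_counts {
	unsigned int outer_batches;   /* passes that would amortize non-compression work */
	unsigned int compress_calls;  /* compressor invocations across all passes */
};

struct batch_counts count_batches(unsigned int nr_pages,
				  unsigned int batch_size,
				  unsigned int compr_batch_size)
{
	struct batch_counts c = { 0, 0 };
	unsigned int start;

	/* Assumption: a zero size is invalid, so report no work. */
	if (batch_size == 0 || compr_batch_size == 0)
		return c;

	for (start = 0; start < nr_pages; start += batch_size) {
		unsigned int in_batch = nr_pages - start;
		unsigned int done;

		if (in_batch > batch_size)
			in_batch = batch_size;
		c.outer_batches++;

		/* The last step is partial when compr_batch_size is not a factor. */
		for (done = 0; done < in_batch; done += compr_batch_size)
			c.compress_calls++;
	}
	return c;
}
```

For a 16-page folio, count_batches(16, 8, 4) gives 2 outer batches and 4
compressor calls, while count_batches(16, 4, 4) gives 4 outer batches and the
same 4 compressor calls. With compr_batch_size = 5, each 8-page batch splits
into a 5-page step and a 3-page step. Whether fewer outer batches actually
helps is what would need measuring.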

BTW, is HW compression always better than software? For example, when
kswapd, proactive reclamation, and direct reclamation all run simultaneously,
the CPU-based approach can leverage multiple CPUs to perform compression
in parallel. But if the hardware only provides a limited number of queues,
software might actually perform better. An extreme case is when multiple
threads are running MADV_PAGEOUT at the same time.

I’m not opposing your current patchset, just sharing some side thoughts :-)

Thanks
Barry