[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJD7tkbRHkb7Znzto6=RRDQA9zXZSva43GukhBEfjrgm1qOxHw@mail.gmail.com>
Date: Mon, 6 Jan 2025 17:46:01 -0800
From: Yosry Ahmed <yosryahmed@...gle.com>
To: "Sridhar, Kanchana P" <kanchana.p.sridhar@...el.com>
Cc: Herbert Xu <herbert@...dor.apana.org.au>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>,
"hannes@...xchg.org" <hannes@...xchg.org>, "nphamcs@...il.com" <nphamcs@...il.com>,
"chengming.zhou@...ux.dev" <chengming.zhou@...ux.dev>,
"usamaarif642@...il.com" <usamaarif642@...il.com>, "ryan.roberts@....com" <ryan.roberts@....com>,
"21cnbao@...il.com" <21cnbao@...il.com>, "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>, "davem@...emloft.net" <davem@...emloft.net>,
"clabbe@...libre.com" <clabbe@...libre.com>, "ardb@...nel.org" <ardb@...nel.org>,
"ebiggers@...gle.com" <ebiggers@...gle.com>, "surenb@...gle.com" <surenb@...gle.com>,
"Accardi, Kristen C" <kristen.c.accardi@...el.com>,
"Feghali, Wajdi K" <wajdi.k.feghali@...el.com>, "Gopal, Vinodh" <vinodh.gopal@...el.com>
Subject: Re: [PATCH v5 02/12] crypto: acomp - Define new interfaces for
compress/decompress batching.
On Mon, Jan 6, 2025 at 5:38 PM Sridhar, Kanchana P
<kanchana.p.sridhar@...el.com> wrote:
>
> Hi Yosry,
>
> > -----Original Message-----
> > From: Yosry Ahmed <yosryahmed@...gle.com>
> > Sent: Monday, January 6, 2025 3:24 PM
> > To: Sridhar, Kanchana P <kanchana.p.sridhar@...el.com>
> > Cc: Herbert Xu <herbert@...dor.apana.org.au>; linux-
> > kernel@...r.kernel.org; linux-mm@...ck.org; hannes@...xchg.org;
> > nphamcs@...il.com; chengming.zhou@...ux.dev;
> > usamaarif642@...il.com; ryan.roberts@....com; 21cnbao@...il.com;
> > akpm@...ux-foundation.org; linux-crypto@...r.kernel.org;
> > davem@...emloft.net; clabbe@...libre.com; ardb@...nel.org;
> > ebiggers@...gle.com; surenb@...gle.com; Accardi, Kristen C
> > <kristen.c.accardi@...el.com>; Feghali, Wajdi K <wajdi.k.feghali@...el.com>;
> > Gopal, Vinodh <vinodh.gopal@...el.com>
> > Subject: Re: [PATCH v5 02/12] crypto: acomp - Define new interfaces for
> > compress/decompress batching.
> >
> > On Mon, Jan 6, 2025 at 9:37 AM Sridhar, Kanchana P
> > <kanchana.p.sridhar@...el.com> wrote:
> > >
> > > Hi Herbert,
> > >
> > > > -----Original Message-----
> > > > From: Herbert Xu <herbert@...dor.apana.org.au>
> > > > Sent: Saturday, December 28, 2024 3:46 AM
> > > > To: Sridhar, Kanchana P <kanchana.p.sridhar@...el.com>
> > > > Cc: linux-kernel@...r.kernel.org; linux-mm@...ck.org;
> > > > hannes@...xchg.org; yosryahmed@...gle.com; nphamcs@...il.com;
> > > > chengming.zhou@...ux.dev; usamaarif642@...il.com;
> > > > ryan.roberts@....com; 21cnbao@...il.com; akpm@...ux-
> > foundation.org;
> > > > linux-crypto@...r.kernel.org; davem@...emloft.net;
> > clabbe@...libre.com;
> > > > ardb@...nel.org; ebiggers@...gle.com; surenb@...gle.com; Accardi,
> > > > Kristen C <kristen.c.accardi@...el.com>; Feghali, Wajdi K
> > > > <wajdi.k.feghali@...el.com>; Gopal, Vinodh <vinodh.gopal@...el.com>
> > > > Subject: Re: [PATCH v5 02/12] crypto: acomp - Define new interfaces for
> > > > compress/decompress batching.
> > > >
> > > > On Fri, Dec 20, 2024 at 10:31:09PM -0800, Kanchana P Sridhar wrote:
> > > > > This commit adds get_batch_size(), batch_compress() and
> > > > batch_decompress()
> > > > > interfaces to:
> > > >
> > > > First of all we don't need a batch compress/decompress interface
> > > > because the whole point of request chaining is to supply the data
> > > > in batches.
> > > >
> > > > I'm also against having a get_batch_size because the user should
> > > > be supplying as much data as they're comfortable with. In other
> > > > words if the user is happy to give us 8 requests for iaa then it
> > > > should be happy to give us 8 requests for every implementation.
> > > >
> > > > The request chaining interface should be such that processing
> > > > 8 requests is always better than doing 1 request at a time as
> > > > the cost is amortised.
> > >
> > > Thanks for your comments. Can you please elaborate on how
> > > request chaining would enable cost amortization for software
> > > compressors? With the current implementation, a module like
> > > zswap would need to do the following to invoke request chaining
> > > for software compressors (in addition to pushing the chaining
> > > to the user layer for IAA, as per your suggestion on not needing a
> > > batch compress/decompress interface):
> > >
> > > zswap_batch_compress():
> > > for (i = 0; i < nr_pages_in_batch; ++i) {
> > > /* set up the acomp_req "reqs[i]". */
> > > [ ... ]
> > > if (i)
> > > acomp_request_chain(reqs[i], reqs[0]);
> > > else
> > > acomp_reqchain_init(reqs[0], 0, crypto_req_done, crypto_wait);
> > > }
> > >
> > > /* Process the request chain in series. */
> > > err = crypto_wait_req(acomp_do_req_chain(reqs[0],
> > crypto_acomp_compress), crypto_wait);
> > >
> > > Internally, acomp_do_req_chain() would sequentially process the
> > > request chain by:
> > > 1) adding all requests to a list "state"
> > > 2) call "crypto_acomp_compress()" for the next list element
> > > 3) when this request completes, dequeue it from the list "state"
> > > 4) repeat for all requests in "state"
> > > 5) When the last request in "state" completes, call "reqs[0]-
> > >base.complete()",
> > > which notifies crypto_wait.
> > >
> > > From what I can understand, the latency cost should be the same for
> > > processing a request chain in series vs. processing each request as it is
> > > done today in zswap, by calling:
> > >
> > > comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx-
> > >reqs[0]), &acomp_ctx->wait);
> > >
> > > It is not clear to me if there is a cost amortization benefit for software
> > > compressors. One of the requirements from Yosry was that there should
> > > be no change for the software compressors, which is what I have
> > > attempted to do in v5.
> > >
> > > Can you please help us understand if there is a room for optimizing
> > > the implementation of the synchronous "acomp_do_req_chain()" API?
> > > I would also like to get inputs from the zswap maintainers on using
> > > request chaining for a batching implementation for software compressors.
> >
> > Is there a functional change in doing so, or just using different
> > interfaces to accomplish the same thing we do today?
>
> The code paths for software compressors are considerably different between
> these two scenarios:
>
> 1) Given a batch of 8 pages: for each page, call zswap_compress() that does this:
>
> comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->reqs[0]), &acomp_ctx->wait);
>
> 2) Given a batch of 8 pages:
> a) Create a request chain of 8 acomp_reqs, starting with reqs[0], as
> described earlier.
> b) Process the request chain by calling:
>
> err = crypto_wait_req(acomp_do_req_chain(reqs[0], crypto_acomp_compress), &acomp_ctx->wait);
> /* Get each req's error status. */
> for (i = 0; i < nr_pages; ++i) {
> errors[i] = acomp_request_err(reqs[i]);
> if (errors[i]) {
> pr_debug("Request chaining req %d compress error %d\n", i, errors[i]);
> } else {
> dlens[i] = reqs[i]->dlen;
> }
> }
>
> What I mean by considerably different code paths is that request chaining
> internally overwrites the req's base.complete and base.data (after saving the
> original values) to implement the algorithm described earlier. Basically, the
> chain is processed in series by getting the next req in the chain, setting it's
> completion function to "acomp_reqchain_done()", which gets called when
> the "op" (crypto_acomp_compress()) is completed for that req.
> acomp_reqchain_done() will cause the next req to be processed in the
> same manner. If this next req happens to be the last req to be processed,
> it will notify the original completion function of reqs[0], with the crypto_wait
> that zswap sets up in zswap_cpu_comp_prepare():
>
> acomp_request_set_callback(acomp_ctx->reqs[0], CRYPTO_TFM_REQ_MAY_BACKLOG,
> crypto_req_done, &acomp_ctx->wait);
>
> Patch [1] in v5 of this series has the full implementation of acomp_do_req_chain()
> in case you want to understand this in more detail.
>
> The "functional change" wrt request chaining is limited to the above.
For software compressors, the batch size should be 1. In that
scenario, from a zswap perspective (without going into the acomp
implementation details please), is there a functional difference? If
not, we can just use the request chaining API regardless of batching
if that is what Herbert means.
>
> [1]: https://patchwork.kernel.org/project/linux-mm/patch/20241221063119.29140-2-kanchana.p.sridhar@intel.com/
>
> Thanks,
> Kanchana
>
Powered by blists - more mailing lists