[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201109102909.u34zzudqqng6nhg6@linutronix.de>
Date: Mon, 9 Nov 2020 11:29:09 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Barry Song <song.bao.hua@...ilicon.com>
Cc: linux-mm@...ck.org, linux-crypto@...r.kernel.org,
akpm@...ux-foundation.org, linuxarm@...wei.com,
fanghao11@...wei.com, linux-kernel@...r.kernel.org,
Vitaly Wool <vitalywool@...il.com>,
"Luis Claudio R . Goncalves" <lgoncalv@...hat.com>,
Herbert Xu <herbert@...dor.apana.org.au>,
"David S . Miller" <davem@...emloft.net>,
Mahipal Challa <mahipalreddy2006@...il.com>,
Seth Jennings <sjenning@...hat.com>,
Dan Streetman <ddstreet@...e.org>,
Zhou Wang <wangzhou1@...ilicon.com>,
Colin Ian King <colin.king@...onical.com>
Subject: Re: [PATCH v7] mm/zswap: move to use crypto_acomp API for hardware
acceleration
I've been looking at the patch and it looks like it should work. Having
numbers to backup the performance in the pure-software version and with
HW acceleration would _very_ nice to have.
On 2020-11-07 19:53:32 [+1300], Barry Song wrote:
> index fbb7829..73f04de 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -415,30 +445,54 @@ static int zswap_dstmem_dead(unsigned int cpu)
…
> + acomp_ctx->req = req;
> +
> + crypto_init_wait(&acomp_ctx->wait);
> + /*
> + * if the backend of acomp is async zip, crypto_req_done() will wakeup
> + * crypto_wait_req(); if the backend of acomp is scomp, the callback
> + * won't be called, crypto_wait_req() will return without blocking.
> + */
> + acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
> + crypto_req_done, &acomp_ctx->wait);
> +
> + acomp_ctx->mutex = per_cpu(zswap_mutex, cpu);
> + acomp_ctx->dstmem = per_cpu(zswap_dstmem, cpu);
You added a comment here and there you never mentioned that this single
per-CPU mutex protects the per-CPU context (which you can have more than
one on a single CPU) and the scratch/dstmem which is one per-CPU. Of
course if you read the code you figure it out.
I still think that you should have a pool of memory and crypto contexts
which you can use instead of having them strictly per-CPU. The code is
fully preemptible and you may have multiple requests on the same CPU.
Yes, locking works but at the same you block processing while waiting on
a lock and the "reserved memory" on other CPUs remains unused.
Sebastian
Powered by blists - more mailing lists