Message-ID: <dfffdbc8-63ca-46d7-bfc2-4212b8df22b1@bytedance.com>
Date: Thu, 28 Dec 2023 16:23:14 +0800
From: Chengming Zhou <zhouchengming@...edance.com>
To: Barry Song <21cnbao@...il.com>, Herbert Xu <herbert@...dor.apana.org.au>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Seth Jennings <sjenning@...hat.com>, Johannes Weiner <hannes@...xchg.org>,
Vitaly Wool <vitaly.wool@...sulko.com>, Nhat Pham <nphamcs@...il.com>,
Chris Li <chriscli@...gle.com>, Yosry Ahmed <yosryahmed@...gle.com>,
Dan Streetman <ddstreet@...e.org>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, Chris Li <chrisl@...nel.org>
Subject: Re: [PATCH v4 2/6] mm/zswap: reuse dstmem when decompress
On 2023/12/28 16:03, Barry Song wrote:
> On Wed, Dec 27, 2023 at 7:32 PM Chengming Zhou
> <zhouchengming@...edance.com> wrote:
>>
>> On 2023/12/27 09:24, Barry Song wrote:
>>> On Wed, Dec 27, 2023 at 4:56 AM Chengming Zhou
>>> <zhouchengming@...edance.com> wrote:
>>>>
>>>> In the !zpool_can_sleep_mapped() case such as zsmalloc, we need to first
>>>> copy the entry->handle memory to a temporary buffer, which is allocated
>>>> using kmalloc.
>>>>
>>>> Obviously we can reuse the per-compressor dstmem to avoid allocating it
>>>> every time, since it is per-CPU per-compressor and protected by the per-CPU
>>>> mutex.
>>>
>>> What is the benefit of this, since we are actually increasing lock
>>> contention by reusing this buffer between multiple compression and
>>> decompression threads?
>>
>> This mutex is already shared by all compress/decompress paths even before
>> the reuse optimization. I think the best way may be to use separate
>> crypto_acomp instances for compression and decompression.
>>
>> Do you think the lock contention will increase because we now put
>> zpool_map_handle() and memcpy() in the lock section? Actually, we could move
>> zpool_map_handle() before the lock section if needed, but the memcpy() has
>> to stay inside the lock section.
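
To make that concrete, here is a rough sketch of the !zpool_can_sleep_mapped()
decompress path with the reused per-CPU dstmem. This is not the literal patch:
the names (acomp_ctx->dstmem, acomp_ctx->mutex, entry->handle) follow the
existing zswap per-CPU context, and error handling is omitted.

static int zswap_decompress_sketch(struct zswap_entry *entry, struct page *page)
{
	struct crypto_acomp_ctx *acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
	struct scatterlist input, output;
	u8 *src;
	int ret;

	/* Mapping the handle can be done before taking the per-CPU mutex. */
	src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);

	mutex_lock(acomp_ctx->mutex);
	if (!zpool_can_sleep_mapped(entry->pool->zpool)) {
		/*
		 * dstmem is shared by every user of this per-CPU context, so
		 * the memcpy() into it must stay inside the locked section.
		 */
		memcpy(acomp_ctx->dstmem, src, entry->length);
		zpool_unmap_handle(entry->pool->zpool, entry->handle);
		src = acomp_ctx->dstmem;
	}

	sg_init_one(&input, src, entry->length);
	sg_init_table(&output, 1);
	sg_set_page(&output, page, PAGE_SIZE, 0);
	acomp_request_set_params(acomp_ctx->req, &input, &output,
				 entry->length, PAGE_SIZE);
	ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req),
			      &acomp_ctx->wait);
	mutex_unlock(acomp_ctx->mutex);

	if (zpool_can_sleep_mapped(entry->pool->zpool))
		zpool_unmap_handle(entry->pool->zpool, entry->handle);
	return ret;
}
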
>>
>>>
>>> This mainly affects zsmalloc, which can't sleep while mapped? Do we have
>>> performance data?
>>
>> Right, when I tested last time I remember there was only a very minor
>> performance difference. The main benefit here is to simplify the code a lot
>> and delete one failure case.
>
> ok.
>
> For the majority of hardware, people are using CPU-based
> compression/decompression, so there is no chance it will sleep. Thus, all
> compression/decompression can be done inside a zpool_map section, and
> there is *NO* need to copy at all!

Yes, very good for zsmalloc.

> Only for hardware which provides a HW accelerator to offload the CPU does
> crypto actually wait for completion via
>
> static inline int crypto_wait_req(int err, struct crypto_wait *wait)
> {
>         switch (err) {
>         case -EINPROGRESS:
>         case -EBUSY:
>                 wait_for_completion(&wait->completion);
>                 reinit_completion(&wait->completion);
>                 err = wait->err;
>                 break;
>         }
>
>         return err;
> }
>
> For CPU-based algorithms, the compression/decompression has already completed
> synchronously inside crypto_acomp_decompress(), so they won't return
> -EINPROGRESS or -EBUSY.
Ok, this is useful to know.
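
For reference, the usual way an acomp user drives a request looks roughly like
the sketch below (minimal, not zswap verbatim): for a CPU-based algorithm the
work is finished by the time crypto_acomp_decompress() returns, so the
crypto_wait_req() above is a no-op, while async HW offload returns
-EINPROGRESS/-EBUSY and actually blocks in the wait.

#include <linux/crypto.h>
#include <crypto/acompress.h>

static int acomp_decompress_sync_sketch(struct acomp_req *req,
					struct scatterlist *input,
					struct scatterlist *output,
					unsigned int slen, unsigned int dlen)
{
	DECLARE_CRYPTO_WAIT(wait);

	/* crypto_req_done() completes &wait when an async backend finishes. */
	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
				   crypto_req_done, &wait);
	acomp_request_set_params(req, input, output, slen, dlen);

	/* Synchronous (CPU) algorithms return the final error code here. */
	return crypto_wait_req(crypto_acomp_decompress(req), &wait);
}
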
>
> The problem is that crypto_acomp doesn't expose this information to its
> users. If it did, we could use it to avoid copying zsmalloc's data to a
> temporary buffer entirely for the vast majority of zswap users.
Agreed, I think it's worthwhile to export this, so zsmalloc users don't need to
prepare the temporary buffer and copy in the majority case.
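
Something like the sketch below is what I have in mind. Note that
crypto_acomp_is_async() here is purely hypothetical; it does not exist in the
crypto API today and just stands in for whatever interface the crypto layer
might export. The surrounding zswap names are also simplified.

static void *zswap_map_src_sketch(struct zswap_pool *pool,
				  struct zswap_entry *entry,
				  struct crypto_acomp_ctx *acomp_ctx)
{
	void *src = zpool_map_handle(pool->zpool, entry->handle, ZPOOL_MM_RO);

	/*
	 * Only an async HW offload can sleep in crypto_wait_req(), so only
	 * then does a non-sleepable zpool mapping force a bounce copy.
	 */
	if (!zpool_can_sleep_mapped(pool->zpool) &&
	    crypto_acomp_is_async(acomp_ctx->acomp)) {	/* hypothetical helper */
		memcpy(acomp_ctx->dstmem, src, entry->length);
		zpool_unmap_handle(pool->zpool, entry->handle);
		src = acomp_ctx->dstmem;
	}

	return src;
}
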
Thanks!
>
> But I am not sure if we can find a way to convince Herbert (+To) :-)
>