Message-ID: <41d1245c-8a7f-4c5a-ba84-8e7e33b896b2@linux.dev>
Date: Thu, 10 Jul 2025 18:59:40 +0800
From: Dongsheng Yang <dongsheng.yang@...ux.dev>
To: Mikulas Patocka <mpatocka@...hat.com>
Cc: agk@...hat.com, snitzer@...nel.org, axboe@...nel.dk, hch@....de,
dan.j.williams@...el.com, Jonathan.Cameron@...wei.com,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-cxl@...r.kernel.org, nvdimm@...ts.linux.dev, dm-devel@...ts.linux.dev
Subject: Re: [PATCH v2 00/11] dm-pcache – persistent-memory cache for block devices
On 7/9/2025 5:45 PM, Dongsheng Yang wrote:
>
> On 7/8/2025 4:16 AM, Mikulas Patocka wrote:
>>
>> On Mon, 7 Jul 2025, Dongsheng Yang wrote:
>>
>>> Hi Mikulas,
>>> This is V2 for dm-pcache, please take a look.
>>>
>>> Code:
>>> https://github.com/DataTravelGuide/linux tags/pcache_v2
>>>
>>> Changelogs
>>>
>>> V2 from V1:
>>> - introduce req_alloc() and req_init() in backing_dev.c, then we
>>> can do req_alloc() before holding spinlock and do req_init()
>>> in subtree_walk().
>>> - introduce pre_alloc_key and pre_alloc_req in walk_ctx, that
>>> means we can pre-allocate cache_key or backing_dev_request
>>> before subtree walking.
>>> - use mempool_alloc() with NOIO for the allocation of cache_key
>>> and backing_dev_req.
>>> - some coding style changes from comments of Jonathan.
>> Hi
>>
>> mempool_alloc with GFP_NOIO never fails - so you don't have to check the
>> returned value for NULL and propagate the error upwards.
>
>
> Hi Mikulas:
>
> I looked at the implementation of mempool_alloc: when an allocation
> fails, it waits for 5 seconds and then retries.
>
> With this in mind, I propose that we handle -ENOMEM inside defer_req()
> using a similar mechanism, something like this commit:
>
>
> https://github.com/DataTravelGuide/linux/commit/e6fc2e5012b1fe2312ed7dd02d6fbc2d038962c0
>
>
>
> Here are two key reasons why:
>
> (1) If we manage -ENOMEM in defer_req(), we don’t need to modify every
> lower-level allocation to use a mempool to avoid failures, for example
> cache_key, backing_req, and the kmem.bvecs you mentioned. More
> importantly, there’s no easy way to prevent allocation failure in some
> places; for instance, bio_init_clone() could still return -ENOMEM.
>
> (2) If we use a mempool, it will block and wait indefinitely when
> memory is unavailable, preventing the process from exiting.
>
> But with defer_req(), the user can still manually stop the pcache
> device using dmsetup remove, releasing some memory if they want.
>
>
> What do you think?
BTW, I added a test case for the NOMEM scenario using failslab:
https://github.com/DataTravelGuide/dtg-tests/blob/main/pcache.py.data/pcache_failslab.sh
>
> Thanx
>
> Dongsheng
>
>>
>> "backing_req->kmem.bvecs = kmalloc_array(n_vecs, sizeof(struct bio_vec),
>> GFP_NOIO)" - this call may fail and you should handle the error
>> gracefully
>> (i.e. don't end the bio with an error). Would it be possible to trim the
>> request to BACKING_DEV_REQ_INLINE_BVECS vectors and retry it?
>> Alternatively, you can create a mempool for the largest possible n_vecs
>> and allocate from this mempool if kmalloc_array fails.
>>
>> I'm sending two patches for dm-pcache - the first patch adds the include
>> file linux/bitfield.h - it is needed in my config. The second patch makes
>> slab caches per-module rather than per-device; if you have them
>> per-device, there are warnings about duplicate cache names.
>>
>>
>> BTW. What kind of persistent memory do you use? (afaik Intel killed the
>> Optane products and I don't know of any replacement)
>>
>> Some time ago I created a filesystem for persistent memory - see
>> git://leontynka.twibright.com/nvfs.git - I'd be interested if you can
>> test it on your persistent memory implementation.
>>
>> Mikulas
>>
>