lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1ff2da54-60dd-9719-eb55-386ff32ae421@redhat.com>
Date: Mon, 14 Jul 2025 17:43:42 +0200 (CEST)
From: Mikulas Patocka <mpatocka@...hat.com>
To: Dongsheng Yang <dongsheng.yang@...ux.dev>
cc: agk@...hat.com, snitzer@...nel.org, axboe@...nel.dk, hch@....de, 
    dan.j.williams@...el.com, Jonathan.Cameron@...wei.com, 
    linux-block@...r.kernel.org, linux-kernel@...r.kernel.org, 
    linux-cxl@...r.kernel.org, nvdimm@...ts.linux.dev, 
    dm-devel@...ts.linux.dev
Subject: Re: [PATCH v2 00/11] dm-pcache – persistent-memory cache for block devices



On Wed, 9 Jul 2025, Dongsheng Yang wrote:

> 
> 在 7/8/2025 4:16 AM, Mikulas Patocka 写道:
> > 
> > On Mon, 7 Jul 2025, Dongsheng Yang wrote:
> > 
> > > Hi Mikulas,
> > > 	This is V2 for dm-pcache, please take a look.
> > > 
> > > Code:
> > >      https://github.com/DataTravelGuide/linux tags/pcache_v2
> > > 
> > > Changelogs
> > > 
> > > V2 from V1:
> > > 	- introduce req_alloc() and req_init() in backing_dev.c, then we
> > > 	  can do req_alloc() before holding spinlock and do req_init()
> > > 	  in subtree_walk().
> > > 	- introduce pre_alloc_key and pre_alloc_req in walk_ctx, that
> > > 	  means we can pre-allocate cache_key or backing_dev_request
> > > 	  before subtree walking.
> > > 	- use mempool_alloc() with NOIO for the allocation of cache_key
> > > 	  and backing_dev_req.
> > > 	- some coding style changes from comments of Jonathan.
> > Hi
> > 
> > mempool_alloc with GFP_NOIO never fails - so you don't have to check the
> > returned value for NULL and propagate the error upwards.
> 
> 
> Hi Mikulas:
> 
>    I noticed that the implementation of mempool_alloc—it waits for 5 seconds
> and retries when allocation fails.

No, this is incorrect observation.

mempool_alloc will add the current process to a wait queue:
  prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
then, it will execute the wait:
  io_schedule_timeout(5*HZ);
and then remove the current process from a wait queue:
  finish_wait(&pool->wait, &wait);

but the io_schedule_timeout function will wait at most 5 seconds - it 
doesn't wait exactly 5 seconds. If some other piece of code frees some 
data into the mempool, it executes:
  wake_up(&pool->wait);
so that the process that is waiting in io_schedule_timeout(5*HZ) is woken 
up immediatelly.

See this comment in mempool_alloc:
        /*
         * FIXME: this should be io_schedule().  The timeout is there as a
         * workaround for some DM problems in 2.6.18.
         */
        io_schedule_timeout(5*HZ);

- the timeout is actually a workaround for some buggy code in device 
mapper where the code allocated more and more entries from the mempool 
without freeing them. I fixed this bug many years ago. But people 
shouldn't add more buggy code that depends on this timeout.

> With this in mind, I propose that we handle -ENOMEM inside defer_req() using a
> similar mechanism. something like this commit:
> 
> 
> https://github.com/DataTravelGuide/linux/commit/e6fc2e5012b1fe2312ed7dd02d6fbc2d038962c0
> 
> 
> Here are two key reasons why:
> 
> (1) If we manage -ENOMEM in defer_req(), we don’t need to modify every
> lower-level allocation to use mempool to avoid failures—for example,
> 
> cache_key, backing_req, and the kmem.bvecs you mentioned. More importantly,
> there’s no easy way to prevent allocation failure in some places—for instance,
> bio_init_clone() could still return -ENOMEM.
> 
> (2) If we use a mempool, it will block and wait indefinitely when memory is
> unavailable, preventing the process from exiting.

Mempool will wait until some other code frees some data into the mempool. 
So, as long as your code can make forward progress and finish some 
requests and free their data into the mempool, it should work properly.

On the other hand, non-mempool GFP_NOIO allocations can wait indefinitely.

> But with defer_req(), the user can still manually stop the pcache device using
> dmsetup remove, releasing some memory if user want.
> 
> What do you think?

I think that you should go back to version 2 (that had mempools) and 
change the non-mempool allocation "backing_req->kmem.bvecs = 
kmalloc_array(n_vecs, sizeof(struct bio_vec), GFP_NOIO);" to use a 
mempool.

> Thanx
> 
> Dongsheng

Mikulas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ