[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1ff2da54-60dd-9719-eb55-386ff32ae421@redhat.com>
Date: Mon, 14 Jul 2025 17:43:42 +0200 (CEST)
From: Mikulas Patocka <mpatocka@...hat.com>
To: Dongsheng Yang <dongsheng.yang@...ux.dev>
cc: agk@...hat.com, snitzer@...nel.org, axboe@...nel.dk, hch@....de,
dan.j.williams@...el.com, Jonathan.Cameron@...wei.com,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-cxl@...r.kernel.org, nvdimm@...ts.linux.dev,
dm-devel@...ts.linux.dev
Subject: Re: [PATCH v2 00/11] dm-pcache – persistent-memory cache for block devices
On Wed, 9 Jul 2025, Dongsheng Yang wrote:
>
> 在 7/8/2025 4:16 AM, Mikulas Patocka 写道:
> >
> > On Mon, 7 Jul 2025, Dongsheng Yang wrote:
> >
> > > Hi Mikulas,
> > > This is V2 for dm-pcache, please take a look.
> > >
> > > Code:
> > > https://github.com/DataTravelGuide/linux tags/pcache_v2
> > >
> > > Changelogs
> > >
> > > V2 from V1:
> > > - introduce req_alloc() and req_init() in backing_dev.c, then we
> > > can do req_alloc() before holding spinlock and do req_init()
> > > in subtree_walk().
> > > - introduce pre_alloc_key and pre_alloc_req in walk_ctx, that
> > > means we can pre-allocate cache_key or backing_dev_request
> > > before subtree walking.
> > > - use mempool_alloc() with NOIO for the allocation of cache_key
> > > and backing_dev_req.
> > > - some coding style changes from comments of Jonathan.
> > Hi
> >
> > mempool_alloc with GFP_NOIO never fails - so you don't have to check the
> > returned value for NULL and propagate the error upwards.
>
>
> Hi Mikulas:
>
> I noticed that the implementation of mempool_alloc—it waits for 5 seconds
> and retries when allocation fails.
No, this is incorrect observation.
mempool_alloc will add the current process to a wait queue:
prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
then, it will execute the wait:
io_schedule_timeout(5*HZ);
and then remove the current process from a wait queue:
finish_wait(&pool->wait, &wait);
but the io_schedule_timeout function will wait at most 5 seconds - it
doesn't wait exactly 5 seconds. If some other piece of code frees some
data into the mempool, it executes:
wake_up(&pool->wait);
so that the process that is waiting in io_schedule_timeout(5*HZ) is woken
up immediatelly.
See this comment in mempool_alloc:
/*
* FIXME: this should be io_schedule(). The timeout is there as a
* workaround for some DM problems in 2.6.18.
*/
io_schedule_timeout(5*HZ);
- the timeout is actually a workaround for some buggy code in device
mapper where the code allocated more and more entries from the mempool
without freeing them. I fixed this bug many years ago. But people
shouldn't add more buggy code that depends on this timeout.
> With this in mind, I propose that we handle -ENOMEM inside defer_req() using a
> similar mechanism. something like this commit:
>
>
> https://github.com/DataTravelGuide/linux/commit/e6fc2e5012b1fe2312ed7dd02d6fbc2d038962c0
>
>
> Here are two key reasons why:
>
> (1) If we manage -ENOMEM in defer_req(), we don’t need to modify every
> lower-level allocation to use mempool to avoid failures—for example,
>
> cache_key, backing_req, and the kmem.bvecs you mentioned. More importantly,
> there’s no easy way to prevent allocation failure in some places—for instance,
> bio_init_clone() could still return -ENOMEM.
>
> (2) If we use a mempool, it will block and wait indefinitely when memory is
> unavailable, preventing the process from exiting.
Mempool will wait until some other code frees some data into the mempool.
So, as long as your code can make forward progress and finish some
requests and free their data into the mempool, it should work properly.
On the other hand, non-mempool GFP_NOIO allocations can wait indefinitely.
> But with defer_req(), the user can still manually stop the pcache device using
> dmsetup remove, releasing some memory if user want.
>
> What do you think?
I think that you should go back to version 2 (that had mempools) and
change the non-mempool allocation "backing_req->kmem.bvecs =
kmalloc_array(n_vecs, sizeof(struct bio_vec), GFP_NOIO);" to use a
mempool.
> Thanx
>
> Dongsheng
Mikulas
Powered by blists - more mailing lists