Message-ID: <7b1265bc-835e-4c7d-af75-f237c46bc3a7@suse.cz>
Date: Sun, 23 Nov 2025 12:34:40 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Hugh Dickins <hughd@...gle.com>, Christoph Hellwig <hch@....de>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Christoph Lameter <cl@...two.org>, David Rientjes <rientjes@...gle.com>,
Roman Gushchin <roman.gushchin@...ux.dev>, Harry Yoo <harry.yoo@...cle.com>,
Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
Brendan Jackman <jackmanb@...gle.com>, Zi Yan <ziy@...dia.com>,
Eric Biggers <ebiggers@...nel.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 06/11] mempool: factor out a mempool_alloc_from_pool helper

On 11/23/25 04:42, Hugh Dickins wrote:
> On Thu, 13 Nov 2025, Christoph Hellwig wrote:
>
>> Add a helper for the mempool_alloc slowpath to better separate it from the
>> fast path, and also use it to implement mempool_alloc_preallocated which
>> shares the same logic.
>>
>> Signed-off-by: Christoph Hellwig <hch@....de>
>> ---
> ...
>> @@ -413,8 +457,6 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
>> {
>> gfp_t gfp_temp = mempool_adjust_gfp(&gfp_mask);
>> void *element;
>> - unsigned long flags;
>> - wait_queue_entry_t wait;
>>
>> VM_WARN_ON_ONCE(gfp_mask & __GFP_ZERO);
>> might_alloc(gfp_mask);
>> @@ -428,53 +470,22 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
>> element = pool->alloc(gfp_temp, pool->pool_data);
>> }
>>
>> - if (likely(element))
>> - return element;
>> -
>> - spin_lock_irqsave(&pool->lock, flags);
>> - if (likely(pool->curr_nr)) {
>> - element = remove_element(pool);
>> - spin_unlock_irqrestore(&pool->lock, flags);
>> - /* paired with rmb in mempool_free(), read comment there */
>> - smp_wmb();
>> + if (unlikely(!element)) {
>> /*
>> - * Update the allocation stack trace as this is more useful
>> - * for debugging.
>> + * Try to allocate an element from the pool.
>> + *
>> + * The first pass won't have __GFP_DIRECT_RECLAIM and won't
>> + * sleep in mempool_alloc_from_pool. Retry the allocation
>> + * with all flags set in that case.
>> */
>> - kmemleak_update_trace(element);
>> - return element;
>> - }
>> -
>> - /*
>> - * We use gfp mask w/o direct reclaim or IO for the first round. If
>> - * alloc failed with that and @pool was empty, retry immediately.
>> - */
>> - if (gfp_temp != gfp_mask) {
>> - spin_unlock_irqrestore(&pool->lock, flags);
>> - gfp_temp = gfp_mask;
>> - goto repeat_alloc;
>> - }
>> -
>> - /* We must not sleep if !__GFP_DIRECT_RECLAIM */
>> - if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
>> - spin_unlock_irqrestore(&pool->lock, flags);
>> - return NULL;
>> + element = mempool_alloc_from_pool(pool, gfp_mask);
>> + if (!element && gfp_temp != gfp_mask) {
>
> No, that is wrong, it breaks the mempool promise: linux-next oopses
> in swap_writepage_bdev_async(), which relies on bio_alloc(,,,GFP_NOIO)
> to return a good bio.
>
> The refactoring makes it hard to see, but the old version always used
> to go back to repeat_alloc at the end, if __GFP_DIRECT_RECLAIM,
> whereas here it only does so the first time, when gfp_temp != gfp_mask.
>
> After bisecting to here, I changed that "gfp_temp != gfp_mask" to
> "(gfp & __GFP_DIRECT_RECLAIM)", and it worked again. But other patches
> have come in on top, so below is a patch to the final mm/mempool.c...

Thanks a lot Hugh, and sorry for the trouble.

Looking closer, I noticed we're also not doing what the comment says: the
first attempt should pass the limited flags (gfp_temp) to
mempool_alloc_from_pool(), but the patch passes gfp_mask.

I would also rather keep distinguishing "retry with full flags" from
"retry because we can sleep" for now, in case there are callers that can't
sleep but can benefit from memalloc context. That's hypothetical and I
haven't done an audit, but we can clean it up deliberately later rather than
as part of a refactoring patch.

So I'd amend this patch with:

diff --git a/mm/mempool.c b/mm/mempool.c
index c28087a3b8a9..224a4dead239 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -478,10 +478,15 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
 		 * sleep in mempool_alloc_from_pool. Retry the allocation
 		 * with all flags set in that case.
 		 */
-		element = mempool_alloc_from_pool(pool, gfp_mask);
-		if (!element && gfp_temp != gfp_mask) {
-			gfp_temp = gfp_mask;
-			goto repeat_alloc;
+		element = mempool_alloc_from_pool(pool, gfp_temp);
+		if (!element) {
+			if (gfp_temp != gfp_mask) {
+				gfp_temp = gfp_mask;
+				goto repeat_alloc;
+			}
+			if (gfp_mask & __GFP_DIRECT_RECLAIM) {
+				goto repeat_alloc;
+			}
 		}
 	}

With the followup commit fixed up during rebase, the diff of the whole
branch before/after is:
diff --git a/mm/mempool.c b/mm/mempool.c
index 5953fe801395..bb596cac57ff 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -555,10 +555,14 @@ void *mempool_alloc_noprof(struct mempool *pool, gfp_t gfp_mask)
 		 * sleep in mempool_alloc_from_pool. Retry the allocation
 		 * with all flags set in that case.
 		 */
-		if (!mempool_alloc_from_pool(pool, &element, 1, 0, gfp_mask) &&
-		    gfp_temp != gfp_mask) {
-			gfp_temp = gfp_mask;
-			goto repeat_alloc;
+		if (!mempool_alloc_from_pool(pool, &element, 1, 0, gfp_temp)) {
+			if (gfp_temp != gfp_mask) {
+				gfp_temp = gfp_mask;
+				goto repeat_alloc;
+			}
+			if (gfp_mask & __GFP_DIRECT_RECLAIM) {
+				goto repeat_alloc;
+			}
 		}
 	}
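
For illustration only, here is a user-space sketch of the retry control flow
in the amended hunk. It is not kernel code: the sk_* names, the SK_* flag
values, and the "refill after N attempts" behaviour (a stand-in for another
context calling mempool_free() while we would be sleeping) are all
assumptions made up for this model. It only tries to show why the two retry
conditions are kept separate: one retry with the full flags, then further
retries only for callers that are allowed to sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for gfp bits; NOT the kernel's values. */
#define SK_DIRECT_RECLAIM 0x1u
#define SK_IO             0x2u

struct sk_pool {
	int curr_nr;      /* elements currently held in the reserve */
	int refill_after; /* model: one element comes back after this many pool attempts */
	int attempts;     /* pool attempts made so far */
};

/* Models pool->alloc(): always fails here, to force the slow path. */
static bool sk_backend_alloc(unsigned int gfp)
{
	(void)gfp;
	return false;
}

/*
 * Models mempool_alloc_from_pool(): take an element from the reserve.
 * When the caller may sleep, pretend an element is eventually freed
 * back into the pool by someone else.
 */
static bool sk_alloc_from_pool(struct sk_pool *p, unsigned int gfp)
{
	assert(p != NULL);
	p->attempts++;
	if (p->curr_nr > 0) {
		p->curr_nr--;
		return true;
	}
	if ((gfp & SK_DIRECT_RECLAIM) && p->attempts >= p->refill_after)
		p->curr_nr = 1;
	return false;
}

/* Models the amended retry logic in mempool_alloc_noprof(). */
static bool sk_mempool_alloc(struct sk_pool *p, unsigned int gfp_mask)
{
	/* First round: no direct reclaim, no IO. */
	unsigned int gfp_temp = gfp_mask & ~(SK_DIRECT_RECLAIM | SK_IO);

repeat_alloc:
	if (sk_backend_alloc(gfp_temp))
		return true;
	if (sk_alloc_from_pool(p, gfp_temp))
		return true;
	/* Retry with full flags. */
	if (gfp_temp != gfp_mask) {
		gfp_temp = gfp_mask;
		goto repeat_alloc;
	}
	/* Retry because we can sleep: never fail such a caller. */
	if (gfp_mask & SK_DIRECT_RECLAIM)
		goto repeat_alloc;
	return false;
}
```

In this model, a caller passing SK_DIRECT_RECLAIM | SK_IO keeps looping
until the pool is refilled and always gets an element, while a caller
passing only SK_IO gets exactly one extra attempt with the full flags and
then fails. Dropping the second check would make the sleeping caller fail
after its second attempt, which matches the behaviour Hugh bisected.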