linux-kernel - Re: [PATCH 06/11] mempool: factor out a mempool_alloc_from

Message-ID: <23fac529-7b4d-4095-8c64-0d4a9d08c9b1@suse.cz>
Date: Sun, 23 Nov 2025 22:22:47 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Hugh Dickins <hughd@...gle.com>
Cc: Christoph Hellwig <hch@....de>, Andrew Morton
 <akpm@...ux-foundation.org>, Christoph Lameter <cl@...two.org>,
 David Rientjes <rientjes@...gle.com>,
 Roman Gushchin <roman.gushchin@...ux.dev>, Harry Yoo <harry.yoo@...cle.com>,
 Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
 Brendan Jackman <jackmanb@...gle.com>, Zi Yan <ziy@...dia.com>,
 Eric Biggers <ebiggers@...nel.org>, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH 06/11] mempool: factor out a mempool_alloc_from_pool
 helper

On 11/23/25 18:49, Hugh Dickins wrote:
> On Sun, 23 Nov 2025, Vlastimil Babka wrote:
>> On 11/23/25 04:42, Hugh Dickins wrote:
>> > On Thu, 13 Nov 2025, Christoph Hellwig wrote:
>> > 
>> > 
>> > No, that is wrong, it breaks the mempool promise: linux-next oopses
>> > in swap_writepage_bdev_async(), which relies on bio_alloc(,,,GFP_NOIO)
>> > to return a good bio.
>> > 
>> > The refactoring makes it hard to see, but the old version always used
>> > to go back to repeat_alloc at the end, if __GFP_DIRECT_RECLAIM,
>> > whereas here it only does so the first time, when gfp_temp != gfp_mask.
>> > 
>> > After bisecting to here, I changed that "gfp_temp != gfp_mask" to
>> > "(gfp & __GFP_DIRECT_RECLAIM)", and it worked again.  But other patches
>> > have come in on top, so below is a patch to the final mm/mempool.c...
>> 
>> Thanks a lot Hugh and sorry for the trouble.
>> 
>> Looking closer I noticed we're also not doing as the comment says about
>> passing the limited flags to mempool_alloc_from_pool() on the first attempt.
>> 
>> I would also rather keep distinguishing the "retry with full flags" and
>> "retry because we can sleep" for now, in case there are callers that can't
>> sleep, but can benefit from memalloc context. It's hypothetical and I haven't
>> made an audit, but we can clean that up deliberately later and not as part
>> of a refactor patch.
>> 
>> So I'd amend this patch with:
>> 
>> diff --git a/mm/mempool.c b/mm/mempool.c
>> index c28087a3b8a9..224a4dead239 100644
>> --- a/mm/mempool.c
>> +++ b/mm/mempool.c
>> @@ -478,10 +478,15 @@ void *mempool_alloc_noprof(mempool_t *pool, gfp_t gfp_mask)
>>                  * sleep in mempool_alloc_from_pool.  Retry the allocation
>>                  * with all flags set in that case.
>>                  */
>> -               element = mempool_alloc_from_pool(pool, gfp_mask);
>> -               if (!element && gfp_temp != gfp_mask) {
>> -                       gfp_temp = gfp_mask;
>> -                       goto repeat_alloc;
>> +               element = mempool_alloc_from_pool(pool, gfp_temp);
> 
> Haha, no.
> 
> I had got excited when I too thought that should be gfp_temp not gfp_mask,
> but (a) it didn't fix the bug and (b) I then came to see that gfp_mask
> there is correct.
> 
> It's looking ahead to what will be tried next: mempool_alloc_from_pool()
> is trying to alloc from mempool, and then, if it will be allowed to wait,
> waiting a suitable length of time, before letting the caller try again.
> If you substitute gfp_temp there, then it just does the same pool->alloc,
> alloc from mempool sequence twice in a row with no delay between (because
> gfp_temp does not at first allow waiting).

But it's not exactly the same sequence, because in the second pass the
pool->alloc() has the original gfp flags restored (by gfp_temp = gfp_mask)
so it can now e.g. reclaim there. It's preferred to try that first before
waiting on a mempool refill. AFAIU the idea is to try succeeding quickly if
objects to allocate are either cheaply available to alloc() or in the pool,
and if that fails, go for the more expensive allocations or waiting for refill.

AFAICS both the code before Christoph's changes, and after the changes with
my fixup do this:

1. pool->alloc(limited gfp)
2. allocate from pool, but don't wait if there's nothing
3. pool->alloc(full gfp)
4. allocate from pool, wait if there's nothing
5. goto 3

Am I missing something?
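
To make the sequence above concrete, here is a rough userspace sketch of the
control flow only. It is not kernel code: every sk_* name and flag value is
made up for illustration, the real pool->alloc() and pool accesses are
replaced by entries in a trace, and succeed_at is a hypothetical knob that
says at which step an allocation finally succeeds. It assumes gfp_temp starts
as gfp_mask with the direct-reclaim and IO bits cleared, and uses the retry
logic from the fixup.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for gfp bits; the values are illustrative only. */
#define SK_DIRECT_RECLAIM 0x1u
#define SK_IO             0x2u
#define SK_MAX_STEPS      16

enum sk_step {
	SK_ALLOC_LIMITED = 1,	/* pool->alloc(limited gfp) */
	SK_POOL_NOWAIT,		/* take from pool, don't wait */
	SK_ALLOC_FULL,		/* pool->alloc(full gfp) */
	SK_POOL_WAIT,		/* take from pool, wait if empty */
};

struct sk_trace {
	enum sk_step steps[SK_MAX_STEPS];
	size_t n;
};

/* Record one step; report success once the succeed_at'th step is reached. */
static bool sk_do_step(struct sk_trace *t, enum sk_step s, int succeed_at)
{
	t->steps[t->n++] = s;
	return (int)t->n == succeed_at;
}

/* Models only the ordering of attempts, not any real allocation. */
bool sk_mempool_alloc(unsigned int gfp_mask, int succeed_at,
		      struct sk_trace *t)
{
	unsigned int gfp_temp = gfp_mask & ~(SK_DIRECT_RECLAIM | SK_IO);

	t->n = 0;
	/* The step cap keeps the sketch finite; the real loop has none. */
	while (t->n + 2 <= SK_MAX_STEPS) {
		if (sk_do_step(t, gfp_temp == gfp_mask ?
				  SK_ALLOC_FULL : SK_ALLOC_LIMITED,
			       succeed_at))
			return true;
		if (sk_do_step(t, (gfp_temp & SK_DIRECT_RECLAIM) ?
				  SK_POOL_WAIT : SK_POOL_NOWAIT,
			       succeed_at))
			return true;
		if (gfp_temp != gfp_mask) {
			/* retry with full flags */
			gfp_temp = gfp_mask;
			continue;
		}
		if (gfp_mask & SK_DIRECT_RECLAIM)
			continue;	/* retry because we can sleep */
		return false;
	}
	return false;
}
```

With a mask that allows direct reclaim, the trace comes out as steps 1, 2, 3,
4, 3, 4, ... as listed above. With a mask that does not allow it, the sketch
does the limited pass, one full-flags pass without waiting, and then gives up.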

> I agree it's confusing, and calls into question whether that was a good
> refactoring.  Maybe there's a form of words for the comment above which

I'd say it was intended to be good, apart from the bugs.

> will make it clearer.  Perhaps mempool_alloc_from_pool() is better split
> into two functions.  Maybe gfp_temp could be named better.  Etc etc: I
> preferred not to mess around further with how Christoph did it, not now.
> 
> (I also wondered if it's right to pool->alloc before alloc from mempool
> after the wait was for a mempool element to be freed: but that's how it
> was before, and I expect it's been proved in the past that a strict
> pool->alloc before alloc from mempool is the best strategy.)

I'd think we'd better do it that way, otherwise we might be recovering more
slowly from a temporary memory shortage that caused a number of tasks to wait
in the mempool. Those tasks would then have to wait for mempool refills even
though new objects might be available to allocate now that the shortage has
been resolved.

>> +               if (!element) {
>> +                       if (gfp_temp != gfp_mask) {
>> +                               gfp_temp = gfp_mask;
>> +                               goto repeat_alloc;
>> +                       }
>> +                       if (gfp_mask & __GFP_DIRECT_RECLAIM) {
>> +                               goto repeat_alloc;
>> +                       }
> 
> I still prefer what I posted.
> 
> Hugh
> 
>>                 }
>>         }
>> 
>> 
>> With the followup commit fixed up during rebase, the diff of the whole
>> branch before/after is:
>> 
>> diff --git a/mm/mempool.c b/mm/mempool.c
>> index 5953fe801395..bb596cac57ff 100644
>> --- a/mm/mempool.c
>> +++ b/mm/mempool.c
>> @@ -555,10 +555,14 @@ void *mempool_alloc_noprof(struct mempool *pool, gfp_t gfp_mask)
>>                  * sleep in mempool_alloc_from_pool.  Retry the allocation
>>                  * with all flags set in that case.
>>                  */
>> -               if (!mempool_alloc_from_pool(pool, &element, 1, 0, gfp_mask) &&
>> -                   gfp_temp != gfp_mask) {
>> -                       gfp_temp = gfp_mask;
>> -                       goto repeat_alloc;
>> +               if (!mempool_alloc_from_pool(pool, &element, 1, 0, gfp_temp)) {
>> +                       if (gfp_temp != gfp_mask) {
>> +                               gfp_temp = gfp_mask;
>> +                               goto repeat_alloc;
>> +                       }
>> +                       if (gfp_mask & __GFP_DIRECT_RECLAIM) {
>> +                               goto repeat_alloc;
>> +                       }
>>                 }
>>         }
>>