Message-ID: <4a9950fb-471e-4b04-8a0e-0f34e8b6a082@kernel.org>
Date: Wed, 3 Dec 2025 12:22:29 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: Michal Hocko <mhocko@...e.com>
Cc: Gregory Price <gourry@...rry.net>,
 Andrew Morton <akpm@...ux-foundation.org>,
 Aboorva Devarajan <aboorvad@...ux.ibm.com>, vbabka@...e.cz,
 surenb@...gle.com, jackmanb@...gle.com, hannes@...xchg.org, ziy@...dia.com,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 Oscar Salvador <OSalvador@...e.com>
Subject: Re: [PATCH] mm/page_alloc: make percpu_pagelist_high_fraction reads
 lock-free

On 12/3/25 10:42, Michal Hocko wrote:
> On Wed 03-12-25 10:15:04, David Hildenbrand (Red Hat) wrote:
>> On 12/3/25 09:59, Gregory Price wrote:
>>> On Wed, Dec 03, 2025 at 09:42:59AM +0100, Michal Hocko wrote:
>>>> On Wed 03-12-25 03:35:51, Gregory Price wrote:
>>>>> 		if (!ret) {
>>>>> 			/*
>>>>> 			 * TODO: fatal migration failures should bail
>>>>> 			 * out
>>>>> 			 */
>>>>> 			do_migrate_range(pfn, end_pfn);
>>>>> 		}
>>>>>
>>>>> Maybe it's time to implement the bail out?
>>>>
>>>> That would be great but can we tell transient from permanent migration
>>>> failures? Maybe long term pins could be treated as permanent failure.
>>>>
>>>
>>> I see deep in migration code `migrate_pages_batch()` we would return
>>> "Some other failure" as fatal:
>>>
>>> 	switch(rc) {
>>> 	case -ENOMEM:
>>> 		...
>>> 		/* Note: some long-term pin handling is done here */
>>> 		break;
>>> 	case -EAGAIN:
>>> 		...
>>> 		break;
>>> 	case 0:
>>> 		...
>>> 		list_move_tail(&folio->lru, &unmap_folios);
>>> 		list_add_tail(&dst->lru, &dst_folios);
>>> 		break;
>>> 	default:
>>> 		/*
>>> 		 * Permanent failure (-EBUSY, etc.):
>>> 		 * unlike -EAGAIN case, the failed folio is
>>> 		 * removed from migration folio list and not
>>> 		 * retried in the next outer loop.
>>> 		 */
>>> 		nr_failed++;
>>> 		stats->nr_thp_failed += is_thp;
>>> 		stats->nr_failed_pages += nr_pages;
>>> 		break;
>>> 	}
>>>
>>> So at a minimum we could check for !(ENOMEM,EAGAIN) I suppose?
>>>
>>> It's unclear to me based on this code here how long-term pinning would
>>> return.  Maybe David knows.
>>
>> I would assume that additional references will always result in -EAGAIN.
>> Remember that we cannot distinguish short-term pins from long-term pins.
>>
>> We should never have long-term pins on ZONE_MOVABLE, unless something broke
>> that contract and needs to be fixed.
> 
> Right. But what should the hotplug code do under that condition? Loop
> forever or fail reporting the broken contract? I would lean towards the
> latter.

If you can detect it reliably.

> We have never promised that offlining will not fail ever for
> movable zones. We just guarantee that the operation is resistant against
> recoverable failures.

Right, but we don't want it to fail for reasons where retrying a bit longer
would just have worked.
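
To make the trade-off concrete, here is a minimal userspace sketch of the
check suggested above, "!(ENOMEM,EAGAIN)". The enum and the function
classify_migrate_rc() are hypothetical names used only for illustration and
are not kernel APIs. Note the limitation already raised in this thread: if
additional references always surface as -EAGAIN, a check like this cannot
detect a long-term pin and would still retry.

```c
#include <errno.h>

/* Hypothetical verdicts for one migration attempt (not a kernel type). */
enum offline_verdict {
	VERDICT_DONE,		/* migration succeeded */
	VERDICT_RETRY,		/* possibly transient: retrying may help */
	VERDICT_BAIL_OUT,	/* treated as permanent: stop retrying */
};

/*
 * Hypothetical helper: map a migration return code to a verdict.
 * Assumption: only -ENOMEM and -EAGAIN are worth retrying, mirroring
 * the "!(ENOMEM,EAGAIN)" idea. Everything else (-EBUSY, etc.) is
 * treated as a permanent failure.
 */
enum offline_verdict classify_migrate_rc(int rc)
{
	switch (rc) {
	case 0:
		return VERDICT_DONE;
	case -ENOMEM:
	case -EAGAIN:
		return VERDICT_RETRY;
	default:
		return VERDICT_BAIL_OUT;
	}
}
```

With this policy, only the "default:" class of errors would end the retry
loop early. Anything reported as -EAGAIN keeps being retried, so offlining
would not fail in cases where retrying a bit longer would have worked.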


What we document is:

Memory Offlining and ZONE_MOVABLE
---------------------------------

Even with ZONE_MOVABLE, there are some corner cases where offlining a memory
block might fail:

... list of corner cases

Further, when running into out of memory situations while migrating pages, or
when still encountering permanently unmovable pages within ZONE_MOVABLE
(-> BUG), memory offlining will keep retrying until it eventually succeeds.

When offlining is triggered from user space, the offlining context can be
terminated by sending a signal. A timeout based offlining can easily be
implemented via::

	% timeout $TIMEOUT offline_block | failure_handling
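
The same timeout idea can be sketched from a C program instead of the shell.
This is only an illustration under stated assumptions: the function name
offline_with_timeout() is hypothetical, the caller is assumed to pass the
path of a memory block's sysfs "state" file (for example
/sys/devices/system/memory/memoryN/state, which is not named in this
thread), and an interrupted offlining request is assumed to surface as a
write() failing with EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Empty handler: its only purpose is to interrupt the blocking write(). */
static void on_alarm(int sig)
{
	(void)sig;
}

/*
 * Hypothetical helper: request offlining by writing "offline" to
 * state_path and give up after timeout_sec seconds.
 * Returns 0 on success, -ETIMEDOUT if the write was interrupted by a
 * signal (assumed to be our alarm), or a negative errno otherwise.
 */
int offline_with_timeout(const char *state_path, unsigned int timeout_sec)
{
	struct sigaction sa, old;
	int fd, ret = 0;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;	/* no SA_RESTART: write() must return EINTR */
	if (sigaction(SIGALRM, &sa, &old) < 0)
		return -errno;

	fd = open(state_path, O_WRONLY);
	if (fd < 0) {
		ret = -errno;
		goto restore;
	}

	alarm(timeout_sec);
	if (write(fd, "offline", 7) < 0)
		ret = (errno == EINTR) ? -ETIMEDOUT : -errno;
	alarm(0);		/* cancel a still-pending alarm */
	close(fd);
restore:
	sigaction(SIGALRM, &old, NULL);
	return ret;
}
```

A caller could then handle -ETIMEDOUT the way "failure_handling" does in the
shell pipeline above, for example by logging the block and moving on to the
next one.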

-- 
Cheers

David