Message-ID: <4a9950fb-471e-4b04-8a0e-0f34e8b6a082@kernel.org>
Date: Wed, 3 Dec 2025 12:22:29 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: Michal Hocko <mhocko@...e.com>
Cc: Gregory Price <gourry@...rry.net>,
Andrew Morton <akpm@...ux-foundation.org>,
Aboorva Devarajan <aboorvad@...ux.ibm.com>, vbabka@...e.cz,
surenb@...gle.com, jackmanb@...gle.com, hannes@...xchg.org, ziy@...dia.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Oscar Salvador <OSalvador@...e.com>
Subject: Re: [PATCH] mm/page_alloc: make percpu_pagelist_high_fraction reads
lock-free

On 12/3/25 10:42, Michal Hocko wrote:
> On Wed 03-12-25 10:15:04, David Hildenbrand (Red Hat) wrote:
>> On 12/3/25 09:59, Gregory Price wrote:
>>> On Wed, Dec 03, 2025 at 09:42:59AM +0100, Michal Hocko wrote:
>>>> On Wed 03-12-25 03:35:51, Gregory Price wrote:
>>>>>     if (!ret) {
>>>>>         /*
>>>>>          * TODO: fatal migration failures should bail
>>>>>          * out
>>>>>          */
>>>>>         do_migrate_range(pfn, end_pfn);
>>>>>     }
>>>>>
>>>>> Maybe it's time to implement the bail out?
>>>>
>>>> That would be great, but can we tell transient from permanent migration
>>>> failures? Maybe long-term pins could be treated as a permanent failure.
>>>>
>>>
>>> I see deep in migration code `migrate_pages_batch()` we would return
>>> "Some other failure" as fatal:
>>>
>>>     switch (rc) {
>>>     case -ENOMEM:
>>>         ...
>>>         /* Note: some long-term pin handling is done here */
>>>         break;
>>>     case -EAGAIN:
>>>         ...
>>>         break;
>>>     case 0:
>>>         ...
>>>         list_move_tail(&folio->lru, &unmap_folios);
>>>         list_add_tail(&dst->lru, &dst_folios);
>>>         break;
>>>     default:
>>>         /*
>>>          * Permanent failure (-EBUSY, etc.):
>>>          * unlike -EAGAIN case, the failed folio is
>>>          * removed from migration folio list and not
>>>          * retried in the next outer loop.
>>>          */
>>>         nr_failed++;
>>>         stats->nr_thp_failed += is_thp;
>>>         stats->nr_failed_pages += nr_pages;
>>>         break;
>>>     }
>>>
>>> So at a minimum we could at least check for !(ENOMEM,EAGAIN) I suppose?
>>>
>>> It's unclear to me, based on this code, how long-term pinning would
>>> be reported. Maybe David knows.
>>
>> I would assume that additional references will always result in -EAGAIN.
>> Remember that we cannot distinguish short-term pins from long-term pins.
>>
>> We should never have long-term pins on ZONE_MOVABLE, unless something broke
>> that contract and needs to be fixed.
>
> Right. But what should the hotplug code do under that condition? Loop
> forever, or fail and report the broken contract? I would lean towards the
> latter.

If you can detect it reliably.

> We have never promised that offlining will never fail for
> movable zones. We just guarantee that the operation is resistant against
> recoverable failures.

Right, but we don't want it to fail for reasons where retrying a bit longer
would just have worked.
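
To make the trade-off concrete, here is a minimal userspace sketch of the
"check for !(ENOMEM, EAGAIN)" idea quoted above. It is not kernel code:
`enum migrate_verdict` and `classify_migrate_rc()` are hypothetical names
invented for illustration. It also shows the limitation: if extra references
(pins included) come back as -EAGAIN, as assumed earlier in this thread, a
broken long-term pin lands in the "retry" bucket and is never detected.

```c
#include <errno.h>

/* Hypothetical verdicts for a per-folio migration return code. */
enum migrate_verdict {
	MIGRATE_DONE,     /* folio migrated, nothing left to do */
	MIGRATE_RETRY,    /* transient, or indistinguishable from transient */
	MIGRATE_GIVE_UP   /* treated as permanent, candidate for bail-out */
};

/*
 * Hypothetical helper (assumption, not an existing kernel function):
 * map a migration return code to a verdict, mirroring the switch
 * statement quoted above.
 */
enum migrate_verdict classify_migrate_rc(int rc)
{
	switch (rc) {
	case 0:
		return MIGRATE_DONE;
	case -ENOMEM:
	case -EAGAIN:
		/*
		 * Retrying a bit longer might just work. Short-term and
		 * long-term pins are assumed to end up here as well, so
		 * they cannot be told apart by the return code alone.
		 */
		return MIGRATE_RETRY;
	default:
		/* -EBUSY and friends: the "default" branch of the switch. */
		return MIGRATE_GIVE_UP;
	}
}
```

With this mapping, only return codes from the `default:` branch would ever
trigger a bail-out; anything that looks like a pin keeps being retried.
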

What we document is:

  Memory Offlining and ZONE_MOVABLE
  ---------------------------------

  Even with ZONE_MOVABLE, there are some corner cases where offlining a memory
  block might fail:

  ... list of corner cases

  Further, when running into out of memory situations while migrating pages, or
  when still encountering permanently unmovable pages within ZONE_MOVABLE
  (-> BUG), memory offlining will keep retrying until it eventually succeeds.

  When offlining is triggered from user space, the offlining context can be
  terminated by sending a signal. A timeout based offlining can easily be
  implemented via::

          % timeout $TIMEOUT offline_block | failure_handling
--
Cheers
David