Message-ID: <aTcVvMFtKcVerNyz@tiehlicka>
Date: Mon, 8 Dec 2025 19:15:24 +0100
From: Michal Hocko <mhocko@...e.com>
To: Aboorva Devarajan <aboorvad@...ux.ibm.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, gourry@...rry.net,
david@...nel.org, vbabka@...e.cz, surenb@...gle.com,
jackmanb@...gle.com, hannes@...xchg.org, ziy@...dia.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/page_alloc: make percpu_pagelist_high_fraction reads lock-free
On Mon 08-12-25 23:00:46, Aboorva Devarajan wrote:
> On Mon, 2025-12-01 at 09:41 -0800, Andrew Morton wrote:
> > On Mon, 1 Dec 2025 11:30:09 +0530 Aboorva Devarajan <aboorvad@...ux.ibm.com> wrote:
[...]
> [83315.383433] page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dc68
> [83315.383442] flags: 0x23ffffe00000000(node=2|zone=0|lastcpupid=0x1fffff)
> [83315.383448] page_type: f5(slab)
> [83315.383454] raw: 023ffffe00000000 c0000028e001fa00 5deadbeef0000100 5deadbeef0000122
> [83315.383462] raw: 0000000000000000 0000000001e101e1 00000002f5000000 0000000000000000
> [83315.383470] page dumped because: isolation failed
> ...
> ...
> ...
>
>
> Given the following statement in the documentation, should this behavior be considered
> expected?
>
> From Documentation/admin-guide/mm/memory-hotplug.rst:
> "Further, memory offlining might retry for a long time (or even forever), until
> aborted by the user."
This is in line with trying to offline memory blocks that contain
kernel memory, as seen above. Retrying forever on movable zones is a
different issue, as discussed in the other reply.
> There's also a TODO in the code that confirms this issue:
>
> mm/memory_hotplug.c
> /*
> * TODO: fatal migration failures should bail
> * out
> */
> do_migrate_range(pfn, end_pfn);
>
>
> A possible improvement would be to add a retry limit or timeout for pages that repeatedly
> fail isolation, returning -EBUSY after N attempts instead of looping indefinitely for
> unmovable pages. This would make the behavior more predictable.
I disagree. It is trivial to implement a timeout-based retry in
userspace. Any limit on the number of retry attempts would be much less
predictable, because whether an operation succeeds can be a matter of
timing. We've had exactly that kind of behavior before.
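
As a rough illustration of the userspace side, here is a minimal sketch,
not a reference implementation. It assumes the block is offlined by
writing "offline" to its sysfs state file, and it does the write in a
child process so that the attempt can be aborted once a deadline passes.
The function name and the parameters are made up for this example.

```python
import subprocess


def offline_with_timeout(state_path, timeout_s):
    """Try to write "offline" to state_path, giving up after timeout_s seconds.

    Returns True if the write succeeded, False if it failed or timed out.
    state_path is assumed to be a memory block state file; the helper name
    and signature are hypothetical.
    """
    try:
        # The write happens in a child shell so that a blocked attempt can
        # be abandoned from the outside. subprocess.run() kills the child
        # and reaps it when the timeout expires.
        result = subprocess.run(
            ["sh", "-c", 'echo offline > "$1"', "sh", state_path],
            timeout=timeout_s,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return False
    # A non-zero exit status means the write itself was rejected.
    return result.returncode == 0
```

A caller would use it along the lines of
offline_with_timeout("/sys/devices/system/memory/memory42/state", 30.0),
where memory42 is only a placeholder block number, and would decide for
itself whether and how often to try again.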
--
Michal Hocko
SUSE Labs