Message-ID: <aTcVvMFtKcVerNyz@tiehlicka>
Date: Mon, 8 Dec 2025 19:15:24 +0100
From: Michal Hocko <mhocko@...e.com>
To: Aboorva Devarajan <aboorvad@...ux.ibm.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, gourry@...rry.net,
	david@...nel.org, vbabka@...e.cz, surenb@...gle.com,
	jackmanb@...gle.com, hannes@...xchg.org, ziy@...dia.com,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/page_alloc: make percpu_pagelist_high_fraction reads
 lock-free

On Mon 08-12-25 23:00:46, Aboorva Devarajan wrote:
> On Mon, 2025-12-01 at 09:41 -0800, Andrew Morton wrote:
> > On Mon,  1 Dec 2025 11:30:09 +0530 Aboorva Devarajan <aboorvad@...ux.ibm.com> wrote:
[...]
> [83315.383433] page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dc68
> [83315.383442] flags: 0x23ffffe00000000(node=2|zone=0|lastcpupid=0x1fffff)
> [83315.383448] page_type: f5(slab)
> [83315.383454] raw: 023ffffe00000000 c0000028e001fa00 5deadbeef0000100 5deadbeef0000122
> [83315.383462] raw: 0000000000000000 0000000001e101e1 00000002f5000000 0000000000000000
> [83315.383470] page dumped because: isolation failed
> ...
> ...
> ...
> 
> 
> Given the following statement in the documentation, should this behavior be considered
> expected?
> 
> From Documentation/admin-guide/mm/memory-hotplug.rst:
> "Further, memory offlining might retry for a long time (or even forever), until
> aborted by the user."

That is consistent with trying to offline memory blocks that contain
kernel memory, as seen above. Retrying forever on movable zones is a
different issue, as discussed in the other reply.

> There's also a TODO in the code that confirms this issue:
> 
> mm/memory_hotplug.c
> /*
>  * TODO: fatal migration failures should bail
>  * out
>  */
> do_migrate_range(pfn, end_pfn);
> 
> 
> A possible improvement would be to add a retry limit or timeout for pages that repeatedly
> fail isolation, returning -EBUSY after N attempts instead of looping indefinitely for
> unmovable pages. This would make the behavior more predictable.

I disagree. It is trivial to implement a timeout in userspace. Any
limit on the number of retry attempts will be much less predictable,
because whether an operation succeeds can be purely a matter of timing.
We've had exactly that kind of behavior before.
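
For illustration, a userspace timeout could look like the minimal Python
sketch below. It assumes the usual sysfs interface (writing "offline" to
/sys/devices/system/memory/memoryN/state) and relies on the offlining
attempt being aborted once the writing process is signalled, as the
documentation quoted above describes. The helper name, the block number
and the 60 second timeout are made up for the example.

```python
import subprocess
import sys

# Tiny child program: open the file named by argv[1] and write argv[2].
# The write runs in a separate process so the parent can give up on it.
_CHILD_CODE = (
    "import sys\n"
    "with open(sys.argv[1], 'w') as f:\n"
    "    f.write(sys.argv[2])\n"
)


def write_state_with_timeout(path, value, timeout_s):
    """Write value to path in a child process, giving up after timeout_s.

    Returns True if the write finished successfully in time, False if it
    failed or had to be killed. (Hypothetical helper, not an existing tool.)
    """
    try:
        # On timeout, subprocess.run() kills the child and raises.
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD_CODE, path, value],
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0


# Example use (needs root; the memory block number is an assumption):
#   ok = write_state_with_timeout(
#       "/sys/devices/system/memory/memory42/state", "offline", 60.0)
```

The policy (how long to wait, whether to try again later) then stays
entirely with the caller rather than being hardcoded in the kernel.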
-- 
Michal Hocko
SUSE Labs