linux-kernel - Re: [PATCH] mm/page_alloc: Consider PCP pages as part of pfmemalloc

Message-ID: <20251021145427.3580609-1-joshua.hahnjy@gmail.com>
Date: Tue, 21 Oct 2025 07:54:26 -0700
From: Joshua Hahn <joshua.hahnjy@...il.com>
To: zhongjinji <zhongjinji@...or.com>
Cc: akpm@...ux-foundation.org,
	david@...hat.com,
	lorenzo.stoakes@...cle.com,
	Liam.Howlett@...cle.com,
	vbabka@...e.cz,
	rppt@...nel.org,
	surenb@...gle.com,
	mhocko@...e.com,
	jackmanb@...gle.com,
	hannes@...xchg.org,
	ziy@...dia.com,
	zhengqi.arch@...edance.com,
	shakeel.butt@...ux.dev,
	axelrasmussen@...gle.com,
	yuanchu@...gle.com,
	weixugc@...gle.com,
	linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	liulu.liu@...or.com,
	feng.han@...or.com
Subject: Re: [PATCH] mm/page_alloc: Consider PCP pages as part of pfmemalloc_reserve

On Tue, 21 Oct 2025 17:50:04 +0800 zhongjinji <zhongjinji@...or.com> wrote:

Hello Zhongjinji, thank you for your patch!

> When free_pages becomes critically low, the kernel prevents other tasks
> from entering the slow path to ensure that reclaiming tasks can
> successfully allocate memory.
> 
> This blocking is important to avoid memory contention with reclaiming
> tasks. However, in some cases it is unnecessary because the PCP list may
> already contain sufficient pages, as freed pages are first placed there
> and are not immediately visible to the buddy system.

Based on my limited understanding of pcp free pages, I have a concern about
whether this would really provide the desired effect. The pages on the pcp
lists are not available to the buddy allocator unless we drain those lists
(and this operation is not free), so I am unsure whether there is a clear
benefit to letting the system go unblocked.

If we are already at the point where we need the pcp pages to have enough
free pages to go over the watermark, perhaps it makes sense to just block
tasks for now, and enter direct reclaim? Allowing more allocations might
leave the system in a worse state than before, and it will have to go
through direct reclaim anyway.
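
To make the concern concrete, here is a minimal user-space sketch. It is
not kernel code: struct toy_zone, toy_pcp_total(), toy_reserve_ok() and
toy_buddy_can_satisfy() are hypothetical names, and the "free > reserve"
comparison is only a simplified stand-in for the real throttling check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 4	/* assumption: fixed CPU count for the model */

/* Hypothetical toy model; this is not the kernel's struct zone. */
struct toy_zone {
	long buddy_free;		/* pages the buddy allocator can hand out now */
	long pcp_count[TOY_NR_CPUS];	/* pages parked on each CPU's pcp list */
	long reserve;			/* stand-in for pfmemalloc_reserve */
};

/* Sum the pages sitting on all modelled pcp lists. */
static long toy_pcp_total(const struct toy_zone *z)
{
	long total = 0;

	assert(z != NULL);
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		total += z->pcp_count[cpu];
	return total;
}

/*
 * Simplified throttling check. With count_pcp set, pcp pages are added
 * to the free count, which is roughly what the patch proposes.
 */
static bool toy_reserve_ok(const struct toy_zone *z, bool count_pcp)
{
	long free = z->buddy_free;

	if (count_pcp)
		free += toy_pcp_total(z);
	return free > z->reserve;
}

/* Without a drain, only buddy_free pages can back an allocation. */
static bool toy_buddy_can_satisfy(const struct toy_zone *z, long nr_pages)
{
	return z->buddy_free >= nr_pages;
}
```

With buddy_free = 10, 30 pages on each of the four pcp lists and
reserve = 100, toy_reserve_ok(&z, true) passes while
toy_buddy_can_satisfy(&z, 16) still fails. In this model the task would be
unblocked even though the pages it needs are not reachable without a drain.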

Please let me know if this makes sense! 

> By accounting PCP pages as part of pfmemalloc_reserve, we can reduce
> unnecessary blocking and improve system responsiveness under low-memory
> conditions.
> 
> Signed-off-by: zhongjinji <zhongjinji@...or.com>

[...snip...]

> +int zone_pcp_pages_count(struct zone *zone)
> +{
> +	struct per_cpu_pages *pcp;
> +	int total_pcp_pages = 0;
> +	int cpu;
> +
> +	for_each_online_cpu(cpu) {
> +		pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
> +		total_pcp_pages += pcp->count;

Could this be racy? What is stopping the pcp count from decreasing while we
are iterating over each online cpu, over each managed zone? Under the
memory pressure conditions that this patch is aiming to fix, I think that
there is a good chance the number we get here will be very outdated by the time
we try to take action based on it, and we may require the system to be
further stalled since we don't take action to reclaim memory.
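
A small user-space sketch of the staleness I have in mind. The names
struct toy_pcp, toy_sum_counts() and toy_alloc_from_pcp() are hypothetical,
and the "concurrent" allocations are modelled as ordinary sequential calls
made after the snapshot is taken, so this only illustrates the ordering
problem and does not reproduce a real race.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-CPU page list; only the count matters. */
struct toy_pcp {
	long count;
};

/* Unlocked walk over all modelled CPUs, like the loop in the patch. */
static long toy_sum_counts(const struct toy_pcp *pcp, int ncpus)
{
	long total = 0;

	for (int cpu = 0; cpu < ncpus; cpu++)
		total += pcp[cpu].count;
	return total;
}

/*
 * Model another CPU allocating from its own pcp list. Returns the number
 * of pages actually taken, which is capped by what the list holds.
 */
static long toy_alloc_from_pcp(struct toy_pcp *pcp, long want)
{
	long taken;

	assert(want >= 0);
	taken = want < pcp->count ? want : pcp->count;
	pcp->count -= taken;
	return taken;
}
```

If the sum is taken first and two CPUs then allocate from their lists, the
saved value overstates what is left by exactly the number of pages taken
since the snapshot. Any decision based on the saved value is made against
pages that are already gone.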

[...snip...]

Please feel free to let me know if I am missing something obvious. Again,
I am not very familiar with the pcp code, so there is a good chance that
you are seeing something that I am not :-)

Thank you for the patch, I hope you have a great day!
Joshua