Message-ID: <20251021180435.11789-1-jinji.z.zhong@gmail.com>
Date: Tue, 21 Oct 2025 18:04:35 +0000
From: jinji zhong <jinji.z.zhong@...il.com>
To: joshua.hahnjy@...il.com
Cc: Liam.Howlett@...cle.com,
akpm@...ux-foundation.org,
axelrasmussen@...gle.com,
david@...hat.com,
feng.han@...or.com,
hannes@...xchg.org,
jackmanb@...gle.com,
linux-kernel@...r.kernel.org,
linux-mm@...ck.org,
liulu.liu@...or.com,
lorenzo.stoakes@...cle.com,
mhocko@...e.com,
rppt@...nel.org,
shakeel.butt@...ux.dev,
surenb@...gle.com,
vbabka@...e.cz,
weixugc@...gle.com,
yuanchu@...gle.com,
zhengqi.arch@...edance.com,
zhongjinji@...or.com,
ziy@...dia.com
Subject: Re: [PATCH] mm/page_alloc: Consider PCP pages as part of pfmemalloc_reserve
> Zhongjinji, thank you for your patch!
>
>> When free_pages becomes critically low, the kernel prevents other tasks
>> from entering the slow path to ensure that reclaiming tasks can
>> successfully allocate memory.
>>
>> This blocking is important to avoid memory contention with reclaiming
>> tasks. However, in some cases it is unnecessary because the PCP list may
>> already contain sufficient pages, as freed pages are first placed there
>> and are not immediately visible to the buddy system.
>
>Based on my limited understanding of pcp free pages, I had a concern here
>about whether this would really provide the desired effect. Since the pages
>in the pcp lists are not available to the buddy allocator unless we drain
>them (and that operation is not free), I was unsure whether there is a
>clear benefit to allowing the system to go unblocked.
The purpose of this patch is to account for the memory in the pcp lists
as part of pfmemalloc_reserve, which would allow more tasks to enter the
slow path. For critical tasks, entering the slow path is a better
alternative than being throttled until kswapd wakes up.
>If we are already at the point where we need the pcp pages to have enough
>free pages to go over the watermark, perhaps it makes sense to just block
>tasks for now, and enter direct reclaim? Allowing more allocations might
>lead the system to be in a worse state than before, and will have to
>go through direct reclaim anyways.
>
>Please let me know if this makes sense!
>
>> By accounting PCP pages as part of pfmemalloc_reserve, we can reduce
>> unnecessary blocking and improve system responsiveness under low-memory
>> conditions.
>>
>> Signed-off-by: zhongjinji <zhongjinji@...or.com>
>
>[...snip...]
>
>> +int zone_pcp_pages_count(struct zone *zone)
>> +{
>> +	struct per_cpu_pages *pcp;
>> +	int total_pcp_pages = 0;
>> +	int cpu;
>> +
>> +	for_each_online_cpu(cpu) {
>> +		pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
>> +		total_pcp_pages += pcp->count;
>
>Could this be racy? What is stopping the pcp count from decreasing while we
>are iterating over each online cpu, over each managed zone? Under the
>memory pressure conditions this patch aims to fix, I think there is a good
>chance the number we get here will be very outdated by the time we try to
>take action based on it, and the system may end up stalled further since
>we take no action to reclaim memory.
Thank you, Joshua. Indeed, pcp->count may already be outdated by the time
we act on it. And kswapd could still fail to allocate memory if the pages
in the pcp lists sit on other CPUs. While drain_all_pages() may be
triggered by direct reclaim on other CPUs, some hard-to-predict scenarios
might still exist. Perhaps it would be better to perform drain_all_pages()
before actually calling throttle_direct_reclaim(), something like the
following:
@@ -6535,6 +6535,7 @@ static bool allow_direct_reclaim(pg_data_t *pgdat)
 static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
 					nodemask_t *nodemask)
 {
+	bool drained = false;
 	struct zoneref *z;
 	struct zone *zone;
 	pg_data_t *pgdat = NULL;
@@ -6570,6 +6571,7 @@ static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
 	 * for remote pfmemalloc reserves and processes on different nodes
 	 * should make reasonable progress.
 	 */
+retry:
 	for_each_zone_zonelist_nodemask(zone, z, zonelist,
 					gfp_zone(gfp_mask), nodemask) {
 		if (zone_idx(zone) > ZONE_NORMAL)
@@ -6586,6 +6588,12 @@ static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
 	if (!pgdat)
 		goto out;

+	if (!drained) {
+		drained = true;
+		drain_all_pages(NULL);
+		goto retry;
+	}
+
 	/* Account for the throttling */
 	count_vm_event(PGSCAN_DIRECT_THROTTLE);
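The drain-once-then-retry flow above can be modeled in a few lines of
self-contained C (a sketch only; node_model, model_drain() and
model_throttle() are invented names, not kernel interfaces):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical node model for the throttle decision. */
struct node_model {
	long free_pages;  /* buddy-visible free pages */
	long pcp_pages;   /* pages stranded on per-CPU lists */
	long reserve;     /* summed pfmemalloc reserve */
};

/* Models drain_all_pages(): PCP pages become buddy-visible. */
static void model_drain(struct node_model *n)
{
	n->free_pages += n->pcp_pages;
	n->pcp_pages = 0;
}

/* Models the wmark_ok check in allow_direct_reclaim(). */
static bool model_wmark_ok(const struct node_model *n)
{
	return n->free_pages > n->reserve / 2;
}

/* Returns true if the task would be throttled. */
static bool model_throttle(struct node_model *n)
{
	bool drained = false;

retry:
	if (model_wmark_ok(n))
		return false;      /* enough pages: no throttling */

	if (!drained) {
		drained = true;
		model_drain(n);    /* make PCP pages visible */
		goto retry;        /* recheck once after draining */
	}

	return true;               /* still below reserve: throttle */
}
```

The one-shot `drained` flag keeps the cost bounded: drain_all_pages() runs
at most once per throttle attempt, and only when the watermark check has
already failed, so the common fast path is unaffected.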
>[...snip...]
>
>Please feel free to let me know if I am missing something obvious. Again,
>I am not very familiar with the pcp code, so there is a good chance that
>you are seeing something that I am not :-)
>
>Thank you for the patch, I hope you have a great day!
>Joshua
>