Message-ID: <20251001234814.7896-1-hdanton@sina.com>
Date: Thu, 2 Oct 2025 07:48:13 +0800
From: Hillf Danton <hdanton@...a.com>
To: Joshua Hahn <joshua.hahnjy@...il.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
linux-kernel@...r.kernel.org,
linux-mm@...ck.org,
kernel-team@...a.com
Subject: Re: [PATCH v2 2/4] mm/page_alloc: Perform appropriate batching in drain_pages_zone
On Wed, 1 Oct 2025 08:37:16 -0700 Joshua Hahn wrote:
>
> While I definitely agree that spreading out 1TB across multiple NUMA nodes
> is an option that should be considered, I am unsure if it makes sense to
> dismiss this issue as simply a misconfiguration problem.
>
> The reality is that these machines do exist, and we see zone lock contention
> on these machines. You can also see that I ran performance evaluation tests
> on relatively smaller machines (250G) and saw some performance gains.
>
If adding NUMA nodes is not an option, there is still plenty of room in the zone
types for adding new zones on top of the current pcp and zone mechanism to
mitigate zone lock contention; see the diff below. The issue then falls into the
config category.
> The other point that I wanted to mention is that simply adding more NUMA
> nodes is not always strictly beneficial; it changes how the scheduler
> has to work, workloads would require more numa-aware tuning, etc.
Feel free to sit back with Netflix on, as PeterZ is taking care of NUMA nodes
and EEVDF, haha.
--- x/include/linux/mmzone.h
+++ y/include/linux/mmzone.h
@@ -779,6 +779,9 @@ enum zone_type {
 #ifdef CONFIG_ZONE_DMA32
 	ZONE_DMA32,
 #endif
+#ifdef CONFIG_ZONE_EXP
+	ZONE_EXP0, ZONE_EXP1, ZONE_EXP2,	/* experiment */
+#endif
 	/*
 	 * Normal addressable memory is in ZONE_NORMAL. DMA operations can be
 	 * performed on pages in ZONE_NORMAL if the DMA devices support
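
For the experiment to build, a couple of companion changes would also be
needed; a minimal, untested sketch follows (CONFIG_ZONE_EXP and the
"Exp0/1/2" names are assumptions introduced here, not existing kernel
symbols). Note MAX_NR_ZONES needs no manual bump, since it is generated
from __MAX_NR_ZONES via kernel/bounds.c.

--- x/mm/Kconfig
+++ y/mm/Kconfig
@@ ... @@
+config ZONE_EXP
+	bool "Experimental extra zones for spreading zone lock contention"
+	default n

--- x/mm/page_alloc.c
+++ y/mm/page_alloc.c
@@ ... @@ static char * const zone_names[MAX_NR_ZONES] = {
 #ifdef CONFIG_ZONE_DMA32
 	"DMA32",
 #endif
+#ifdef CONFIG_ZONE_EXP
+	"Exp0", "Exp1", "Exp2",		/* experiment */
+#endif
 	"Normal",

The real work, which this sketch leaves out, is carving physical memory into
the new zones (the max_zone_pfn setup the arch passes to free_area_init())
and deciding how GFP flags steer allocations into them.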