Message-ID: <87728994-b928-45b3-a6a0-258af6e81294@amd.com>
Date: Fri, 18 Apr 2025 10:56:04 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Vincent Guittot <vincent.guittot@...aro.org>, Shrikanth Hegde
<sshegde@...ux.ibm.com>, "Chen, Yu C" <yu.c.chen@...el.com>, Tim Chen
<tim.c.chen@...ux.intel.com>, Ingo Molnar <mingo@...nel.org>, Doug Nelson
<doug.nelson@...el.com>, Mohini Narkhede <mohini.narkhede@...el.com>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched: Skip useless sched_balance_running acquisition if
load balance is not due
Hello Peter,
On 4/17/2025 5:31 PM, Peter Zijlstra wrote:
>> o Since this is a single flag across the entire system, it also implies
>> CPUs cannot concurrently do load balancing across different NUMA
>> domains which seems reasonable since a load balance at lower NUMA
>> domain can potentially change the "nr_numa_running" and
>> "nr_preferred_running" stats for the higher domain but if this is the
>> case, a newidle balance at lower NUMA domain can interfere with a
>> concurrent busy / newidle load balancing at higher NUMA domain.
>> Is this expected? Should newidle balance be serialized too?
>
> Serializing new-idle might create too much idle time.
In the context of busy and idle balancing, what are your thoughts on a
per-sd "serialize" flag?
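
For illustration only, a per-sd flag could be modelled along the lines
below. This is a userspace sketch using C11 atomics, not kernel code:
"struct sd_model", "sd_try_serialize()" and "sd_release()" are
hypothetical names, and the "serialize" member only stands in for
SD_SERIALIZE being set on a domain. The point it shows is that two
different domains can be balanced concurrently while passes over the
same domain still exclude each other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sched_domain; not the kernel layout. */
struct sd_model {
	atomic_int balancing;	/* 0 = free, 1 = some CPU is balancing this domain */
	bool serialize;		/* models SD_SERIALIZE being set on this domain */
};

/*
 * Try to claim the domain for a load balancing pass.
 * Returns true if the caller may go ahead and balance.
 */
static bool sd_try_serialize(struct sd_model *sd)
{
	int expected = 0;

	/* Domains that are not serialized can always be balanced. */
	if (!sd->serialize)
		return true;

	return atomic_compare_exchange_strong_explicit(&sd->balancing,
						       &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* Drop the claim taken by a successful sd_try_serialize(). */
static void sd_release(struct sd_model *sd)
{
	if (!sd->serialize)
		return;

	/* Releasing a domain nobody holds is a bug in the caller. */
	assert(atomic_load_explicit(&sd->balancing, memory_order_relaxed) == 1);
	atomic_store_explicit(&sd->balancing, 0, memory_order_release);
}
```

With a single system-wide flag the second claim below would fail even
for an unrelated domain; with one flag per domain only a second pass
over the same domain is turned away.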
>
>> (P.S. I copied over the serialize logic from sched_balance_domains()
>> into sched_balance_newidle() and did not see any difference in my
>> testing but perhaps there are benchmarks out there that care for
>> this)
>>
>> o If the intention of SD_SERIALIZE was to actually "serializes
>> load-balancing passes over large domains (above the NODE topology
>> level)" as the comment above "sched_balance_running" states, and
>> this question is specific to x86 - when enabling SNC on Intel or
>> NPS on AMD servers, the first NUMA domain is in fact as big as the
>> NODE (now PKG domain) if not smaller. Is it okay to clear
>> SD_SERIALIZE for these domains since they are small enough now?
>
> You'll have to dive into the history here, but IIRC it was from SGI back
> in the day, where NUMA factors were quite large and load-balancing
> across numa was a pain.
Let me dig up the git history and see if any interesting details hide
there.
>
> Small really isn't the criteria, but inter-node latency might be, we
> also have this node_reclaim_distance thing.
>
> Not quite sure what makes sense, someone should tinker I suppose, see
> what works with today's hardware.
I'll try some experiments over the weekend to see if my machine turns
up happy with non-serialized lb for inter-PKG load balancing with NPS
turned on. I'll probably piggy back off of "node_reclaim_distance"
heuristics.
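
One possible shape for such a heuristic, as a standalone sketch and
under the assumption that a domain is only worth serializing when it
spans nodes farther apart than the reclaim distance. Everything here is
hypothetical: "span_max_distance()" and "sd_should_serialize()" are not
kernel functions, the distance table is a flat SLIT-like array, and the
threshold is passed in as a plain parameter rather than read from
"node_reclaim_distance".

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Largest node distance between any two nodes in the span.
 * dist is a flat nr_nodes x nr_nodes table (SLIT-like, row-major);
 * in_span[i] says whether node i belongs to the domain.
 */
static int span_max_distance(const int *dist, size_t nr_nodes,
			     const bool *in_span)
{
	int max = 0;

	for (size_t i = 0; i < nr_nodes; i++) {
		if (!in_span[i])
			continue;
		for (size_t j = 0; j < nr_nodes; j++) {
			if (in_span[j] && dist[i * nr_nodes + j] > max)
				max = dist[i * nr_nodes + j];
		}
	}

	return max;
}

/*
 * Assumed policy: serialize only domains whose span reaches past the
 * reclaim distance. The strict ">" comparison is an arbitrary choice.
 */
static bool sd_should_serialize(const int *dist, size_t nr_nodes,
				const bool *in_span, int reclaim_distance)
{
	return span_max_distance(dist, nr_nodes, in_span) > reclaim_distance;
}
```

For a made-up two-socket table with two nodes per socket (12 within a
socket, 32 across sockets) and a threshold of 30, the intra-socket
domain would not be serialized while the cross-socket domain would.
Raising the threshold to 32 would clear it for both.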
--
Thanks and Regards,
Prateek