Message-ID: <b1ff9a6d-4593-4120-b989-5a0fdba8329a@amd.com>
Date: Tue, 17 Jun 2025 08:34:53 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Steve Wahl <steve.wahl@....com>, Leon Romanovsky <leon@...nel.org>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
linux-kernel@...r.kernel.org, Vishal Chourasia <vishalc@...ux.ibm.com>,
samir <samir@...ux.ibm.com>, Naman Jain <namjain@...ux.microsoft.com>,
Saurabh Singh Sengar <ssengar@...ux.microsoft.com>, srivatsa@...il.mit.edu,
Michael Kelley <mhklinux@...look.com>, Russ Anderson <rja@....com>,
Dimitri Sivanich <sivanich@....com>
Subject: Re: [PATCH v4 1/2] sched/topology: improve topology_span_sane speed
Hello Steve,
On 6/16/2025 7:48 PM, Steve Wahl wrote:
> On Sun, Jun 15, 2025 at 09:42:07AM +0300, Leon Romanovsky wrote:
>> On Thu, Jun 12, 2025 at 04:11:52PM +0530, K Prateek Nayak wrote:
>>> On 6/12/2025 3:00 PM, K Prateek Nayak wrote:
>>>> Ah! Since this happens so early, the topology isn't created yet for
>>>> the debug prints to hit! Is it possible to get a dmesg with
>>>> "ignore_loglevel" and "sched_verbose" on an older kernel that
>>>> did not throw this error on the same host?
>>
>> This is the dmesg with the two commits "sched/topology: Refinement to
>> topology_span_sane speedup" and "sched/topology: improve
>> topology_span_sane speed" reverted.
>
> I would be interested in whether there's a difference with only the
> second patch being reverted. The first patch is expected to get the
> exact same results as previous code, only faster. The second had
> simplifications suggested by others that could give different results
> under conditions that were not expected to exist. The commit message
> for the second patch explains this.
Since NUMA domains are skipped as a result of SD_OVERLAP, the remaining
PKG domains don't show any discrepancy that would fail the current
check:
CPU0 attaching sched-domain(s):
 domain-0: span=0-1 level=PKG id:0 span:0-1
  groups: 0:{ span=0 }, 1:{ span=1 }
CPU1 attaching sched-domain(s):
 domain-0: span=0-1 level=PKG id:0 span:0-1
  groups: 1:{ span=1 }, 0:{ span=0 }
CPU2 attaching sched-domain(s):
 domain-0: span=2-3 level=PKG id:2 span:2-3
  groups: 2:{ span=2 }, 3:{ span=3 }
CPU3 attaching sched-domain(s):
 domain-0: span=2-3 level=PKG id:2 span:2-3
  groups: 3:{ span=3 }, 2:{ span=2 }
CPU4 attaching sched-domain(s):
 domain-0: span=4-5 level=PKG id:4 span:4-5
  groups: 4:{ span=4 }, 5:{ span=5 }
CPU5 attaching sched-domain(s):
 domain-0: span=4-5 level=PKG id:4 span:4-5
  groups: 5:{ span=5 }, 4:{ span=4 }
CPU6 attaching sched-domain(s):
 domain-0: span=6-7 level=PKG id:6 span:6-7
  groups: 6:{ span=6 }, 7:{ span=7 }
CPU7 attaching sched-domain(s):
 domain-0: span=6-7 level=PKG id:6 span:6-7
  groups: 7:{ span=7 }, 6:{ span=6 }
CPU8 attaching sched-domain(s):
 domain-0: span=8-9 level=PKG id:8 span:8-9
  groups: 8:{ span=8 }, 9:{ span=9 }
CPU9 attaching sched-domain(s):
 domain-0: span=8-9 level=PKG id:8 span:8-9
  groups: 9:{ span=9 }, 8:{ span=8 }
I suspect the failed check comes from a topology level that later gets
degenerated. However, looking at the degeneration path, a domain only
degenerates if it either has a single CPU in it (SMT, CLS, MC) or has
the same span as PKG (NODE domain), and both cases should pass the
sanity check.
Leon, could you also paste the output of "numactl -H" from within the
guest, please? I'm wondering if the NUMA topology somehow makes a
difference here.
--
Thanks and Regards,
Prateek