Message-ID: <dcc9a2de-95ee-466e-b6d4-64658e315781@amd.com>
Date: Tue, 17 Jun 2025 14:52:19 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Leon Romanovsky <leon@...nel.org>, Steve Wahl <steve.wahl@....com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
 Juri Lelli <juri.lelli@...hat.com>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
 Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
 linux-kernel@...r.kernel.org, Vishal Chourasia <vishalc@...ux.ibm.com>,
 samir <samir@...ux.ibm.com>, Naman Jain <namjain@...ux.microsoft.com>,
 Saurabh Singh Sengar <ssengar@...ux.microsoft.com>, srivatsa@...il.mit.edu,
 Michael Kelley <mhklinux@...look.com>, Russ Anderson <rja@....com>,
 Dimitri Sivanich <sivanich@....com>
Subject: Re: [PATCH v4 1/2] sched/topology: improve topology_span_sane speed

Hello Leon,

On 6/17/2025 1:04 PM, Leon Romanovsky wrote:
> On Mon, Jun 16, 2025 at 09:18:41AM -0500, Steve Wahl wrote:
>> On Sun, Jun 15, 2025 at 09:42:07AM +0300, Leon Romanovsky wrote:
>>> On Thu, Jun 12, 2025 at 04:11:52PM +0530, K Prateek Nayak wrote:
>>>> On 6/12/2025 3:00 PM, K Prateek Nayak wrote:
>>>>> Ah! Since this happens so early topology isn't created yet for
>>>>> the debug prints to hit! Is it possible to get a dmesg with
>>>>> "ignore_loglevel" and "sched_verbose" on an older kernel that
>>>>> did not throw this error on the same host?
>>>
>>> This is dmesg with reverted two commits "sched/topology: Refinement to
>>> topology_span_sane speedup" and "sched/topology: improve
>>> topology_span_sane speed"
> 
> <...>
> 
>>>>
>>>> One better would be running with the following diff on top of v6.16-rc1
>>>> if possible:
>>>
>>> We are working to get this one too.

Thank you for all the data! Using the NUMA topology from the other
thread:

On 6/17/2025 1:25 PM, Leon Romanovsky wrote:
> [leonro@vm ~]$ sudo numactl -H
> available: 5 nodes (0-4)
> node 0 cpus: 0 1
> node 0 size: 2927 MB
> node 0 free: 1603 MB
> node 1 cpus: 2 3
> node 1 size: 3023 MB
> node 1 free: 3008 MB
> node 2 cpus: 4 5
> node 2 size: 3023 MB
> node 2 free: 3007 MB
> node 3 cpus: 6 7
> node 3 size: 3023 MB
> node 3 free: 3002 MB
> node 4 cpus: 8 9
> node 4 size: 3022 MB
> node 4 free: 2718 MB
> node distances:
> node   0   1   2   3   4
>    0:  10  39  38  37  36
>    1:  39  10  38  37  36
>    2:  38  38  10  37  36
>    3:  37  37  37  10  36
>    4:  36  36  36  36  10 

I could reproduce the warning using:

     sudo ~/dev/qemu/build/qemu-system-x86_64 -enable-kvm \
     -cpu host \
     -m 20G -smp cpus=10,sockets=10 -machine q35 \
     -object memory-backend-ram,size=4G,id=m0 \
     -object memory-backend-ram,size=4G,id=m1 \
     -object memory-backend-ram,size=4G,id=m2 \
     -object memory-backend-ram,size=4G,id=m3 \
     -object memory-backend-ram,size=4G,id=m4 \
     -numa node,cpus=0-1,memdev=m0,nodeid=0 \
     -numa node,cpus=2-3,memdev=m1,nodeid=1 \
     -numa node,cpus=4-5,memdev=m2,nodeid=2 \
     -numa node,cpus=6-7,memdev=m3,nodeid=3 \
     -numa node,cpus=8-9,memdev=m4,nodeid=4 \
     -numa dist,src=0,dst=1,val=39 \
     -numa dist,src=0,dst=2,val=38 \
     -numa dist,src=0,dst=3,val=37 \
     -numa dist,src=0,dst=4,val=36 \
     -numa dist,src=1,dst=0,val=39 \
     -numa dist,src=1,dst=2,val=38 \
     -numa dist,src=1,dst=3,val=37 \
     -numa dist,src=1,dst=4,val=36 \
     -numa dist,src=2,dst=0,val=38 \
     -numa dist,src=2,dst=1,val=38 \
     -numa dist,src=2,dst=3,val=37 \
     -numa dist,src=2,dst=4,val=36 \
     -numa dist,src=3,dst=0,val=37 \
     -numa dist,src=3,dst=1,val=37 \
     -numa dist,src=3,dst=2,val=37 \
     -numa dist,src=3,dst=4,val=36 \
     -numa dist,src=4,dst=0,val=36 \
     -numa dist,src=4,dst=1,val=36 \
     -numa dist,src=4,dst=2,val=36 \
     -numa dist,src=4,dst=3,val=36 \
     ...
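
The "-numa dist" arguments above enumerate the off-diagonal entries of
the distance table reported by numactl. A small sketch of how they could
be generated from that table (Python; the helper name numa_dist_args is
made up for illustration and is not part of the original reproduction):

```python
# Distance table copied from the "numactl -H" output quoted above.
DISTANCES = [
    [10, 39, 38, 37, 36],
    [39, 10, 38, 37, 36],
    [38, 38, 10, 37, 36],
    [37, 37, 37, 10, 36],
    [36, 36, 36, 36, 10],
]


def numa_dist_args(distances):
    """Return one "-numa dist,..." argument per off-diagonal table entry."""
    args = []
    for src, row in enumerate(distances):
        for dst, val in enumerate(row):
            # The command above does not pass the diagonal (local) entries.
            if src != dst:
                args.append(f"-numa dist,src={src},dst={dst},val={val}")
    return args


if __name__ == "__main__":
    # Print the arguments in the same line-continued form as the command.
    print(" \\\n".join(numa_dist_args(DISTANCES)))
```

For the 5-node table this yields 20 arguments, in the same order as in
the command line above.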

> 
>   [    0.435961] smp: Bringing up secondary CPUs ...
>   [    0.437573] smpboot: x86: Booting SMP configuration:
>   [    0.438611] .... node  #0, CPUs:        #1
>   [    0.440449] .... node  #1, CPUs:    #2  #3
>   [    0.442906] .... node  #2, CPUs:    #4  #5
>   [    0.445298] .... node  #3, CPUs:    #6  #7
>   [    0.447715] .... node  #4, CPUs:    #8  #9
>   [    0.481482] smp: Brought up 5 nodes, 10 CPUs
>   [    0.483160] smpboot: Total of 10 processors activated (45892.16 BogoMIPS)
>   [    0.486872] tl(SMT) CPU(0) ID(0) CPU_TL_SPAN(0) ID_TL_SPAN(0)
>   [    0.488029] tl(SMT) CPU(1) ID(1) CPU_TL_SPAN(1) ID_TL_SPAN(1)
>   [    0.489151] tl(SMT) CPU(2) ID(2) CPU_TL_SPAN(2) ID_TL_SPAN(2)
>   [    0.489761] tl(SMT) CPU(3) ID(3) CPU_TL_SPAN(3) ID_TL_SPAN(3)
>   [    0.490876] tl(SMT) CPU(4) ID(4) CPU_TL_SPAN(4) ID_TL_SPAN(4)
>   [    0.491996] tl(SMT) CPU(5) ID(5) CPU_TL_SPAN(5) ID_TL_SPAN(5)
>   [    0.493115] tl(SMT) CPU(6) ID(6) CPU_TL_SPAN(6) ID_TL_SPAN(6)
>   [    0.493754] tl(SMT) CPU(7) ID(7) CPU_TL_SPAN(7) ID_TL_SPAN(7)
>   [    0.494875] tl(SMT) CPU(8) ID(8) CPU_TL_SPAN(8) ID_TL_SPAN(8)
>   [    0.496008] tl(SMT) CPU(9) ID(9) CPU_TL_SPAN(9) ID_TL_SPAN(9)
>   [    0.497129] tl(PKG) CPU(0) ID(0) CPU_TL_SPAN(0-1) ID_TL_SPAN(0-1)
>   [    0.497763] tl(PKG) CPU(1) ID(0) CPU_TL_SPAN(0-1) ID_TL_SPAN(0-1)
>   [    0.498954] tl(PKG) CPU(2) ID(2) CPU_TL_SPAN(2-3) ID_TL_SPAN(2-3)
>   [    0.500167] tl(PKG) CPU(3) ID(2) CPU_TL_SPAN(2-3) ID_TL_SPAN(2-3)
>   [    0.501371] tl(PKG) CPU(4) ID(4) CPU_TL_SPAN(4-5) ID_TL_SPAN(4-5)
>   [    0.501792] tl(PKG) CPU(5) ID(4) CPU_TL_SPAN(4-5) ID_TL_SPAN(4-5)
>   [    0.503001] tl(PKG) CPU(6) ID(6) CPU_TL_SPAN(6-7) ID_TL_SPAN(6-7)
>   [    0.504202] tl(PKG) CPU(7) ID(6) CPU_TL_SPAN(6-7) ID_TL_SPAN(6-7)
>   [    0.505419] tl(PKG) CPU(8) ID(8) CPU_TL_SPAN(8-9) ID_TL_SPAN(8-9)
>   [    0.506637] tl(PKG) CPU(9) ID(8) CPU_TL_SPAN(8-9) ID_TL_SPAN(8-9)
>   [    0.507843] tl(NODE) CPU(0) ID(0) CPU_TL_SPAN(0-1,8-9) ID_TL_SPAN(0-1,8-9)
>   [    0.509199] tl(NODE) CPU(1) ID(0) CPU_TL_SPAN(0-1,8-9) ID_TL_SPAN(0-1,8-9)
>   [    0.509792] tl(NODE) CPU(2) ID(2) CPU_TL_SPAN(2-3,8-9) ID_TL_SPAN(2-3,8-9)

Looking at this, NODE should be an SD_OVERLAP domain here since the
spans across the nodes overlap (0-1,8-9 for CPU 0 vs 2-3,8-9 for CPU 2).
The following solves the warning for me:

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8e06b1d22e91..759f7b8e24e6 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2010,6 +2010,7 @@ void sched_init_numa(int offline_node)
  	 */
  	tl[i++] = (struct sched_domain_topology_level){
  		.mask = sd_numa_mask,
+		.flags = SDTL_OVERLAP,
  		.numa_level = 0,
  		SD_INIT_NAME(NODE)
  	};
--

Based on my tracing, the NODE domain eventually gets degenerated via the
default return in sd_parent_degenerate(), since "~cflags & pflags"
between PKG and NODE is 0 (NODE always has 1 group). However, I'm not
sure if this requires a more fundamental modification to
"sd_numa_mask".

Valentin, Peter, what is the right solution here?

>   [    0.511143] Failed tl: NODE
>   [    0.511789] Failed for CPU: 2
>   [    0.512466] ID CPU at tl: 2
>   [    0.513115] Failed CPU span at tl: 2-3,8-9
>   [    0.513701] ID CPU span: 2-3,8-9
>   [    0.514419] ID CPUs seen: 0
>   [    0.515055] CPUs covered: 0-1,8-9 
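
The failure reported above can be modeled in user space. The following
is only a sketch, assuming the check behaves the way the debug output
suggests: each span is identified by its lowest CPU, a span whose ID was
already seen must equal the earlier span, and a span with a new ID must
not intersect the CPUs covered so far. Python sets stand in for
cpumasks, the name first_failing_cpu is hypothetical, and this is not
the kernel code.

```python
def first_failing_cpu(spans):
    """Model of the span check for one topology level.

    spans maps each CPU to the set of CPUs in its span at that level.
    Returns (cpu, id_seen, covered) for the first CPU that fails the
    check, or None if all spans are consistent.
    """
    id_seen = set()   # lowest CPU of every span accepted so far
    covered = set()   # union of all spans accepted so far
    for cpu in sorted(spans):
        span = spans[cpu]
        span_id = min(span)
        if span_id in id_seen:
            # Known ID: the span must match the ID CPU's span exactly.
            if spans.get(span_id) != span:
                return cpu, id_seen, covered
        else:
            # New ID: the span must not overlap anything covered so far.
            if span & covered:
                return cpu, id_seen, covered
            covered |= span
            id_seen.add(span_id)
    return None


# NODE spans for CPUs 0-2, copied from the log above.
NODE_SPANS = {
    0: {0, 1, 8, 9},
    1: {0, 1, 8, 9},
    2: {2, 3, 8, 9},
}

# PKG spans for CPUs 0-9, copied from the log above.
PKG_SPANS = {cpu: {cpu & ~1, (cpu & ~1) + 1} for cpu in range(10)}

if __name__ == "__main__":
    print(first_failing_cpu(NODE_SPANS))
    print(first_failing_cpu(PKG_SPANS))
```

With the NODE spans this reports CPU 2 with seen IDs {0} and covered
CPUs {0, 1, 8, 9}, which matches "Failed for CPU: 2", "ID CPUs seen: 0"
and "CPUs covered: 0-1,8-9" in the log. The PKG spans pass since they
are disjoint.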
-- 
Thanks and Regards,
Prateek

