Date:   Wed, 28 Jun 2017 11:12:52 +0200
From:   Borislav Petkov <bp@...en8.de>
To:     Suravee Suthikulpanit <Suravee.Suthikulpanit@....com>
Cc:     x86@...nel.org, linux-kernel@...r.kernel.org, leo.duran@....com,
        yazen.ghannam@....com, Peter Zijlstra <peterz@...radead.org>,
        "Lendacky, Thomas" <Thomas.Lendacky@....com>,
        Matt Fleming <matt@...eblueprint.co.uk>,
        Mel Gorman <mgorman@...e.com>
Subject: Re: [PATCH 1/2] x86/CPU/AMD: Present package as die instead of socket

+ Matt and Mel.

On Wed, Jun 28, 2017 at 03:26:10AM +0700, Suravee Suthikulpanit wrote:
> So, from the definition above, we would like all those 16 threads to be in
> the same sched-domain, where threads from C0,1,2,3 are in the same
> sched-group, and threads in C4,5,6,7 are in another sched-group.

Figures, you want to have a sched group per L3.
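
IOW, per die that would presumably be two groups of eight (going by the
cpu numbering in your domain dump below; an illustration, not gospel):

  die 0: cpus 0-7 plus their SMT siblings 64-71
    L3 group 0: cpus 0-3, 64-67   (C0-C3 and siblings)
    L3 group 1: cpus 4-7, 68-71   (C4-C7 and siblings)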

> > Now that thing has a memory controller attached to it, correct?
> 
> Yes
> 
> > If so, why is this thing not a logical NUMA node, as described in
> > SRAT/SLIT?
> 
> Yes, this thing is a logical NUMA node and is represented correctly in the SRAT/SLIT.
> 
> > Now, SRAT should contain the assignment which core belongs to which
> > node. Why is that not sufficient?
> 
> Yes, SRAT provides the cpu-to-node mapping, which is sufficient to tell
> the scheduler which CPUs belong to a NUMA node.
> 
> However, looking at the current sched-domains below, notice that there is
> no sched-domain with 16 threads to represent a NUMA node:
> 
> cpu0
> domain0 00000000,00000001,00000000,00000001 (SMT)
> domain1 00000000,0000000f,00000000,0000000f (MC)
> domain2 00000000,ffffffff,00000000,ffffffff (NUMA)
> domain3 ffffffff,ffffffff,ffffffff,ffffffff (NUMA)
> 
> sched-domain2 (which represents a sched-domain containing all cpus within a
> socket) would have 8 sched-groups (based on the cpumasks from domain1).
> According to the documentation snippet above regarding balancing within a
> sched-domain, the scheduler will try to do (NUMA) load balancing between 8
> groups (spanning 4 NUMA nodes). Here, IINM, it would be more beneficial if
> the scheduler tried to load balance between the two groups within the same
> NUMA node first, before going across NUMA nodes, in order to minimize memory
> latency. This would require another sched-domain between domains 1 and 2,
> which represents all 16 threads within a NUMA node (i.e. a die sched-domain).
> This would allow the scheduler to load balance within the NUMA node first,
> before going across NUMA nodes.
> 
> However, since the current code decides that x86_has_numa_in_package is
> true, it omits the die sched-domain. In order to avoid this, we are
> proposing to represent cpuinfo_x86.phys_proc_id using the NUMA node ID
> (i.e. the die ID; see the sketch below). This is the main point of the
> patch series.
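
For reference, the kind of change that description amounts to would look
roughly like the sketch below. This is not the actual patch; the helper
name and call site are made up, and only numa_cpu_node(), NUMA_NO_NODE and
cpuinfo_x86.phys_proc_id are existing kernel interfaces.

/*
 * Sketch only: derive the "package" id from the NUMA node (die) a CPU
 * sits in, so that the topology code ends up with one DIE sched-domain
 * per memory controller.
 */
static void sketch_present_die_as_package(struct cpuinfo_x86 *c, int cpu)
{
	int node = numa_cpu_node(cpu);

	if (node != NUMA_NO_NODE)
		c->phys_proc_id = node;	/* die (node) id instead of socket id */
}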

Ok, so here's what I'm reading from all this: a 16-thread package *is*
properly represented as a logical NUMA node by SRAT/SLIT after all.
However, sched_init_numa() doesn't parse it properly, or something else
along that path keeps the scheduler from seeing the 4 NUMA nodes on the
physical socket.

Because we do use the SLIT table to build the NUMA domains. So,
considering that the 16T package really is properly represented as a
logical NUMA node, why doesn't it get detected as such?
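
Roughly, that detection boils down to something like the sketch below
(simplified, not the actual kernel/sched code): every distinct
node_distance() value from the SLIT becomes one NUMA sched-domain level,
so four dies per socket with a sane intra-socket distance should produce
an extra level all by themselves.

/*
 * Simplified sketch of what the NUMA level detection keys off of;
 * not the actual implementation.
 */
static int sketch_numa_levels(int nr_nodes)
{
	int distances[32];
	int nlevels = 0, i, j, k;

	for (i = 0; i < nr_nodes; i++) {
		for (j = 0; j < nr_nodes; j++) {
			int d = node_distance(i, j);

			/* already seen this distance? */
			for (k = 0; k < nlevels; k++)
				if (distances[k] == d)
					break;
			if (k == nlevels && nlevels < 32)
				distances[nlevels++] = d;	/* new distance, new level */
		}
	}

	return nlevels;
}

With distinct local, intra-socket and inter-socket distances that gives
three levels, and the intra-socket one is exactly the domain you are
after, without any phys_proc_id games.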

This is what needs answering first, not some forceful fitting of the
topology into a DIE domain.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
