Date:   Tue, 27 Jun 2017 18:32:34 +0000
From:   "Ghannam, Yazen" <Yazen.Ghannam@....com>
To:     Borislav Petkov <bp@...en8.de>,
        "Suthikulpanit, Suravee" <Suravee.Suthikulpanit@....com>
CC:     "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Duran, Leo" <leo.duran@....com>,
        "Peter Zijlstra" <peterz@...radead.org>
Subject: RE: [PATCH 1/2] x86/CPU/AMD: Present package as die instead of socket

> -----Original Message-----
> From: linux-kernel-owner@...r.kernel.org
> [mailto:linux-kernel-owner@...r.kernel.org] On Behalf Of Borislav Petkov
> Sent: Tuesday, June 27, 2017 1:44 PM
> To: Suthikulpanit, Suravee <Suravee.Suthikulpanit@....com>
> Cc: x86@...nel.org; linux-kernel@...r.kernel.org; Duran, Leo
> <leo.duran@....com>; Ghannam, Yazen <Yazen.Ghannam@....com>;
> Peter Zijlstra <peterz@...radead.org>
> Subject: Re: [PATCH 1/2] x86/CPU/AMD: Present package as die instead of socket
> 
> On Tue, Jun 27, 2017 at 11:54:12PM +0700, Suravee Suthikulpanit wrote:
> > The 8 threads sharing each L3 are already in the same sched-domain1
> > (MC CCX). So, cpu0 is in the same sched-domain1 as
> > cpu1,2,3,64,65,66,67. Here, we need the DIE sched-domain because it
> > represents all cpus that are in the same NUMA node (since we have one
> > memory controller per DIE).
> 
> So this is still confusing. Please drop the "DIE sched-domain" as that is
> something you're trying to define and I'm trying to parse what you're trying to
> define and why.
> 
> > IIUC, for Zen, w/o the DIE sched-domain, the scheduler could try to
> > re-balance the tasks from one CCX (schedule group) to another CCX
> > across NUMA nodes, and
> 
> CCX, schedule group, NUMA node, ... now my head is spinning. Do you see
> what I mean with agreeing on the nomenclature and proper term definitions
> first?
> 
> > potentially causing unnecessary performance degradation due to remote
> > memory access.
> >
> > Please note also that SRAT/SLIT information is used to derive the
> > NUMA sched-domains, while the DIE sched-domain is a non-NUMA
> > sched-domain (derived from the CPUID topology extension, which is
> > available on newer families).
> 
> So let's try to discuss this without using DIE sched-domain, CCX, etc, and let's
> start simple.
> 
> So in that die graphic:
> 
>               ----------------------------
>           C0  | T0 T1 |    ||    | T0 T1 | C4
>               --------|    ||    |--------
>           C1  | T0 T1 | L3 || L3 | T0 T1 | C5
>               --------|    ||    |--------
>           C2  | T0 T1 | #0 || #1 | T0 T1 | C6
>               --------|    ||    |--------
>           C3  | T0 T1 |    ||    | T0 T1 | C7
>               ----------------------------
> 
> you want all those threads to belong to a single scheduling group.
> Correct?
> 
> Now that thing has a memory controller attached to it, correct?
> 
> If so, why is this thing not a logical NUMA node, as described in SRAT/SLIT?
> 
> If not, what does a NUMA node entail on Zen as described by SRAT/SLIT?
> I.e., what is the difference between the two things? I.e., how many dies as
> above are in a NUMA node?
> 
> Now, SRAT should contain the assignment of which core belongs to which node.
> Why is that not sufficient?
> 
> Ok, that should be enough questions for now. Let's start with them.
> 

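This group is a NUMA node. It is the "identity" NUMA node, i.e. the node at
the local distance node_distance(i,i). Linux skips the identity NUMA node when
sched_init_numa() builds the NUMA levels. This is fine as long as the MC domain
is equivalent to the identity NUMA node. However, this is not the case on Zen
systems, where the MC domain only spans the threads sharing one L3.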

We could patch kernel/sched/topology.c so that it does not skip the identity
NUMA node, though this would affect all systems, not just AMD ones.

Something like this:

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 1b0b4fb..98d856c 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1103,6 +1103,8 @@ void sched_init_numa(void)
 	 * node_distance(i,j) in order to avoid cubic time.
 	 */
 	next_distance = curr_distance;
+	sched_domains_numa_distance[level++] = next_distance;
+	sched_domains_numa_levels = level;
 	for (i = 0; i < nr_node_ids; i++) {
 		for (j = 0; j < nr_node_ids; j++) {
 			for (k = 0; k < nr_node_ids; k++) {

Thanks,
Yazen 
