Date:   Tue, 27 Jun 2017 20:07:10 +0700
From:   Suravee Suthikulpanit <Suravee.Suthikulpanit@....com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     x86@...nel.org, linux-kernel@...r.kernel.org, leo.duran@....com,
        yazen.ghannam@....com, Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH 1/2] x86/CPU/AMD: Present package as die instead of socket

Hi Boris,

On 6/27/17 17:48, Borislav Petkov wrote:
> On Tue, Jun 27, 2017 at 01:40:52AM -0500, Suravee Suthikulpanit wrote:
>> According to the Documentation/x86/topology.txt, AMD nomenclature for
>> package is NUMA node (or die).
>
> You guys keep talking about die so you should add the definition of
> "die" to that document so that we're all on the same page. In a separate
> patch.

What we are trying to point out here is that (NUMA) "node" and "die" are the same 
thing on most AMD processors; we are not necessarily trying to introduce another 
term here.

>> However, this is not the case on AMD family17h multi-die processor
>> platforms, which can have up to 4 dies per socket as shown in the
>> following system topology.
>
> So what exactly does that mean? A die is a package on ZN and you can have up to 4 packages on a physical socket?

Yes. 4 packages (or 4 dies, or 4 NUMA nodes) in a socket.

>> Current logic interprets package as socket (i.e. phys_proc_id is
>> socket id), which results in setting x86_has_numa_in_package, and omits
>> the DIE schedule domain.
>
> Well, this is what we do on MCM Magny-Cours. Btw, here's where you
> explain *why* you need the DIE domain.

As I described in the cover letter, this patch series changes how the kernel 
derives the CPU "package", from package-as-socket to package-as-die, in order to 
fix the following issues on AMD family17h multi-die processor platforms:

  * irqbalance fails to allocate IRQs to individual CPUs within the die.

  * The scheduler fails to load-balance across the 8 threads within a die
    (e.g. running an 8-thread application with taskset -c 0-7) because
    the DIE sched-domain is omitted due to x86_has_numa_in_package.

These issues are fixed when the package is properly interpreted as the die.
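
To make this concrete, here is a minimal userspace sketch (not the kernel patch 
itself) of where the per-die ID comes from on family17h: CPUID Fn8000_001E 
reports the node (die) ID in ECX[7:0] and nodes-per-socket in ECX[10:8], per the 
AMD documentation. It needs the TOPOEXT feature and should be pinned to one CPU 
(e.g. with taskset), since CPUID reports values for the CPU it runs on:

  #include <stdio.h>
  #include <cpuid.h>

  int main(void)
  {
          unsigned int eax, ebx, ecx, edx;

          /* CPUID Fn8000_001E is only available with TOPOEXT. */
          if (!__get_cpuid(0x8000001e, &eax, &ebx, &ecx, &edx)) {
                  fprintf(stderr, "CPUID leaf 0x8000001e not supported\n");
                  return 1;
          }

          unsigned int node_id          = ecx & 0xff;            /* die ID */
          unsigned int nodes_per_socket = ((ecx >> 8) & 0x7) + 1;

          printf("node (die) id: %u, nodes per socket: %u\n",
                 node_id, nodes_per_socket);
          return 0;
  }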

For MCM Magny-Cours (family15h), we do not have this issue because the MC 
sched-domain has the same cpumask as the DIE sched-domain. This is not the case 
for Zen cores (family17h), where the MC sched-domain is the CCX (two CCXs per die).

>> However, NUMA schedule domains are derived from
>> SRAT/SLIT, which assumes NUMA node is a die,
>
> Ok, so I don't understand this: SRAT/SLIT should basically say that you
> have 4 *logical* NUMA nodes on that *physical* socket. Is that what you
> want? Or is not what you want? Because it makes sense to me that BIOS
> should describe how the logical grouping of the cores in a node is.
> And x86_numa_in_package_topology roughly means that we rely on SRAT/SLIT
> to tell us the topology.

The SRAT/SLIT tables are doing the right thing: they describe the topology among 
*logical* NUMA nodes (dies). So, for the system view below:

>> System View (with 2 socket) :
>>            --------------------
>>            |     -------------|------
>>            |     |            |     |
>>          ------------       ------------
>>          | D1 -- D0 |       | D7 -- D6 |
>>          | |  \/ |  |       | |  \/ |  |
>>  SOCKET0 | |  /\ |  |       | |  /\ |  | SOCKET1
>>          | D2 -- D3 |       | D4 -- D5 |
>>          ------------       ------------
>>            |     |            |     |
>>            ------|------------|     |
>>                  --------------------

The SLIT table shows:

node   0   1   2   3   4   5   6   7
   0:  10  16  16  16  32  32  32  32
   1:  16  10  16  16  32  32  32  32
   2:  16  16  10  16  32  32  32  32
   3:  16  16  16  10  32  32  32  32
   4:  32  32  32  32  10  16  16  16
   5:  32  32  32  32  16  10  16  16
   6:  32  32  32  32  16  16  10  16
   7:  32  32  32  32  16  16  16  10

However, SRAT/SLIT does not describe the DIE. So, using 
x86_numa_in_package_topology on a multi-die Zen processor results in a missing 
DIE sched-domain for the CPUs within a die.
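
For reference, the kernel's view of this table is visible at runtime through 
sysfs: /sys/devices/system/node/nodeN/distance holds row N of the table above. 
A minimal sketch, assuming up to 8 online nodes as on this system:

  #include <stdio.h>

  int main(void)
  {
          char path[64], row[256];

          for (int n = 0; n < 8; n++) {
                  snprintf(path, sizeof(path),
                           "/sys/devices/system/node/node%d/distance", n);
                  FILE *f = fopen(path, "r");
                  if (!f)
                          break;          /* fewer than 8 nodes online */
                  if (fgets(row, sizeof(row), f))
                          printf("node %d: %s", n, row);
                  fclose(f);
          }
          return 0;
  }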

> So why can't SRAT/SLIT do that just like they did on Magny-Cours MCM
> packages?

Again, on Magny-Cours MCM the DIE sched-domain is the same as the MC sched-domain, 
so we can omit the DIE sched-domain. Here is an example from /proc/schedstat:

Magny-Cours cpu0
domain0 00000000,00000003 (SMT)
domain1 00000000,000000ff (MC, which is the same as DIE)
domain2 00ff00ff,00ffffff (NUMA 1 hop)
domain3 ffffffff,ffffffff (NUMA platform)

Zen cpu0 (package-as-socket)
domain0 00000000,00000001,00000000,00000001 (SMT)
domain1 00000000,0000000f,00000000,0000000f (MC ccx)
domain2 00000000,ffffffff,00000000,ffffffff (NUMA socket)
domain3 ffffffff,ffffffff,ffffffff,ffffffff (NUMA platform)

Zen cpu0 (package-as-die)
domain0 00000000,00000001,00000000,00000001 (SMT)
domain1 00000000,0000000f,00000000,0000000f (MC ccx)
domain2 00000000,000000ff,00000000,000000ff (DIE)
domain3 00000000,ffffffff,00000000,ffffffff (NUMA socket)
domain4 ffffffff,ffffffff,ffffffff,ffffffff (NUMA platform)
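
To decode these masks: each is a comma-separated list of 32-bit hex words, most 
significant word first, as printed by the kernel's bitmap code. A small sketch 
(the default argument is the Magny-Cours MC/DIE mask above):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  int main(int argc, char **argv)
  {
          const char *mask = argc > 1 ? argv[1] : "00000000,000000ff";
          char buf[256];

          strncpy(buf, mask, sizeof(buf) - 1);
          buf[sizeof(buf) - 1] = '\0';

          /* Count the words so we know which token covers CPUs 0-31. */
          int nwords = 1;
          for (const char *p = buf; *p; p++)
                  if (*p == ',')
                          nwords++;

          int word = nwords - 1;          /* first token = highest word */
          for (char *tok = strtok(buf, ","); tok;
               tok = strtok(NULL, ","), word--) {
                  unsigned long w = strtoul(tok, NULL, 16);
                  for (int bit = 0; bit < 32; bit++)
                          if (w & (1UL << bit))
                                  printf("cpu%d ", word * 32 + bit);
          }
          putchar('\n');
          return 0;
  }

Run on the package-as-die DIE mask (domain2), it prints cpu0-cpu7 plus their SMT 
siblings cpu64-cpu71, i.e. exactly one die.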

I hope this helps clarify the purpose of this patch series. If there are no other 
concerns about going in this direction, I'll split and clean up the patch series 
per your initial review comments.

Thanks,
Suravee
