linux-kernel - Re: [PATCH] sched/topology: Improve load balancing on AMD EPYC

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190624142420.GC2978@techsingularity.net>
Date:   Mon, 24 Jun 2019 15:24:20 +0100
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Matt Fleming <matt@...eblueprint.co.uk>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        "Lendacky, Thomas" <Thomas.Lendacky@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Suthikulpanit, Suravee" <Suravee.Suthikulpanit@....com>,
        Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH] sched/topology: Improve load balancing on AMD EPYC

On Wed, Jun 19, 2019 at 10:34:37PM +0100, Matt Fleming wrote:
> On Tue, 18 Jun, at 02:33:18PM, Peter Zijlstra wrote:
> > On Tue, Jun 18, 2019 at 11:43:19AM +0100, Matt Fleming wrote:
> > > This works for me under all my tests. Thoughts?
> > > 
> > > --->8---
> > > 
> > > diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> > > index 80a405c2048a..4db4e9e7654b 100644
> > > --- a/arch/x86/kernel/cpu/amd.c
> > > +++ b/arch/x86/kernel/cpu/amd.c
> > > @@ -8,6 +8,7 @@
> > >  #include <linux/sched.h>
> > >  #include <linux/sched/clock.h>
> > >  #include <linux/random.h>
> > > +#include <linux/topology.h>
> > >  #include <asm/processor.h>
> > >  #include <asm/apic.h>
> > >  #include <asm/cacheinfo.h>
> > > @@ -824,6 +825,8 @@ static void init_amd_zn(struct cpuinfo_x86 *c)
> > >  {
> > >  	set_cpu_cap(c, X86_FEATURE_ZEN);
> > >  
> > 
> > I'm thinking this deserves a comment. Traditionally the SLIT table held
> > relative memory latency. So where the identity is 10, 16 would indicate
> > 1.6 times local latency and 32 would be 3.2 times local.
> > 
> > Now, even very early on BIOS monkeys went about their business and put
> > in random values in an attempt to 'tune' the system based on how
> > $random-os behaved, which is all sorts of fu^Wwrong.
> > 
> > Now, I suppose my question is; is that 32 Zen puts in an actual relative
> > memory latency metric, or a random value we somehow have to deal with.
> > And can we pretty please describe the whole sordid story behind this
> > 'tunable' somewhere?
> 
> This is one for the AMD folks. I don't know if the memory latency
> really is 3.2 times or not, only that that's the value in all the Zen
> machines I have access to. Even this 2-socket one:
> 
> node distances:
> node   0   1 
>   0:  10  32 
>   1:  32  10 
> 
> Tom, Suravee?

Do not consider this an authorative response but based on what I know
of the physical topology, it is not unreasonable to use 32 in the SLIT
table. There is a small latency when accessing another die on the same
socket (details are generation specific). It's not quite a local access
but it's not as much as a traditional remote access either (hence 16 being
the base unit for another die to hint that it's not quite local but not
quite remote either). 32 is based on accessing a die on a remote socket
based on the expected performance and latency of the interconnect.

To the best of my knowledge, the magic numbers are reflective of the real
topology and not just a gamification of the numbers for a random OS. If
anything, the fact that there is a load balancing issue on Linux would
indicate that they were not picking random numbers for Linux at least :P

-- 
Mel Gorman
SUSE Labs