Message-ID: <20200914192039.GA39519@yaz-nikka.amd.com>
Date:   Mon, 14 Sep 2020 14:20:39 -0500
From:   Yazen Ghannam <yazen.ghannam@....com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     linux-edac@...r.kernel.org, linux-kernel@...r.kernel.org,
        tony.luck@...el.com, x86@...nel.org,
        Smita.KoralahalliChannabasappa@....com
Subject: Re: [PATCH v2 1/8] x86/CPU/AMD: Save NodeId on AMD-based systems

On Thu, Sep 10, 2020 at 12:14:43PM +0200, Borislav Petkov wrote:
> On Wed, Sep 09, 2020 at 03:17:55PM -0500, Yazen Ghannam wrote:
> > We need to access specific instances of hardware registers in the
> > Northbridge or Data Fabric. The code in arch/x86/kernel/amd_nb.c does
> > this.
> 
> So you don't need the node_id - you need the northbridge/data fabric ID?
> I'm guessing NB == DF, i.e., it was NB before Zen and it is DF now.
> 
> Yes?
>

Yes, that's right.

I called it "node_id" to match the AMD documentation and what the
Linux code calls it today. It also goes by other names like nb_id and
nid in places.

I think we can call it something else to avoid confusion with NUMA nodes
if that'll help.
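
Roughly, the save looks like this (simplified sketch from memory, so
the field name and the guard are placeholders rather than the literal
patch; the value itself is CPUID Fn8000_001E_ECX[7:0] on Zen):

/* Sketch only: the cpuinfo_x86 "node_id" field is a placeholder. */
static void save_amd_node_id(struct cpuinfo_x86 *c)
{
        u32 eax, ebx, ecx, edx;

        if (c->extended_cpuid_level < 0x8000001e)
                return;

        cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);

        /* CPUID Fn8000_001E_ECX[7:0] is the AMD NodeId. */
        c->node_id = ecx & 0xff;
}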

> > Package = Socket, i.e. a field replaceable unit. Socket may not be
> > useful for software, but I think it helps users identify the hardware.
> > 
> > I think the following could be changed in the documentation:
> > 
> > "In the past a socket always contained a single package (see below), but
> > with the advent of Multi Chip Modules (MCM) a socket can hold more than one
> > package."
> > 
> > Replace "package" with "die".
> 
> So first of all, we have:
> 
> "AMD nomenclature for package is 'Node'."
> 
> so we either change that because as you explain, node != package on AMD.
> 
> What you need is the ID of that northbridge or data fabric instance,
> AFAIU.
> 
> > You take multiple dies from the foundry and you "package" them together
> > into a single unit.
> 
> I think you're overloading the word "package" here and that leads to
> more confusion. Package in our definition - Linux' - is:
> 
> "Packages contain a number of cores plus shared resources, e.g. DRAM
> controller, shared caches etc." If you glue several packages together,
> you get an MCM.
> 

Yes, you're right. The AMD documentation is different, so I'll try to
stick with the Linux documentation's terms and qualify names with "AMD"
when I'm referring to the AMD docs' usage.

> > They could be equal depending on the system. The values are different on
> > MCM systems like Bulldozer and Naples though.
> > 
> > The functions and structures in amd_nb.c are indexed by the node_id.
> > This is done implicitly right now by using amd_get_nb_id()/cpu_llc_id.
> > But the LLC isn't always equal to the Node/Die like in Naples. So the
> > patches in this set save and explicitly use the node_id when needed.
> > 
> > What do you think?
> 
> Sounds to me that you want to ID that data fabric instance which
> logically belongs to one or multiple packages. Or can a DF be a single
> package?
> 
> So let's start simple: how does a DF instance map to a logical NUMA
> node or package? Can a DF serve multiple packages?
> 

There's one DF/NB per package, and the ID is fixed, i.e. it shouldn't
change based on the NUMA configuration.

Here's an example of a 2-socket Naples system with 4 packages per
socket, set up to have 1 NUMA node. The "node_id" value is the AMD
NodeId from CPUID (a quick way to read it from userspace is sketched
after the listing).

CPU=0 phys_proc_id=0 node_id=0 cpu_to_node()=0
CPU=8 phys_proc_id=0 node_id=1 cpu_to_node()=0
CPU=16 phys_proc_id=0 node_id=2 cpu_to_node()=0
CPU=24 phys_proc_id=0 node_id=3 cpu_to_node()=0
CPU=32 phys_proc_id=1 node_id=4 cpu_to_node()=0
CPU=40 phys_proc_id=1 node_id=5 cpu_to_node()=0
CPU=48 phys_proc_id=1 node_id=6 cpu_to_node()=0
CPU=56 phys_proc_id=1 node_id=7 cpu_to_node()=0
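
If you want to double check those columns from userspace, here's a
rough standalone sketch that pins itself to a CPU and reads CPUID
Fn8000_001E (NodeId in ECX[7:0], NodesPerProcessor in ECX[10:8]);
treat the details as illustrative:

/* Illustrative only; assumes a Zen-era CPU with CPUID leaf 0x8000001E. */
#define _GNU_SOURCE
#include <cpuid.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
        int cpu = (argc > 1) ? atoi(argv[1]) : 0;
        unsigned int eax, ebx, ecx, edx;
        cpu_set_t set;

        /* Pin to the requested CPU so CPUID reports that CPU's topology. */
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        if (sched_setaffinity(0, sizeof(set), &set)) {
                perror("sched_setaffinity");
                return 1;
        }

        if (!__get_cpuid(0x8000001e, &eax, &ebx, &ecx, &edx)) {
                fprintf(stderr, "CPUID leaf 0x8000001E not available\n");
                return 1;
        }

        printf("CPU=%d NodeId=%u NodesPerProcessor=%u\n",
               cpu, ecx & 0xff, ((ecx >> 8) & 0x7) + 1);
        return 0;
}

Built as "nodeid", "./nodeid 8" should report NodeId=1 on the box above.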

> You could use the examples at the end of Documentation/x86/topology.rst
> to explain how those things play together. And remember to not think
> about the physical aspect of the hardware structure because it doesn't
> mean anything to software. All you wanna do is address the proper DF
> instance so this needs to be enumerable and properly represented by sw.
>

Yeah, I think example 4b works here. The mismatch, though, is between
phys_proc_id and the package on AMD systems: as you can see above,
phys_proc_id gives the socket number, while the AMD NodeId gives the
package number.

Should we add a note under cpuinfo_x86.phys_proc_id to make this
distinction?
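
And to make "address the proper DF instance" concrete, the lookup the
amd_nb.c users want is roughly this, with node_to_amd_nb() as it exists
today and a made-up per-CPU variable standing in for the explicitly
saved ID (instead of going through amd_get_nb_id()/cpu_llc_id):

/* Sketch: cpu_node_id is a made-up per-CPU value saved from CPUID. */
static struct amd_northbridge *df_for_cpu(int cpu)
{
        /*
         * amd_get_nb_id() returns per_cpu(cpu_llc_id, cpu) today, which
         * only lines up with the NodeId when LLC == node/die, and that
         * doesn't hold on Naples; hence saving the ID explicitly.
         */
        u16 node = per_cpu(cpu_node_id, cpu);

        return node_to_amd_nb(node);
}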

> Confused?
> 
> I am.
> 
> :-)
>

Yeah, me too. :)

Thanks,
Yazen
