Date:   Tue, 8 Aug 2023 09:40:01 +0200
From:   Juergen Gross <jgross@...e.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>
Cc:     x86@...nel.org, Tom Lendacky <thomas.lendacky@....com>,
        Andrew Cooper <andrew.cooper3@...rix.com>,
        Arjan van de Ven <arjan@...ux.intel.com>,
        Huang Rui <ray.huang@....com>,
        Dimitri Sivanich <dimitri.sivanich@....com>,
        Michael Kelley <mikelley@...rosoft.com>,
        Sohil Mehta <sohil.mehta@...el.com>,
        K Prateek Nayak <kprateek.nayak@....com>,
        Kan Liang <kan.liang@...ux.intel.com>,
        Zhang Rui <rui.zhang@...el.com>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Feng Tang <feng.tang@...el.com>,
        Andy Shevchenko <andy@...radead.org>
Subject: Re: [patch 00/53] x86/topology: The final installment

On 07.08.23 15:52, Thomas Gleixner wrote:
> Hi!
> 
> This is the (for now) last part of reworking topology enumeration and
> management. It's based on the APIC and CPUID rework series which can be
> found here:
> 
>        https://lore.kernel.org/lkml/20230802101635.459108805@linutronix.de
> 
> With these preparatory changes in place, it's now possible to address the
> real issues of the current topology code:
> 
>    - Wrong core count on hybrid systems
> 
>    - Heuristics-based size information for packages and dies, which fails
>      to work correctly with certain command line parameters.
> 
>    - Full evaluation failure for a theoretical hybrid system which boots
>      from an E-core
> 
>    - The complete insanity of manipulating global data from firmware parsers
>      or the XEN/PV fake SMP enumeration. The latter is really a piece of art.
> 
> This series addresses this by
> 
>    - Mopping up some more historical technical debt
> 
>    - Consolidating all topology relevant functionality into one place
> 
>    - Providing separate interfaces for boot time and ACPI hotplug operations
> 
>    - A sane ordering of command line options and restrictions
> 
>    - A sensible way to handle the BSP problem in kdump kernels instead of
>      the unreliable command line option.
> 
>    - Confinement of topology-relevant variables by replacing the XEN/PV SMP
>      enumeration fake with something halfway sensible.
> 
>    - Evaluation of sizes by analysing the topology via the CPUID-provided
>      APIC ID segmentation and the actual APIC IDs which are registered at
>      boot time.
> 
>    - Removal of heuristics and broken size calculations
> 
> The idea behind this is the following:
> 
> The APIC IDs describe the system topology in multiple domain levels. The
> CPUID topology parser provides the information about which part of the APIC
> ID is associated with the individual levels (Intel terminology):
> 
>     [ROOT][PACKAGE][DIE][TILE][MODULE][CORE][THREAD]
> 
> The root space contains the package (socket) IDs. Levels which are not
> enumerated consume 0 bits of space, but conceptually they are always
> represented. If, e.g., only the CORE and THREAD levels are enumerated, then
> DIE, MODULE and TILE have the same physical ID as PACKAGE.
> 
> If SMT is not supported, then the THREAD domain is still used. It then
> has the same physical ID as the CORE domain and is the only child of
> the core domain.
> 
> This allows a unified view of the system, independent of the enumerated
> domain levels, without requiring any conditionals in the code.
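
Just to illustrate the scheme described above, here is a minimal C sketch
with made-up names and shift values (not the kernel's actual interfaces):
each level is described by how many low APIC ID bits belong to the levels
below it, and the physical ID at a level is the APIC ID with those bits
dropped. Levels which are not enumerated reuse the shift of the level below
and thus share its ID.

#include <stdio.h>

/* Illustrative only: domain levels in Intel terminology, lowest first. */
enum topo_level { THREAD, CORE, MODULE, TILE, DIE, PACKAGE, NR_LEVELS };

/*
 * Cumulative shift of each level inside the APIC ID, for an assumed system
 * with SMT2 and up to 16 cores per package.
 */
static const unsigned int topo_shift[NR_LEVELS] = {
        [THREAD]  = 0,
        [CORE]    = 1,  /* 1 SMT bit   -> 2 threads per core  */
        [MODULE]  = 5,  /* 4 core bits -> up to 16 cores      */
        [TILE]    = 5,  /* not enumerated: same ID as MODULE  */
        [DIE]     = 5,  /* not enumerated: same ID as PACKAGE */
        [PACKAGE] = 5,
};

/* Physical ID of @apicid at @level: drop the bits of all lower levels. */
static unsigned int topo_level_id(unsigned int apicid, enum topo_level level)
{
        return apicid >> topo_shift[level];
}

int main(void)
{
        unsigned int apicid = 0x2d;     /* arbitrary example APIC ID */

        printf("thread %u core %u die %u package %u\n",
               topo_level_id(apicid, THREAD), topo_level_id(apicid, CORE),
               topo_level_id(apicid, DIE), topo_level_id(apicid, PACKAGE));
        return 0;
}
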
> 
> AMD only exposes 4 domain levels, with obviously different terminology,
> but those can easily be mapped onto the Intel variant with a trivial lookup
> table added to the CPUID parser.
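
Purely as a sketch of that lookup-table idea (the AMD level names and
indices below are assumptions, not the real encoding): the vendor-specific
level type is translated once into the Intel-terminology domain, so the
rest of the parser stays vendor-agnostic.

/* Domains in Intel terminology, as in the sketch above. */
enum topo_level { THREAD, CORE, MODULE, TILE, DIE, PACKAGE };

/* Hypothetical AMD-reported level types; the real encoding differs. */
enum amd_level { AMD_LVL_THREAD, AMD_LVL_CORE, AMD_LVL_COMPLEX, AMD_LVL_DIE,
                 NR_AMD_LEVELS };

/* Translate an AMD-reported level into the Intel-style domain. */
static const enum topo_level amd_to_topo_level[NR_AMD_LEVELS] = {
        [AMD_LVL_THREAD]  = THREAD,
        [AMD_LVL_CORE]    = CORE,
        [AMD_LVL_COMPLEX] = MODULE,     /* CCX-style grouping */
        [AMD_LVL_DIE]     = DIE,
};
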
> 
> The resulting topology information of an ADL hybrid system with 8 P-Cores
> and 8 E-Cores looks like this:
> 
>   CPU topo: Max. logical packages:   1
>   CPU topo: Max. logical dies:       1
>   CPU topo: Max. dies per package:   1
>   CPU topo: Max. threads per core:   2
>   CPU topo: Num. cores per package:    16
>   CPU topo: Num. threads per package:  24
>   CPU topo: Allowing 24 present CPUs plus 0 hotplug CPUs
>   CPU topo: Thread    :    24
>   CPU topo: Core      :    16
>   CPU topo: Module    :     1
>   CPU topo: Tile      :     1
>   CPU topo: Die       :     1
>   CPU topo: Package   :     1
> 
> This happens on the boot CPU before any of the APs are started and
> provides correct size information right from the start.
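
To illustrate how the sizes can fall out of the registered APIC IDs alone
(assumed helper, not the kernel's API): counting the distinct physical IDs
at each level over all APIC IDs registered at boot gives numbers like the
16 cores and 24 threads per package above, without any heuristics.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Count distinct physical IDs at a level among the registered APIC IDs.
 * Illustrative only: assumes IDs below 256 and the cumulative-shift scheme
 * from the earlier sketch.
 */
static unsigned int count_level_ids(const unsigned int *apicids, size_t n,
                                    unsigned int level_shift)
{
        bool seen[256] = { false };
        unsigned int count = 0;

        for (size_t i = 0; i < n; i++) {
                unsigned int id = apicids[i] >> level_shift;

                if (id < 256 && !seen[id]) {
                        seen[id] = true;
                        count++;
                }
        }
        return count;
}

int main(void)
{
        /* Fake registered APIC IDs: 2 packages, 2 cores each, SMT2. */
        const unsigned int apicids[] = { 0, 1, 2, 3, 32, 33, 34, 35 };
        size_t n = sizeof(apicids) / sizeof(apicids[0]);

        printf("threads %u cores %u packages %u\n",
               count_level_ids(apicids, n, 0),  /* THREAD shift  */
               count_level_ids(apicids, n, 1),  /* CORE shift    */
               count_level_ids(apicids, n, 5)); /* PACKAGE shift */
        return 0;
}
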
> 
> Even the XEN/PV trainwreck makes use of this now. On Dom0 it utilizes the
> MADT and on DomU it provides fake APIC IDs, which, combined with the
> provided CPUID information, make it at least look halfway realistic instead
> of claiming to have one CPU per package as the current upstream code does.
> 
> This solely addresses the core topology issues, but there is a plan for
> further consolidation of other topology-related information into one single
> source instead of a gazillion localized special parsers and representations
> all over the place. There are quite a few other things which can be
> simplified on top of this, like updating the various cpumasks during CPU
> bringup, but that's all left for later.
> 
> So another 53 patches later, the resulting diffstat is:
> 
>     64 files changed, 830 insertions(+), 955 deletions(-)
> 
> and the combo diffstat of all three series combined:
> 
>    115 files changed, 2414 insertions(+), 3035 deletions(-)
> 
> The current series applies on top of
> 
>     git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cpuid-v3
> 
> and is available from git here:
> 
>     git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-full-v1

Tested on an Intel system with Xen:

- PV dom0 is working fine. I couldn't test physical cpu hotplug, but removing
   and then re-adding vcpus to dom0 worked.

- PV domU is working fine, too. A test starting with 2 vcpus initially and
   onlining another 2 vcpus later also went fine.

So for Xen PV you can add my:

Tested-by: Juergen Gross <jgross@...e.com>

One other thing to mention: with this series the reported topology via "lscpu"
and "cat /proc/cpuinfo" inside a PV guest/dom0 is looking sane for the first
time. :-)

Thanks for this significant improvement!


Juergen
