Message-ID: <20200918131701.000061b2@Huawei.com>
Date: Fri, 18 Sep 2020 13:17:01 +0100
From: Jonathan Cameron <Jonathan.Cameron@...wei.com>
To: <linux-mm@...ck.org>, <linux-acpi@...r.kernel.org>,
<linux-arm-kernel@...ts.infradead.org>, <x86@...nel.org>,
Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
<rafael@...nel.org>, Ingo Molnar <mingo@...hat.com>
CC: Bjorn Helgaas <bhelgaas@...gle.com>,
<linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>, <linuxarm@...wei.com>,
Dan Williams <dan.j.williams@...el.com>,
Brice Goglin <Brice.Goglin@...ia.fr>,
"Sean V Kelley" <sean.v.kelley@...ux.intel.com>,
<linux-api@...r.kernel.org>, Hanjun Guo <guohanjun@...wei.com>
Subject: Re: [PATCH v10 0/6] ACPI: Support Generic Initiator proximity
domains
On Mon, 7 Sep 2020 22:03:01 +0800
Jonathan Cameron <Jonathan.Cameron@...wei.com> wrote:
> It would be very nice to finally merge this support during this cycle,
> so please take a look.
Hi All,
Just a quick reminder that this set is still looking for review.
Thanks,
Jonathan
>
> I think we need acks covering x86, ARM and ACPI. Rafael took a look back
> in November at v5 and was looking for x86 and ARM acks. Whilst there is
> no ARM-specific code left, we probably still need an ack. If anyone is
> missing from the cc list, please add them.
>
> This series introduces a new type of NUMA node for cases where we want
> to represent the access characteristics of a non-CPU initiator of memory
> requests, as these differ from those of existing nodes containing CPUs
> and/or memory.
>
> These Generic Initiators are presented by the node access0 class in
> sysfs in the same way as a CPU. It seems likely that there will be
> use cases in which the best 'CPU' is desired and Generic Initiators
> should be ignored. The final few patches in this series introduce
> access1, a new performance class in the sysfs node description that
> presents only CPU-to-memory relationships. Test cases for this
> are described below.
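>
> As a rough illustration of how a userspace consumer might use the two
> classes (a sketch only; the sysfs paths are taken from the test listings
> below and the attribute names follow the existing access0 ABI):
>
> /* Compare access0 (all initiators, including GIs) with access1
>  * (CPU initiators only) for a memory node. Minimal error handling.
>  */
> #include <stdio.h>
>
> static long read_node_attr(int node, const char *cls, const char *attr)
> {
>         char path[128];
>         long val = -1;
>         FILE *f;
>
>         snprintf(path, sizeof(path),
>                  "/sys/bus/node/devices/node%d/%s/initiators/%s",
>                  node, cls, attr);
>         f = fopen(path, "r");
>         if (!f)
>                 return -1;
>         if (fscanf(f, "%ld", &val) != 1)
>                 val = -1;
>         fclose(f);
>         return val;
> }
>
> int main(void)
> {
>         printf("node4 access0 read_latency: %ld\n",
>                read_node_attr(4, "access0", "read_latency"));
>         printf("node4 access1 read_latency: %ld\n",
>                read_node_attr(4, "access1", "read_latency"));
>         return 0;
> }
>
> In Config 1 below this would report 5 for access0 (the GI is nearest)
> but 9 for access1 (nearest CPU node).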
>
> Changes since v9:
> Thanks to Bjorn Helgaas for review.
> * Fix ordering of checks in patch 4 so we check the version number first.
>
> Changes since v8:
> * ifdef protections and stubs to avoid a build error on ia64. I'm assuming
> no one cares about Generic Initiators on IA64 (0-day)
> * Update OSC code to ensure we don't claim to support GIs except on x86 and
> ARM64
>
> Changes since V7:
>
> * Now independent from
> [PATCH v3 0/6] ACPI: Only create NUMA nodes from entries in SRAT or SRAT emulation
> * Minor documentation tweak.
> * Rebase on v5.9-rc1
>
> Changes since V6:
>
> * Rebase on 5.8-rc4 + Dependency as above.
> * Drop the ARM64 specific code. No specific calls are needed on ARM64
> as the generic node init is done for all nodes, whether or not they
> have memory. X86 does memoryless nodes separately from those with
> memory and hence needs to specifically initialize GI-only nodes.
> * Fix up an error in the docs reported by Brice Goglin who also did
> quite a bit of testing of v5. Thanks!
>
> Changes since V5:
>
> 3 new patches:
> * A fix for a subtlety in how ACPI 6.3 changed part of the HMAT table.
> * Introduction of access1 class to represent characteristics between CPU
> and memory, ignoring GIs unlike access0 which includes them.
> * Docs to describe the new access1 class.
>
> Note that I ran a number of test cases for the new class which are
> described at the end of this email.
>
> Changes since V4:
>
> At Rafael's suggestion:
>
> Rebase on top of Dan Williams' Specific Purpose Memory series as that
> moves srat.c. The original patches cherry-picked fine onto mmotm with Dan's
> patches applied.
>
> Applies to mmotm-2019-09-25 +
> https://lore.kernel.org/linux-acpi/156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com/
> [PATCH v4 00/10] EFI Specific Purpose Memory Support
> (note there are some trivial conflicts to deal with when applying
> the SPM series).
>
> Changes since V3:
> * Rebase.
>
> Changes since RFC V2.
> * RFC dropped as we now have x86 support, so the lack of guards in the
> ACPI code etc. should now be fine.
> * Added x86 support. Note this has only been tested on QEMU as I don't have
> a convenient x86 NUMA machine to play with. Note that this fitted together
> rather differently from arm64 so I'm particularly interested in feedback
> on the two solutions.
>
> Changes since RFC V1:
> * Fix incorrect interpretation of the ACPI entry noted by Keith Busch
> * Use the acpica headers definitions that are now in mmotm.
>
> It's worth noting that safely putting a given device in a GI node may
> require changes to the existing drivers, as it's not unusual for a driver
> to assume it has a local memory or processor core. There may be further
> constraints not yet covered by this patch set.
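>
> As an illustration only (a sketch; the helper name is hypothetical and
> not part of this series), the sort of defensive change a driver might
> need looks like:
>
> #include <linux/device.h>
> #include <linux/nodemask.h>
> #include <linux/topology.h>
>
> /* Don't trust dev_to_node() to name a node with memory once GI-only
>  * nodes exist; fall back before using it for allocations.
>  */
> static int example_pick_alloc_node(struct device *dev)
> {
>         int nid = dev_to_node(dev);
>
>         if (nid == NUMA_NO_NODE || !node_state(nid, N_MEMORY))
>                 nid = numa_mem_id();    /* CPU-local node with memory */
>
>         return nid;
> }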
>
> Original cover letter...
>
> ACPI 6.3 introduced a new entity that can be part of a NUMA proximity domain.
> It may share such a domain with the existing options (memory, CPU etc.) but it
> may also exist on its own.
>
> The intent is to allow the description of the NUMA properties (particularly
> via HMAT) of accelerators and other initiators of memory activity that are not
> the host processor running the operating system.
>
> This patch set introduces 'just enough' to make them work for arm64 and x86.
> It should be trivial to support other architectures; I just don't have
> suitable NUMA systems readily available to test.
>
> There are a few quirks that need to be considered.
>
> 1. Fall back nodes
> ******************
>
> As operating systems predating ACPI 6.3 do not support Generic Initiator
> Proximity Domains, it is possible to specify, via _PXM in DSDT, that another
> device is part of such a GI-only node. This currently blows up spectacularly.
>
> Whilst we can obviously 'now' protect against such a situation (see the related
> thread on PCI _PXM support and the Threadripper board identified there as
> also falling into the problem of using non-existent nodes
> https://patchwork.kernel.org/patch/10723311/ ), there is no way to be sure
> we will never have legacy OSes that are not protected against this. It would
> also be 'non ideal' to fall back to a default node as there may be a better
> (non-GI) node to pick if GI nodes aren't available.
>
> The workaround is that we also have a new system-wide _OSC bit that allows
> an operating system to 'announce' that it supports Generic Initiators. This
> allows the firmware to use DSDT magic to 'move' devices between the nodes
> depending on whether our new nodes are there or not.
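>
> In kernel terms this amounts to setting one extra bit in the _OSC
> support dword. A sketch (the exact macro and architecture guard may
> differ from what the patches actually use):
>
> /* Only advertise Generic Initiator support where the kernel can parse
>  * the new SRAT entries, so firmware knows when it must fall back to
>  * placing devices in existing nodes via DSDT.
>  */
> #if defined(CONFIG_X86) || defined(CONFIG_ARM64)
>         capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_GENERIC_INITIATOR_SUPPORT;
> #endif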
>
> 2. New ways of assigning a proximity domain for devices
> *******************************************************
>
> Until now, the only way firmware could indicate that a particular device
> (outside the 'special' set of CPUs etc.) was to be found in a particular
> Proximity Domain was by the use of _PXM in DSDT.
>
> That is equally valid with GI domains, but we have new options. The SRAT
> affinity structure includes a handle (ACPI or PCI) to identify devices
> within the system and specify their proximity domain that way. If both _PXM
> and this are provided, they should give the same answer.
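>
> For reference, my reading of the ACPI 6.3 Generic Initiator Affinity
> structure (field names as in the ACPICA headers; treat the exact layout
> below as illustrative rather than authoritative):
>
> struct acpi_srat_generic_affinity {
>         struct acpi_subtable_header header;
>         u8 reserved;
>         u8 device_handle_type;  /* 0: ACPI (_HID/_UID), 1: PCI (seg/BDF) */
>         u32 proximity_domain;
>         u8 device_handle[16];
>         u32 flags;
>         u32 reserved1;
> };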
>
> For now this patch set completely ignores that feature as we don't need
> it to start the discussion. It will form a follow up set at some point
> (if no one else fancies doing it).
>
> Test cases for the access1 class
> ********************************
>
> Test cases for Generic Initiator additions to HMAT.
>
> Setup
>
> PXM0 (node 0) - CPU0 CPU1, 2G memory
> PXM1 (node 1) - CPU2 CPU3, 2G memory
> PXM2 (node 2) - CPU4 CPU5, 2G memory
> PXM3 (node 4) - 2G memory (GI in one case below)
> PXM4 (node 3) - GI only.
>
> Config 1: GI in PXM4 nearer to the memory in PXM 3 than the CPUs are, not direct attached
>
> [ 2.384064] acpi/hmat: HMAT: Locality: Flags:00 Type:Access Latency Initiator Domains:4 Target Domains:4 Base:256
> [ 2.384913] acpi/hmat: Initiator-Target[0-0]:1 nsec
> [ 2.385190] acpi/hmat: Initiator-Target[0-1]:9 nsec
> [ 2.385736] acpi/hmat: Initiator-Target[0-2]:9 nsec
> [ 2.385984] acpi/hmat: Initiator-Target[0-3]:9 nsec
> [ 2.386447] acpi/hmat: Initiator-Target[1-0]:9 nsec
> [ 2.386740] acpi/hmat: Initiator-Target[1-1]:1 nsec
> [ 2.386964] acpi/hmat: Initiator-Target[1-2]:9 nsec
> [ 2.387174] acpi/hmat: Initiator-Target[1-3]:9 nsec
> [ 2.387624] acpi/hmat: Initiator-Target[2-0]:9 nsec
> [ 2.387953] acpi/hmat: Initiator-Target[2-1]:9 nsec
> [ 2.388155] acpi/hmat: Initiator-Target[2-2]:1 nsec
> [ 2.388607] acpi/hmat: Initiator-Target[2-3]:9 nsec
> [ 2.388861] acpi/hmat: Initiator-Target[4-0]:13 nsec
> [ 2.389126] acpi/hmat: Initiator-Target[4-1]:13 nsec
> [ 2.389574] acpi/hmat: Initiator-Target[4-2]:13 nsec
> [ 2.389805] acpi/hmat: Initiator-Target[4-3]:5 nsec
>
> # Sysfs reads the same for nodes 0-2 for access0 and access1 as no GI involved.
>
> /sys/bus/node/devices/...
> node0 #1 and 2 similar.
> access0
> initiators
> node0
> read_bandwidth 0 #not specified in hmat
> read_latency 1
> write_bandwidth 0
> write_latency 1
> power
> targets
> node0
> uevent
> access1
> initiators
> node0
> read_bandwidth 0
> read_latency 1
> write_bandwidth 0
> write_latency 1
> power
> targets
> node0
> uevent
> compact
> cpu0
> cpu1
> ...
> node3 # Note PXM 4, contains GI only
> access0
> initiators
> *empty*
> power
> targets
> node4
> uevent
> compact
> ...
> node4
> access0
> initiators
> node3
> read_bandwidth 0
> read_latency 5
> write_bandwidth 0
> write_latency 5
> power
> targets
> *empty*
> uevent
> access1
> initiators
> node0
> node1
> node2
> read_bandwidth 0
> read_latency 9
> write_bandwidth 0
> write_latency 9
> power
> targets
> *empty*
> uevent
> compact
> ...
>
> Config 2: GI in PXM4 further from the memory in PXM 3 than the CPUs are, not direct attached
>
> [ 4.073493] acpi/hmat: HMAT: Locality: Flags:00 Type:Access Latency Initiator Domains:4 Target Domains:4 Base:256
> [ 4.074785] acpi/hmat: Initiator-Target[0-0]:1 nsec
> [ 4.075150] acpi/hmat: Initiator-Target[0-1]:9 nsec
> [ 4.075423] acpi/hmat: Initiator-Target[0-2]:9 nsec
> [ 4.076184] acpi/hmat: Initiator-Target[0-3]:9 nsec
> [ 4.077116] acpi/hmat: Initiator-Target[1-0]:9 nsec
> [ 4.077366] acpi/hmat: Initiator-Target[1-1]:1 nsec
> [ 4.077640] acpi/hmat: Initiator-Target[1-2]:9 nsec
> [ 4.078156] acpi/hmat: Initiator-Target[1-3]:9 nsec
> [ 4.078471] acpi/hmat: Initiator-Target[2-0]:9 nsec
> [ 4.078994] acpi/hmat: Initiator-Target[2-1]:9 nsec
> [ 4.079277] acpi/hmat: Initiator-Target[2-2]:1 nsec
> [ 4.079505] acpi/hmat: Initiator-Target[2-3]:9 nsec
> [ 4.080126] acpi/hmat: Initiator-Target[4-0]:13 nsec
> [ 4.080995] acpi/hmat: Initiator-Target[4-1]:13 nsec
> [ 4.081351] acpi/hmat: Initiator-Target[4-2]:13 nsec
> [ 4.082125] acpi/hmat: Initiator-Target[4-3]:13 nsec
>
> /sys/bus/node/devices/...
> node0 #1 and 2 similar.
> access0
> initiators
> node0
> read_bandwidth 0 #not specified in hmat
> read_latency 1
> write_bandwidth 0
> write_latency 1
> power
> targets
> node0
> node4
> uevent
> access1
> initiators
> node0
> read_bandwidth 0
> read_latency 1
> write_bandwidth 0
> write_latency 1
> power
> targets
> node0
> node4
> uevent
> compact
> cpu0
> cpu1
> ...
> node3 # Note PXM 4, contains GI only
> #No accessX directories.
> compact
> ...
> node4
> access0
> initiators
> node0
> node1
> node2
> read_bandwidth 0
> read_latency 9
> write_bandwidth 0
> write_latency 9
> power
> targets
> *empty*
> uevent
> access1
> initiators
> node0
> node1
> node2
> read_bandwidth 0
> read_latency 9
> write_bandwidth 0
> write_latency 9
> power
> targets
> *empty*
> uevent
> compact
> ...
>
>
> Case 3 - as per Case 2, but now the memory in node 3 is direct attached to the
> GI but nearer the main nodes (not physically sensible :))
>
> /sys/bus/node/devices/...
> node0 #1 and 2 similar.
> access0
> initiators
> node0
> read_bandwidth 0 #not specified in hmat
> read_latency 1
> write_bandwidth 0
> write_latency 1
> power
> targets
> node0
> node4
> uevent
> access1
> initiators
> node0
> read_bandwidth 0
> read_latency 1
> write_bandwidth 0
> write_latency 1
> power
> targets
> node0
> node4
> uevent
> compact
> cpu0
> cpu1
> ...
> node3 # Note PXM 4, contains GI only
> access0
> initiators
> *empty*
> power
> targets
> node4
> uevent
> compact
> ...
> node4
> access0
> initiators
> node3
> read_bandwidth 0
> read_latency 13
> write_bandwidth 0
> write_latency 13
> power
> targets
> *empty*
> uevent
> access1
> initiators
> node0
> node1
> node2
> read_bandwidth 0
> read_latency 9
> write_bandwidth 0
> write_latency 9
> power
> targets
> *empty*
> uevent
> compact
> ...
>
> Case 4 - nearer the GI, but direct attached to one of the CPUs.
> # Another bonkers one.
>
> /sys/bus/node/devices/...
> node0 #1 similar.
> access0
> initiators
> node0
> read_bandwidth 0 #not specified in hmat
> read_latency 1
> write_bandwidth 0
> write_latency 1
> power
> targets
> node0
> node4
> uevent
> access1
> initiators
> node0
> read_bandwidth 0
> read_latency 1
> write_bandwidth 0
> write_latency 1
> power
> targets
> node0
> uevent
> compact
> cpu0
> cpu1
> ...
> node2 # Direct attached to memory in node 3
> access0
> initiators
> node2
> read_bandwidth 0 #not specified in hmat
> read_latency 1
> write_bandwidth 0
> write_latency 1
> power
> targets
> node2
> node4 #direct attached
> uevent
> access1
> initiators
> node2
> read_bandwidth 0
> read_latency 1
> write_bandwidth 0
> write_latency 1
> power
> targets
> node2
> node4 #direct attached
> uevent
> compact
> cpu0
> cpu1
> ...
>
> node3 # Note PXM 4, contains GI only
> #No accessX directories.
> compact
> ...
> node4
> access0
> initiators
> node3
> read_bandwidth 0
> read_latency 13
> write_bandwidth 0
> write_latency 13
> power
> targets
> *empty*
> uevent
> access1
> initiators
> node0
> node1
> node2
> read_bandwidth 0
> read_latency 9
> write_bandwidth 0
> write_latency 9
> power
> targets
> *empty*
> uevent
> compact
> ...
>
> Case 5 - memory and GI together in node 3 (added an extra GI to node 3).
> Note the HMAT should also reflect this extra initiator domain.
>
> /sys/bus/node/devices/...
> node0 #1 and 2 similar.
> access0
> initiators
> node0
> read_bandwidth 0 #not specified in hmat
> read_latency 1
> write_bandwidth 0
> write_latency 1
> power
> targets
> node0
> node4
> uevent
> access1
> initiators
> node0
> read_bandwidth 0
> read_latency 1
> write_bandwidth 0
> write_latency 1
> power
> targets
> node0
> uevent
> compact
> cpu0
> cpu1
> ...
> node3 # Note PXM 4, contains GI only
> #No accessX directories.
> compact
> ...
> node4 # Now memory and GI.
> access0
> initiators
> node4
> read_bandwidth 0
> read_latency 1
> write_bandwidth 0
> write_latency 1
> power
> targets
> node4
> uevent
> access1
> initiators
> node0
> node1
> node2
> read_bandwidth 0
> read_latency 9
> write_bandwidth 0
> write_latency 9
> power
> targets
> *empty* # as expected the GI doesn't participate in access1.
> uevent
> compact
> ...
>
> Jonathan Cameron (6):
> ACPI: Support Generic Initiator only domains
> x86: Support Generic Initiator only proximity domains
> ACPI: Let ACPI know we support Generic Initiator Affinity Structures
> ACPI: HMAT: Fix handling of changes from ACPI 6.2 to ACPI 6.3
> node: Add access1 class to represent CPU to memory characteristics
> docs: mm: numaperf.rst Add brief description for access class 1.
>
> Documentation/admin-guide/mm/numaperf.rst | 8 ++
> arch/x86/include/asm/numa.h | 2 +
> arch/x86/kernel/setup.c | 1 +
> arch/x86/mm/numa.c | 14 ++++
> drivers/acpi/bus.c | 4 +
> drivers/acpi/numa/hmat.c | 90 ++++++++++++++++++-----
> drivers/acpi/numa/srat.c | 69 ++++++++++++++++-
> drivers/base/node.c | 3 +
> include/linux/acpi.h | 1 +
> include/linux/nodemask.h | 1 +
> 10 files changed, 172 insertions(+), 21 deletions(-)
>