Message-ID: <MWHPR21MB15937B050A3E849A76384EA8D74D9@MWHPR21MB1593.namprd21.prod.outlook.com>
Date: Fri, 7 Jan 2022 15:34:15 +0000
From: "Michael Kelley (LINUX)" <mikelley@...rosoft.com>
To: "longli@...uxonhyperv.com" <longli@...uxonhyperv.com>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
Purna Pavan Chandra Aekkaladevi <paekkaladevi@...rosoft.com>
CC: Long Li <longli@...rosoft.com>
Subject: RE: [PATCH] PCI: hv: Fix NUMA node assignment when kernel boots with
parameters affecting NUMA topology
From: longli@...uxonhyperv.com <longli@...uxonhyperv.com> Sent: Thursday, January 6, 2022 3:20 PM
>
> When the kernel boots with parameters restricting the number of cpus or NUMA
> nodes, e.g. maxcpus=X or numa=off, the vPCI driver should only set the NUMA
> node to a value that is valid in the current running kernel.
>
> Signed-off-by: Long Li <longli@...rosoft.com>
> ---
> drivers/pci/controller/pci-hyperv.c | 17 +++++++++++++++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index fc1a29acadbb..8686343eff4c 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -1835,8 +1835,21 @@ static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus)
> if (!hv_dev)
> continue;
>
> - if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY)
> - set_dev_node(&dev->dev, hv_dev->desc.virtual_numa_node);
> + if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY) {
> + int cpu;
> + bool found_node = false;
> +
> + for_each_possible_cpu(cpu)
> + if (cpu_to_node(cpu) ==
> + hv_dev->desc.virtual_numa_node) {
> + found_node = true;
> + break;
> + }
> +
> + if (found_node)
> + set_dev_node(&dev->dev,
> + hv_dev->desc.virtual_numa_node);
> + }
I'm wondering about this approach vs. just comparing against nr_node_ids.
Comparing against nr_node_ids would handle the case of numa=off on the
kernel boot line, or a kernel built with CONFIG_NUMA=n, or the use of
numa=fake. Your approach is also affected by which CPUs are online,
since cpu_to_node() references percpu data. It would seem to produce
more variable results since CPUs can go online and offline while the VM
is running. If a network VF device was removed and re-added, the results
of your algorithm could be different for the re-add, depending on which
CPUs were online at the time.
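To make the alternative concrete, here is a rough, untested sketch of the
nr_node_ids approach I have in mind (the choice of falling back to
NUMA_NO_NODE when the host-provided node is out of range is my assumption,
not something the patch specifies):

	if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY) {
		int node = hv_dev->desc.virtual_numa_node;

		/*
		 * Clamp the host-provided node to what this kernel actually
		 * knows about, so numa=off, numa=fake, or CONFIG_NUMA=n
		 * doesn't leave the device with an out-of-range node.
		 */
		if (node >= nr_node_ids)
			node = NUMA_NO_NODE;

		set_dev_node(&dev->dev, node);
	}

This keeps the result independent of which CPUs happen to be online at the
time the device is added.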
My impression (which may be incorrect) is that the device numa_node
is primarily to allow the driver to allocate memory from the closest
NUMA node, and such memory allocations don't really need to be
affected by which CPUs are online.
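As I understand it, a driver typically consumes the device node via
something like the line below (illustrative only; "ring", "ring_size", and
"pdev" are made-up names):

	/* Allocate the buffer near the device's NUMA node. */
	ring = kzalloc_node(ring_size, GFP_KERNEL, dev_to_node(&pdev->dev));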
Thoughts?
>
> put_pcichild(hv_dev);
> }
> --
> 2.25.1