Message-ID: <BY5PR21MB15067D1C5AA731A340A7AF34CE4D9@BY5PR21MB1506.namprd21.prod.outlook.com>
Date: Fri, 7 Jan 2022 20:31:40 +0000
From: Long Li <longli@...rosoft.com>
To: "Michael Kelley (LINUX)" <mikelley@...rosoft.com>,
"longli@...uxonhyperv.com" <longli@...uxonhyperv.com>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
Purna Pavan Chandra Aekkaladevi <paekkaladevi@...rosoft.com>
Subject: RE: [PATCH] PCI: hv: Fix NUMA node assignment when kernel boots with
parameters affecting NUMA topology
> Subject: RE: [PATCH] PCI: hv: Fix NUMA node assignment when kernel boots
> with parameters affecting NUMA topology
>
> From: longli@...uxonhyperv.com <longli@...uxonhyperv.com> Sent:
> Thursday, January 6, 2022 3:20 PM
> >
> > When the kernel boots with parameters restricting the number of CPUs
> > or NUMA nodes, e.g. maxcpus=X or numa=off, the vPCI driver should only
> > set the NUMA node to a value that is valid in the currently running kernel.
> >
> > Signed-off-by: Long Li <longli@...rosoft.com>
> > ---
> > drivers/pci/controller/pci-hyperv.c | 17 +++++++++++++++--
> > 1 file changed, 15 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> > index fc1a29acadbb..8686343eff4c 100644
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -1835,8 +1835,21 @@ static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus)
> >  		if (!hv_dev)
> >  			continue;
> > 
> > -		if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY)
> > -			set_dev_node(&dev->dev, hv_dev->desc.virtual_numa_node);
> > +		if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY) {
> > +			int cpu;
> > +			bool found_node = false;
> > +
> > +			for_each_possible_cpu(cpu)
> > +				if (cpu_to_node(cpu) ==
> > +				    hv_dev->desc.virtual_numa_node) {
> > +					found_node = true;
> > +					break;
> > +				}
> > +
> > +			if (found_node)
> > +				set_dev_node(&dev->dev,
> > +					     hv_dev->desc.virtual_numa_node);
> > +		}
>
> I'm wondering about this approach vs. just comparing against nr_node_ids.
I was trying to fix this by comparing against nr_node_ids. That worked for numa=off, but it didn't work with maxcpus=X.
maxcpus=X is commonly used in kdump kernels. In that configuration, the memory subsystem is initialized so that only the NUMA nodes covered by those CPUs are set up and can be used by drivers.
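To illustrate (a rough sketch only, not part of the patch, and the helper name is made up), the nr_node_ids-based alternative would look roughly like this. It rejects node numbers the kernel never instantiated, but as I understand it, it cannot tell that a node which does exist was left un-initialized because maxcpus= limited which CPUs (and hence which nodes) were brought up:

#include <linux/nodemask.h>	/* nr_node_ids */
#include <linux/numa.h>		/* NUMA_NO_NODE */

/* Hypothetical helper, only to illustrate the alternative being discussed. */
static int hv_validate_numa_node(u16 virtual_numa_node)
{
	if (virtual_numa_node >= nr_node_ids)
		return NUMA_NO_NODE;	/* node was never created by this kernel */
	return virtual_numa_node;
}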
> Comparing against nr_node_ids would handle the case of numa=off on the
> kernel boot line, or a kernel built with CONFIG_NUMA=n, or the use of
> numa=fake. Your approach is also affected by which CPUs are online, since
> cpu_to_node() references percpu data. It would seem to produce more
> variable results since CPUs can go online and offline while the VM is running.
> If a network VF device was removed and re-added, the results of your
> algorithm could be different for the re-add, depending on which CPUs were
> online at the time.
>
> My impression (which may be incorrect) is that the device numa_node is
> primarily to allow the driver to allocate memory from the closest NUMA node,
> and such memory allocations don't really need to be affected by which CPUs
> are online.
Yes, this is the reason I'm using for_each_possible_cpu(). Even if some CPUs are not online, the memory subsystem is set up in a way that allows drivers to allocate memory on that NUMA node. The algorithm guarantees that the NUMA node value is valid when calling set_dev_node().
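For clarity, the check in the patch amounts to the following standalone helper (a sketch only; the name hv_numa_node_is_initialized is mine, the patch open-codes the loop):

#include <linux/cpumask.h>	/* for_each_possible_cpu */
#include <linux/topology.h>	/* cpu_to_node */

/*
 * Sketch of the patch's logic: trust the host-provided node only if
 * at least one possible CPU maps to it, meaning the running kernel
 * actually set that node up, so set_dev_node() receives a valid value.
 */
static bool hv_numa_node_is_initialized(u16 virtual_numa_node)
{
	int cpu;

	for_each_possible_cpu(cpu)
		if (cpu_to_node(cpu) == virtual_numa_node)
			return true;

	return false;
}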
>
> Thoughts?
>
> >
> > put_pcichild(hv_dev);
> > }
> > --
> > 2.25.1