Message-ID: <MWHPR21MB15937B050A3E849A76384EA8D74D9@MWHPR21MB1593.namprd21.prod.outlook.com>
Date:   Fri, 7 Jan 2022 15:34:15 +0000
From:   "Michael Kelley (LINUX)" <mikelley@...rosoft.com>
To:     "longli@...uxonhyperv.com" <longli@...uxonhyperv.com>,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
        Purna Pavan Chandra Aekkaladevi <paekkaladevi@...rosoft.com>
CC:     Long Li <longli@...rosoft.com>
Subject: RE: [PATCH] PCI: hv: Fix NUMA node assignment when kernel boots with
 parameters affecting NUMA topology

From: longli@...uxonhyperv.com <longli@...uxonhyperv.com> Sent: Thursday, January 6, 2022 3:20 PM
> 
> When the kernel boots with parameters restricting the number of CPUs or NUMA
> nodes, e.g. maxcpus=X or numa=off, the vPCI driver should only set the NUMA
> node to a value that is valid in the currently running kernel.
> 
> Signed-off-by: Long Li <longli@...rosoft.com>
> ---
>  drivers/pci/controller/pci-hyperv.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index fc1a29acadbb..8686343eff4c 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -1835,8 +1835,21 @@ static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus)
>  		if (!hv_dev)
>  			continue;
> 
> -		if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY)
> -			set_dev_node(&dev->dev, hv_dev->desc.virtual_numa_node);
> +		if (hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY) {
> +			int cpu;
> +			bool found_node = false;
> +
> +			for_each_possible_cpu(cpu)
> +				if (cpu_to_node(cpu) ==
> +				    hv_dev->desc.virtual_numa_node) {
> +					found_node = true;
> +					break;
> +				}
> +
> +			if (found_node)
> +				set_dev_node(&dev->dev,
> +					     hv_dev->desc.virtual_numa_node);
> +		}

I'm wondering about this approach vs. just comparing against nr_node_ids.
Comparing against nr_node_ids would handle the case of numa=off on the
kernel boot line, or a kernel built with CONFIG_NUMA=n, or the use of
numa=fake.  Your approach is also affected by which CPUs are online,
since cpu_to_node() references percpu data.  It would seem to produce
more variable results since CPUs can go online and offline while the VM
is running.  If a network VF device was removed and re-added, the results
of your algorithm could be different for the re-add, depending on which
CPUs were online at the time.
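
For illustration only, a minimal sketch of what the nr_node_ids comparison
might look like in that hunk (untested, keeping the existing flag check;
field and macro names as in the current driver):

		/* Sketch: accept the host-provided node only if it falls
		 * within this kernel's node ID space.  nr_node_ids already
		 * reflects numa=off, numa=fake, and CONFIG_NUMA=n.
		 */
		if ((hv_dev->desc.flags & HV_PCI_DEVICE_FLAG_NUMA_AFFINITY) &&
		    hv_dev->desc.virtual_numa_node < nr_node_ids)
			set_dev_node(&dev->dev,
				     hv_dev->desc.virtual_numa_node);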

My impression (which may be incorrect) is that the device numa_node
is primarily to allow the driver to allocate memory from the closest
NUMA node, and such memory allocations don't really need to be
affected by which CPUs are online.
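
As a point of reference, the typical consumer of the device node is an
allocation path along these lines (a generic example, not taken from
pci-hyperv; pdev and buf_size are placeholder names):

	/* Steer an allocation to the node recorded by set_dev_node(),
	 * independent of which CPUs happen to be online.
	 */
	int node = dev_to_node(&pdev->dev);
	void *buf = kzalloc_node(buf_size, GFP_KERNEL, node);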

Thoughts?

> 
>  		put_pcichild(hv_dev);
>  	}
> --
> 2.25.1
