Message-ID: <PH0PR21MB3025DD41E24D239C0ECB11A9D7D99@PH0PR21MB3025.namprd21.prod.outlook.com>
Date: Thu, 26 May 2022 20:45:33 +0000
From: "Michael Kelley (LINUX)" <mikelley@...rosoft.com>
To: Saurabh Sengar <ssengar@...ux.microsoft.com>,
KY Srinivasan <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
"wei.liu@...nel.org" <wei.liu@...nel.org>,
Dexuan Cui <decui@...rosoft.com>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Saurabh Singh Sengar <ssengar@...rosoft.com>
Subject: RE: [PATCH] Drivers: hv: vmbus: Adding isolated cpu support for
channel interrupts mapping
From: Saurabh Sengar <ssengar@...ux.microsoft.com> Sent: Thursday, May 26, 2022 11:55 AM
> Subject: [PATCH] Drivers: hv: vmbus: Adding isolated cpu support for channel interrupts
> mapping
Let me suggest a more compact and precise Subject:
Drivers: hv: vmbus: Don't assign VMbus channel interrupts to isolated CPUs
>
> Adding support for vmbus channels to take isolated cpu in consideration
> while assigning interrupt to different cpus. This also prevents user from
> setting any isolated cpu to vmbus channel interrupt assignment by sysfs
> entry. Isolated cpu can be configured by kernel command line parameter
> 'isolcpus=managed_irq,<#cpu>'.
Also, for the commit message:
When initially assigning a VMbus channel interrupt to a CPU, don't choose
a managed IRQ isolated CPU (as specified on the kernel boot line with
parameter 'isolcpus=managed_irq,<#cpu>'). Also, when using sysfs to
change the CPU that a VMbus channel will interrupt, don't allow changing
to a managed IRQ isolated CPU.
>
> Signed-off-by: Saurabh Sengar <ssengar@...ux.microsoft.com>
> ---
> drivers/hv/channel_mgmt.c | 18 ++++++++++++------
> drivers/hv/vmbus_drv.c | 6 ++++++
> 2 files changed, 18 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 97d8f56..e1fe029 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -21,6 +21,7 @@
> #include <linux/cpu.h>
> #include <linux/hyperv.h>
> #include <asm/mshyperv.h>
> +#include <linux/sched/isolation.h>
>
> #include "hyperv_vmbus.h"
>
> @@ -728,16 +729,20 @@ static void init_vp_index(struct vmbus_channel *channel)
> u32 i, ncpu = num_online_cpus();
> cpumask_var_t available_mask;
> struct cpumask *allocated_mask;
> + const struct cpumask *hk_mask = housekeeping_cpumask(HK_TYPE_MANAGED_IRQ);
> u32 target_cpu;
> int numa_node;
>
> if (!perf_chn ||
> - !alloc_cpumask_var(&available_mask, GFP_KERNEL)) {
> + !alloc_cpumask_var(&available_mask, GFP_KERNEL) ||
> + cpumask_empty(hk_mask)) {
> /*
> * If the channel is not a performance critical
> * channel, bind it to VMBUS_CONNECT_CPU.
> * In case alloc_cpumask_var() fails, bind it to
> * VMBUS_CONNECT_CPU.
> + * If all the cpus are isolated, bind it to
> + * VMBUS_CONNECT_CPU.
> */
> channel->target_cpu = VMBUS_CONNECT_CPU;
> if (perf_chn)
> @@ -758,17 +763,19 @@ static void init_vp_index(struct vmbus_channel *channel)
> }
> allocated_mask = &hv_context.hv_numa_map[numa_node];
>
> - if (cpumask_equal(allocated_mask, cpumask_of_node(numa_node))) {
> +retry:
> + cpumask_xor(available_mask, allocated_mask, cpumask_of_node(numa_node));
There's a bug here that existed in the code prior to this patch. The code
checks to make sure cpumask_of_node(numa_node) is not empty, and then
later references cpumask_of_node(numa_node) again. But in between the
check and the use, one or more CPUs could go offline, leaving
cpumask_of_node(numa_node) empty since that array of cpumasks contains
only online CPUs. In such a case, execution could get stuck in an infinite
loop with available_mask being empty.
The solution is to call cpus_read_lock() before starting the main "for"
loop and then call cpus_read_unlock() at the end. This lock will prevent
CPUs from going offline, and hence ensure that the node mask can't
become empty. You'll notice that target_cpu_store() uses that lock
to prevent a similar problem.
Fixing this locking problem should probably be a separate patch.
Michael
> + cpumask_and(available_mask, available_mask, hk_mask);
> +
> + if (cpumask_empty(available_mask)) {
> /*
> * We have cycled through all the CPUs in the node;
> * reset the allocated map.
> */
> cpumask_clear(allocated_mask);
> + goto retry;
> }
>
> - cpumask_xor(available_mask, allocated_mask,
> - cpumask_of_node(numa_node));
> -
> target_cpu = cpumask_first(available_mask);
> cpumask_set_cpu(target_cpu, allocated_mask);
>
> @@ -778,7 +785,6 @@ static void init_vp_index(struct vmbus_channel *channel)
> }
>
> channel->target_cpu = target_cpu;
> -
> free_cpumask_var(available_mask);
> }
Removing the blank line above is a gratuitous change that isn't needed.
Generally, a patch should avoid such changes unless the purpose of
the patch is code cleanup.
>
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 714d549..23660a8 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -21,6 +21,7 @@
> #include <linux/kernel_stat.h>
> #include <linux/clockchips.h>
> #include <linux/cpu.h>
> +#include <linux/sched/isolation.h>
> #include <linux/sched/task_stack.h>
>
> #include <linux/delay.h>
> @@ -1770,6 +1771,11 @@ static ssize_t target_cpu_store(struct vmbus_channel
> *channel,
> if (target_cpu >= nr_cpumask_bits)
> return -EINVAL;
>
> + if (!cpumask_test_cpu(target_cpu, housekeeping_cpumask(HK_TYPE_MANAGED_IRQ))) {
> + dev_err(&channel->device_obj->device,
> + "cpu (%d) is isolated, can't be assigned\n", target_cpu);
I don't think a message should be output here. The other errors in this
function don't output a message. Generally, the kernel doesn't output
a message just because a user provided bad input. Doing so makes it
too easy for a user (even a sysadmin) to cause the kernel to go wild
outputting messages.
Michael
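As a rough illustration of the quiet rejection suggested above, the following userspace sketch validates a requested CPU and returns -EINVAL without logging anything. It is a model under assumptions, not the driver code: check_target_cpu() is a hypothetical name, the housekeeping set is a plain 64-bit mask, and the model handles at most 64 CPUs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Userspace model only. Bit n of hk_mask set means CPU n is a
 * housekeeping (non-isolated) CPU. check_target_cpu() is a
 * hypothetical helper, not part of the VMbus driver.
 */
static int check_target_cpu(unsigned int target_cpu, unsigned int nr_cpus,
			    uint64_t hk_mask)
{
	/* The model represents CPU sets in a single 64-bit word. */
	assert(nr_cpus <= 64);

	/* Out-of-range CPU number: bad user input, reject silently. */
	if (target_cpu >= nr_cpus)
		return -EINVAL;

	/* Isolated CPU: same treatment, no log message. */
	if (!(hk_mask & (1ULL << target_cpu)))
		return -EINVAL;

	return 0;
}
```

Both failure paths behave the same way from the caller's point of view: the write fails with -EINVAL and nothing is printed, so repeated bad input cannot flood the log.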
> + return -EINVAL;
> + }
> /* No CPUs should come up or down during this. */
> cpus_read_lock();
>
> --
> 1.8.3.1