linux-kernel - Re: [PATCH v3 0/2] irqchip/gic-v3-its: Balance LPI affinity across CPUs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <668f819c8747104814245cd6faebdd9a@kernel.org>
Date:   Fri, 15 May 2020 11:14:39 +0100
From:   Marc Zyngier <maz@...nel.org>
To:     John Garry <john.garry@...wei.com>
Cc:     linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        Jason Cooper <jason@...edaemon.net>,
        chenxiang <chenxiang66@...ilicon.com>,
        Robin Murphy <robin.murphy@....com>, luojiaxing@...wei.com,
        Ming Lei <ming.lei@...hat.com>,
        Zhou Wang <wangzhou1@...ilicon.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Will Deacon <will@...nel.org>
Subject: Re: [PATCH v3 0/2] irqchip/gic-v3-its: Balance LPI affinity across
 CPUs

Hi John,

On 2020-05-14 13:05, John Garry wrote:
>> 
>> +       its_inc_lpi_count(d, cpu);
>> +
>>          return IRQ_SET_MASK_OK_DONE;
>>   }
>> 
>> Results look ok:
>>      nvme.use_threaded_interrupts=1    =0*
>> Before    950K IOPs            1000K IOPs
>> After    1100K IOPs            1150K IOPs
>> 
>> * as mentioned before, this is quite unstable and causes lockups. 
>> JFYI, there was an attempt to fix this:
>> 
>> https://lore.kernel.org/linux-nvme/20191209175622.1964-1-kbusch@kernel.org/
>> 
> 
> Hi Marc,
> 
> Just wondering if we can try to get this series over the line?

Absolutely. Life has got in the way, so let me page it back in...

> So I tested the patches on v5.7-rc5, and get similar performance
> improvement to above.
> 
> I did apply a couple of patches, below, to remedy the issues I
> experienced for my D06CS.

Comments on that below.

> 
> Thanks,
> John
> 
> 
> ---->8
> 
> 
> [PATCH 1/2] irqchip/gic-v3-its: Don't double account for target CPU
>  assignment
> 
> In its_set_affinity(), when a managed irq is already assigned to a CPU,
> we may needlessly reassign the irq to another CPU.
> 
> This is because when selecting the target CPU, being the least loaded
> CPU in the mask, we account of that irq still being assigned to a CPU;
> thereby we may unfairly select another CPU.
> 
> Modify this behaviour to pre-decrement the current target CPU LPI count
> when finding the least loaded CPU.
> 
> Alternatively we may be able to just bail out early when the current
> target CPU already falls within the requested mask.
> 
> ---
>  drivers/irqchip/irq-gic-v3-its.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c 
> b/drivers/irqchip/irq-gic-v3-its.c
> index 73f5c12..2b18feb 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1636,6 +1636,8 @@ static int its_set_affinity(struct irq_data *d,
> const struct cpumask *mask_val,
>  	if (irqd_is_forwarded_to_vcpu(d))
>  		return -EINVAL;
> 
> +	its_dec_lpi_count(d, its_dev->event_map.col_map[id]);
> +
>  	if (!force)
>  		cpu = its_select_cpu(d, mask_val);
>  	else
> @@ -1646,14 +1648,14 @@ static int its_set_affinity(struct irq_data
> *d, const struct cpumask *mask_val,
> 
>  	/* don't set the affinity when the target cpu is same as current one 
> */
>  	if (cpu != its_dev->event_map.col_map[id]) {
> -		its_inc_lpi_count(d, cpu);
> -		its_dec_lpi_count(d, its_dev->event_map.col_map[id]);
>  		target_col = &its_dev->its->collections[cpu];
>  		its_send_movi(its_dev, target_col, id);
>  		its_dev->event_map.col_map[id] = cpu;
>  		irq_data_update_effective_affinity(d, cpumask_of(cpu));
>  	}
> 
> +	its_inc_lpi_count(d, cpu);
> +

I'm OK with that change, as it removes unnecessary churn.

>  	return IRQ_SET_MASK_OK_DONE;
>  }
> 
> ---
> 
> 
> [PATCH 2/2] irqchip/gic-v3-its: Handle no overlap of non-managed irq
>  affinity mask
> 
> In selecting the target CPU for a non-managed interrupt, we may select 
> a
> target CPU outside the requested affinity mask.
> 
> This is because there may be no overlap of the ITS node mask and the
> requested CPU affinity mask. The requested affinity mask may be coming
> from userspace or some drivers which try to set irq affinity, see [0].
> 
> In this case, just ignore the ITS node cpumask. This is a deviation 
> from
> what Thomas described. Having said that, I am not sure if the
> interrupt is ever bound to a node for us.
> 
> [0] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/hisilicon/hisi_uncore_pmu.c#n417
> 
> ---
>  drivers/irqchip/irq-gic-v3-its.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c 
> b/drivers/irqchip/irq-gic-v3-its.c
> index 2b18feb..12d5d4b4 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1584,10 +1584,6 @@ static int its_select_cpu(struct irq_data *d,
>  			cpumask_and(tmpmask, cpumask_of_node(node), aff_mask);
>  			cpumask_and(tmpmask, tmpmask, cpu_online_mask);
> 
> -			/* If that doesn't work, try the nodemask itself */
> -			if (cpumask_empty(tmpmask))
> -				cpumask_and(tmpmask, cpumask_of_node(node), cpu_online_mask);
> -
>  			cpu = cpumask_pick_least_loaded(d, tmpmask);
>  			if (cpu < nr_cpu_ids)
>  				goto out;

I'm really not sure. Shouldn't we then drop the wider search on
cpu_inline_mask, because userspace could have given us something
that we cannot deal with?

What you are advocating for is a strict adherence to the provided
mask, and it doesn't seem to be what other architectures are providing.
I consider the userspace-provided affinity as a hint more that anything
else, as in this case the kernel does know better (routing the interrupt
to a foreign node might be costly, or even impossible, see the TX1
erratum).

 From what I remember of the earlier discussion, you saw an issue on
a system with two sockets and a single ITS, with the node mask limited
to the first socket. Is that correct?

I'll respin the series today and report it with you first patch
squased in.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...