Message-ID: <668f819c8747104814245cd6faebdd9a@kernel.org>
Date: Fri, 15 May 2020 11:14:39 +0100
From: Marc Zyngier <maz@...nel.org>
To: John Garry <john.garry@...wei.com>
Cc: linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
Jason Cooper <jason@...edaemon.net>,
chenxiang <chenxiang66@...ilicon.com>,
Robin Murphy <robin.murphy@....com>, luojiaxing@...wei.com,
Ming Lei <ming.lei@...hat.com>,
Zhou Wang <wangzhou1@...ilicon.com>,
Thomas Gleixner <tglx@...utronix.de>,
Will Deacon <will@...nel.org>
Subject: Re: [PATCH v3 0/2] irqchip/gic-v3-its: Balance LPI affinity across CPUs
Hi John,
On 2020-05-14 13:05, John Garry wrote:
>>
>> + its_inc_lpi_count(d, cpu);
>> +
>> return IRQ_SET_MASK_OK_DONE;
>> }
>>
>> Results look ok:
>>          nvme.use_threaded_interrupts=1   nvme.use_threaded_interrupts=0*
>> Before   950K IOPs                        1000K IOPs
>> After    1100K IOPs                       1150K IOPs
>>
>> * as mentioned before, this is quite unstable and causes lockups.
>> JFYI, there was an attempt to fix this:
>>
>> https://lore.kernel.org/linux-nvme/20191209175622.1964-1-kbusch@kernel.org/
>>
>
> Hi Marc,
>
> Just wondering if we can try to get this series over the line?
Absolutely. Life has got in the way, so let me page it back in...
> So I tested the patches on v5.7-rc5, and get similar performance
> improvement to above.
>
> I did apply a couple of patches, below, to remedy the issues I
> experienced on my D06CS.
Comments on that below.
>
> Thanks,
> John
>
>
> ---->8
>
>
> [PATCH 1/2] irqchip/gic-v3-its: Don't double account for target CPU assignment
>
> In its_set_affinity(), when a managed irq is already assigned to a CPU,
> we may needlessly reassign the irq to another CPU.
>
> This is because, when selecting the target CPU (the least loaded CPU
> in the mask), we still account for that irq being assigned to its
> current CPU; thereby we may unfairly select another CPU.
>
> Modify this behaviour to pre-decrement the current target CPU LPI count
> when finding the least loaded CPU.
>
> Alternatively we may be able to just bail out early when the current
> target CPU already falls within the requested mask.
>
> ---
> drivers/irqchip/irq-gic-v3-its.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 73f5c12..2b18feb 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1636,6 +1636,8 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
> if (irqd_is_forwarded_to_vcpu(d))
> return -EINVAL;
>
> + its_dec_lpi_count(d, its_dev->event_map.col_map[id]);
> +
> if (!force)
> cpu = its_select_cpu(d, mask_val);
> else
> @@ -1646,14 +1648,14 @@ static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
>
> /* don't set the affinity when the target cpu is same as current one */
> if (cpu != its_dev->event_map.col_map[id]) {
> - its_inc_lpi_count(d, cpu);
> - its_dec_lpi_count(d, its_dev->event_map.col_map[id]);
> target_col = &its_dev->its->collections[cpu];
> its_send_movi(its_dev, target_col, id);
> its_dev->event_map.col_map[id] = cpu;
> irq_data_update_effective_affinity(d, cpumask_of(cpu));
> }
>
> + its_inc_lpi_count(d, cpu);
> +
I'm OK with that change, as it removes unnecessary churn.
> return IRQ_SET_MASK_OK_DONE;
> }
>
> ---
>
>
> [PATCH 2/2] irqchip/gic-v3-its: Handle no overlap of non-managed irq affinity mask
>
> In selecting the target CPU for a non-managed interrupt, we may select
> a target CPU outside the requested affinity mask.
>
> This is because there may be no overlap of the ITS node mask and the
> requested CPU affinity mask. The requested affinity mask may be coming
> from userspace or some drivers which try to set irq affinity, see [0].
>
> In this case, just ignore the ITS node cpumask. This is a deviation
> from what Thomas described. Having said that, I am not sure if the
> interrupt is ever bound to a node for us.
>
> [0]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/hisilicon/hisi_uncore_pmu.c#n417
>
> ---
> drivers/irqchip/irq-gic-v3-its.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 2b18feb..12d5d4b4 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1584,10 +1584,6 @@ static int its_select_cpu(struct irq_data *d,
> cpumask_and(tmpmask, cpumask_of_node(node), aff_mask);
> cpumask_and(tmpmask, tmpmask, cpu_online_mask);
>
> - /* If that doesn't work, try the nodemask itself */
> - if (cpumask_empty(tmpmask))
> - cpumask_and(tmpmask, cpumask_of_node(node), cpu_online_mask);
> -
> cpu = cpumask_pick_least_loaded(d, tmpmask);
> if (cpu < nr_cpu_ids)
> goto out;
I'm really not sure. Shouldn't we then drop the wider search on
cpu_online_mask, because userspace could have given us something
that we cannot deal with?
What you are advocating for is a strict adherence to the provided
mask, and it doesn't seem to be what other architectures are providing.
I consider the userspace-provided affinity more as a hint than anything
else, as in this case the kernel does know better (routing the interrupt
to a foreign node might be costly, or even impossible, see the TX1
erratum).
From what I remember of the earlier discussion, you saw an issue on
a system with two sockets and a single ITS, with the node mask limited
to the first socket. Is that correct?
I'll respin the series today and repost it with your first patch
squashed in.
Thanks,
M.
--
Jazz is not dead. It just smells funny...