Message-ID: <86a5ew41tp.wl-maz@kernel.org>
Date: Tue, 22 Oct 2024 16:03:30 +0100
From: Marc Zyngier <maz@...nel.org>
To: Yu Zhao <yuzhao@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Catalin Marinas <catalin.marinas@....com>,
Muchun Song <muchun.song@...ux.dev>,
Thomas Gleixner <tglx@...utronix.de>,
Will Deacon <will@...nel.org>,
Douglas Anderson <dianders@...omium.org>,
Mark Rutland <mark.rutland@....com>,
Nanyong Sun <sunnanyong@...wei.com>,
linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [PATCH v1 3/6] irqchip/gic-v3: support SGI broadcast
On Mon, 21 Oct 2024 05:22:15 +0100,
Yu Zhao <yuzhao@...gle.com> wrote:
>
> GIC v3 and later support SGI broadcast, i.e., the mode that routes
> interrupts to all PEs in the system excluding the local CPU.
>
> Supporting this mode can avoid looping through all the remote CPUs
> when broadcasting SGIs, especially for systems with 200+ CPUs. The
> performance improvement can be measured with the rest of this series
> booted with "hugetlb_free_vmemmap=on irqchip.gicv3_pseudo_nmi=1":
>
> cd /sys/kernel/mm/hugepages/
> echo 600 >hugepages-1048576kB/nr_hugepages
> echo 2048kB >hugepages-1048576kB/demote_size
> perf record -g -- bash -c "echo 600 >hugepages-1048576kB/demote"
>
>          gic_ipi_send_mask()   bash sys time
> Before:  38.14%                0m10.513s
> After:    0.20%                0m5.132s
>
> Signed-off-by: Yu Zhao <yuzhao@...gle.com>
> ---
> drivers/irqchip/irq-gic-v3.c | 20 +++++++++++++++++++-
> 1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index ce87205e3e82..42c39385e1b9 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -1394,9 +1394,20 @@ static void gic_send_sgi(u64 cluster_id, u16 tlist, unsigned int irq)
>  	gic_write_sgi1r(val);
>  }
> 
> +static void gic_broadcast_sgi(unsigned int irq)
> +{
> +	u64 val;
> +
> +	val = BIT(ICC_SGI1R_IRQ_ROUTING_MODE_BIT) | (irq << ICC_SGI1R_SGI_ID_SHIFT);
As picked up by the test bot, please fix the 32bit build.
> +
> +	pr_devel("CPU %d: broadcasting SGI %u\n", smp_processor_id(), irq);
> +	gic_write_sgi1r(val);
> +}
> +
>  static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
>  {
>  	int cpu;
> +	cpumask_t broadcast;
> 
>  	if (WARN_ON(d->hwirq >= 16))
>  		return;
> @@ -1407,6 +1418,13 @@ static void gic_ipi_send_mask(struct irq_data *d, const struct cpumask *mask)
>  	 */
>  	dsb(ishst);
> 
> +	cpumask_copy(&broadcast, cpu_present_mask);
Why cpu_present_mask? I'd expect that cpu_online_mask should be the
correct mask to use -- we don't IPI offline CPUs, in general.
> +	cpumask_clear_cpu(smp_processor_id(), &broadcast);
> +	if (cpumask_equal(&broadcast, mask)) {
> +		gic_broadcast_sgi(d->hwirq);
> +		goto done;
> +	}
So the (valid) case where you would IPI *everyone* is not handled as a
fast path? That seems a missed opportunity.
This also seems like an expensive way to do it. How about something like:
	int mcnt = cpumask_weight(mask);
	int ocnt = cpumask_weight(cpu_online_mask);

	if (mcnt == ocnt) {
		/* Broadcast to all CPUs including self */
	} else if (mcnt == (ocnt - 1) &&
		   !cpumask_test_cpu(smp_processor_id(), mask)) {
		/* Broadcast to all but self */
	}
which avoids the copy+update+full compare.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.