Message-ID: <ZnrMFIEyr8SKLDKk@u40bc5e070a0153.ant.amazon.com>
Date: Tue, 25 Jun 2024 15:54:28 +0200
From: Roman Kagan <rkagan@...zon.de>
To: Marc Zyngier <maz@...nel.org>
CC: <linux-arm-kernel@...ts.infradead.org>, Catalin Marinas
<catalin.marinas@....com>, Will Deacon <will@...nel.org>,
<nh-open-source@...zon.com>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, Thomas Gleixner <tglx@...utronix.de>,
Jonathan Corbet <corbet@....net>
Subject: Re: [PATCH] irqchip/gicv3-its: Workaround for GIC-700 erratum 2195890
On Tue, Jun 25, 2024 at 09:45:22AM +0100, Marc Zyngier wrote:
> On Mon, 24 Jun 2024 17:55:41 +0100,
> Roman Kagan <rkagan@...zon.de> wrote:
> >
> > According to Arm CoreLink GIC-700 erratum 2195890, on GIC revisions
> > r0p0, r0p1, r1p0 under certain conditions LPIs may remain in the Pending
> > Table until one of a number of external events occurs.
>
> Please add a link to the errata document.
https://developer.arm.com/documentation/SDEN-1769194/
Will include when respinning.
> >
> > No LPIs are lost but they may not be delivered in a finite time.
> >
> > The workaround is to issue an INV using GICR_INVLPIR to an unused, in
> > range LPI ID to retrigger the search.
> >
> > Add this workaround to the quirk table. When the quirk is applicable,
> > carve out one LPI ID from the available range and run periodic work to
> > do INV to it, in order to prevent GIC from stalling.
>
> The errata document says a lot more:
>
> <quote>
> For physical LPIs the workaround is to issue an INV using GICR_INVLPIR
> to an unused, in range LPI ID to retrigger the search. This could be
> done periodically, for example, in line with a residency change, or as
> part of servicing LPIs. If using LPIs as the event, then the
> GICR_INVLPIR write could be issued after servicing every LPI.
>
> However, it only needs to be issued if:
>
> * At least 4 interrupts in the block of 32 are enabled and mapped to
> the current PE or, if easier,
>
> * At least 4 interrupts in the block of 32 are enabled and mapped to
> any PE
> </quote>
It didn't feel worth optimizing for. I'll reconsider.
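For illustration, the condition quoted above ("at least 4 interrupts in the block of 32 are enabled and mapped to any PE") could be checked roughly as follows. This is only a sketch: the bitmap layout, the helper names, and the assumption that a "block of 32" means 32 consecutive, aligned LPI IDs are all mine and do not come from the errata document or the driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_ENABLED_PER_BLOCK 4  /* threshold from the quoted erratum text */

/* Count set bits in one 32-bit word (one assumed block of 32 LPIs). */
static unsigned int count_enabled(uint32_t block)
{
	unsigned int n = 0;

	for (; block; block &= block - 1)
		n++;
	return n;
}

/*
 * 'enabled' is a hypothetical bitmap: bit i of enabled[b] is set when the
 * LPI at index b * 32 + i is enabled and mapped to some PE. Returns true
 * if any block reaches the threshold, i.e. the workaround would be needed.
 */
static bool workaround_needed(const uint32_t *enabled, size_t nr_blocks)
{
	for (size_t b = 0; b < nr_blocks; b++)
		if (count_enabled(enabled[b]) >= MIN_ENABLED_PER_BLOCK)
			return true;
	return false;
}
```

With three LPIs enabled in one block the function returns false; a fourth one in the same block flips it to true.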
> > TT: https://t.corp.amazon.com/D82032616
>
> Gniii????
Indeed Q-/
> > Signed-off-by: Elad Rosner <eladros@...zon.com>
> > Signed-off-by: Mohamed Mediouni <mediou@...zon.com>
> > Signed-off-by: Roman Kagan <rkagan@...zon.de>
>
> Who is the author?
Joint effort aka inherited ownership. Will fix according to the
process doc.
> > +static void __maybe_unused its_quirk_gic700_2195890_work_handler(struct work_struct *work)
> > +{
> > + int cpu;
> > + void __iomem *rdbase;
> > + u64 gicr_invlpir_val;
> > +
> > + for_each_online_cpu(cpu) {
>
> The errata document doesn't say that this need to happen for *every*
> RD. Can you please clarify this?
(Digging out year-old comms with Arm:)
> > In multi-chip GIC system, does this write have to happen in each
> > chip or would a write to a single GICR trigger the search in all
> > GICDs?
> The write needs to occur for each physical PE - in other words, to
> each individual GICR that the search needs to be re-triggered for.
> > + raw_spin_lock(&gic_data_rdist_cpu(cpu)->rd_lock);
> > + gic_write_lpir(gicr_invlpir_val, rdbase + GICR_INVLPIR);
> > + raw_spin_unlock(&gic_data_rdist_cpu(cpu)->rd_lock);
>
> No synchronisation? How is that supposed to work?
>
> Also, if you need to dig into the internals of the driver, extract a
> helper from __direct_lpi_inv().
ACK
> > + }
> > +
> > + schedule_delayed_work(&its_quirk_gic700_2195890_data.work,
> > + msecs_to_jiffies(ITS_QUIRK_GIC700_2195890_PERIOD_MSEC));
>
> It would be pretty easy to detect whether an LPI was ack'ed since the
> last pass, and not issue the invalidate.
Makes sense, will look into it.
Overall, do you think this approach, with a global work item looping over
CPUs, is the right one, or should we rather try to implement something
per-CPU?
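A rough sketch of the "was an LPI ack'ed since the last pass" check suggested above, written as a single-threaded userspace simulation. All names here (struct lpi_poke_state, note_lpi_acked(), needs_poke()) are hypothetical and do not exist in the driver; a real version would presumably use per-CPU variables and appropriate memory ordering.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4  /* arbitrary size for the sketch */

/* Hypothetical per-CPU state; none of these names exist in the driver. */
struct lpi_poke_state {
	uint64_t acked;      /* bumped every time this CPU acks an LPI */
	uint64_t last_seen;  /* value of 'acked' at the previous pass */
};

static struct lpi_poke_state poke_state[NR_CPUS];

/* Would be called from the LPI ack path on 'cpu'. */
static void note_lpi_acked(unsigned int cpu)
{
	poke_state[cpu].acked++;
}

/*
 * Called once per CPU from the periodic pass. Returns true when no LPI
 * was acked on 'cpu' since the previous pass, i.e. the invalidate should
 * still be issued for that redistributor.
 */
static bool needs_poke(unsigned int cpu)
{
	struct lpi_poke_state *s = &poke_state[cpu];
	bool idle = (s->acked == s->last_seen);

	s->last_seen = s->acked;
	return idle;
}
```

The periodic handler would then write GICR_INVLPIR only for CPUs where needs_poke() returns true.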
> > +}
> > +
> > +static bool __maybe_unused its_enable_quirk_gic700_2195890(void *data)
> > +{
> > + struct its_node *its = data;
> > +
> > + if (its_quirk_gic700_2195890_data.lpi)
> > + return true;
> > +
> > + /*
> > + * Use one LPI INTID from the start of the LPI range for GIC prodding,
> > + * and make it unavailable for regular LPI use later.
> > + */
> > + its_quirk_gic700_2195890_data.lpi = lpi_id_base++;
> > +
> > + INIT_DELAYED_WORK(&its_quirk_gic700_2195890_data.work,
> > + its_quirk_gic700_2195890_work_handler);
> > + schedule_delayed_work(&its_quirk_gic700_2195890_data.work, 0);
> > +
> > + return true;
> > +}
>
> It is a bit odd to hook this on an ITS being probed when the ITS isn't
> really involved. Not a big deal, but a bit clumsy.
True, but the LPI allocation lives in this file so it looked easier to
wire it all up here. Where do you think it's more appropriate?
> > static const struct gic_quirk its_quirks[] = {
> > #ifdef CONFIG_CAVIUM_ERRATUM_22375
> > {
> > @@ -4822,6 +4879,17 @@ static const struct gic_quirk its_quirks[] = {
> > .property = "dma-noncoherent",
> > .init = its_set_non_coherent,
> > },
> > +#ifdef CONFIG_ARM64_ERRATUM_2195890
> > + {
> > + .desc = "ITS: GIC-700 erratum 2195890",
> > + /*
> > + * Applies to r0p0, r0p1, r1p0: iidr_var(bits 16..19) == 0 or 1
> > + */
> > + .iidr = 0x0400043b,
> > + .mask = 0xfffeffff,
> > + .init = its_enable_quirk_gic700_2195890,
>
> This catches r0p0 and r1p0, but not r0p1 (you require that bits 15:12
> are 0).
Ouch, right. Given the erratum's exact wording
> Fault Status: Present in: r0p0, r0p1, r1p0 Fixed in: r2p0
I guess I should match everything below r2p0 and allow arbitrary bits
15:12 (i.e. zero that nibble in the mask, giving 0xfffe0fff).
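To make the mask arithmetic concrete, here is a small standalone check. It assumes the quirk is applied when (iidr & mask) == quirk iidr, and that rXpY is encoded with X in bits 19:16 and Y in bits 15:12, as the patch comment and the review remark imply. gic700_iidr() is a hypothetical helper for the illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the patch under review. */
#define QUIRK_IIDR      0x0400043bu
#define MASK_AS_POSTED  0xfffeffffu  /* ignores bit 16 only */
#define MASK_RELAXED    0xfffe0fffu  /* additionally ignores bits 15:12 */

/* Assumed matching rule: the masked IIDR must equal the quirk's value. */
static bool quirk_matches(uint32_t iidr, uint32_t quirk_iidr, uint32_t mask)
{
	return (iidr & mask) == quirk_iidr;
}

/*
 * Hypothetical helper: build an IIDR for GIC-700 rXpY, assuming the major
 * revision X lives in bits 19:16 and the minor revision Y in bits 15:12.
 */
static uint32_t gic700_iidr(unsigned int major, unsigned int minor)
{
	return QUIRK_IIDR | ((uint32_t)(major & 0xf) << 16) |
	       ((uint32_t)(minor & 0xf) << 12);
}
```

Under these assumptions the posted mask matches r0p0 and r1p0 but not r0p1, while the relaxed mask matches r0p0, r0p1 and r1p0 and still rejects r2p0.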
> Overall, this requires a bit of rework. Notably, this could be
> significantly relaxed to match the requirements of the published
> workaround.
Thanks for the prompt review! Will rework and respin.
Roman.
Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597