[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CADL8D3Zyx33t_jbUyER6tckXupS0PjPN6GxJjqNwHPAOiH6WMw@mail.gmail.com>
Date: Mon, 21 Oct 2024 10:48:32 -0400
From: Jon Cormier <jcormier@...ticallink.com>
To: Tomi Valkeinen <tomi.valkeinen@...asonboard.com>
Cc: Devarsh Thakkar <devarsht@...com>, jyri.sarha@....fi, airlied@...il.com,
maarten.lankhorst@...ux.intel.com, mripard@...nel.org, tzimmermann@...e.de,
dri-devel@...ts.freedesktop.org, simona@...ll.ch,
linux-kernel@...r.kernel.org, praneeth@...com, vigneshr@...com,
aradhya.bhatia@...ux.dev, s-jain1@...com, r-donadkar@...com, sam@...nborg.org,
bparrot@...com
Subject: Re: [PATCH 1/2] drm/tidss: Clear the interrupt status for interrupts
being disabled
Ah okay, go figure. There was an updated patch series emailed between
before I finished writing this email. So please ignore...
On Mon, Oct 21, 2024 at 10:46 AM Jon Cormier <jcormier@...ticallink.com> wrote:
>
> Adding the e2e thread that has instigated this change.
>
> https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1394222/am625-issue-about-tidss-rcu_preempt-self-detected-stall-on-cpu?pifragment-323307=1#pifragment-323307=2
>
> Summary of original problem: An AM62x device using the TIDSS driver,
> can lock up after hours of running. The lock ups are often detected
> by the rcu_preempt system. The lock ups turned out to be caused by an
> infinite interrupt loop (irq storm?) in the TIDSS_DISPC driver.
>
> The k3_clear_irqstatus function which is responsible for clearing the
> interrupt bits, only clear the the level 1 interrupts if the level 2
> ones are set. This leaves a small window where if for whatever reason
> the level 2 interrupts aren't set but the level 1's are, then we will
> never clear the level 1 interrupt.
>
> The change as submitted is not sufficient to prevent the irq storm.
> I've tested these two patches for several weeks now and they reduce
> the frequency of the irq storm from once a day to once every few days,
> but don't prevent it.
>
> I suggest that the k3_clear_irqstatus function needs to be updated
> such that it's not possible for the level 1 DISPC_IRQSTATUS bit to
> remain uncleared.
>
> The following hack proposed by Bin and team removes the possibility of
> the irq storm happening, while introducing a small chance of clearing
> interrupts that weren't intended. Though I would assume that if the
> level 2 interrupts aren't cleared, they would reassert the level 1
> DISPC_IRQSTATUS so maybe that's not much of a risk. Most other
> drivers when clearing interrupts do a read and then write to clear
> interrupts so there is precedence.
>
> diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c
> b/drivers/gpu/drm/tidss/tidss_dispc.c
> index 60f69be36692..0b8a3d999c54 100644
> --- a/drivers/gpu/drm/tidss/tidss_dispc.c
> +++ b/drivers/gpu/drm/tidss/tidss_dispc.c
> @@ -900,27 +900,27 @@ static
> void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t
> clearmask)
> {
> unsigned int i;
> - u32 top_clear = 0;
>
> for (i = 0; i < dispc->feat->num_vps; ++i) {
> if (clearmask & DSS_IRQ_VP_MASK(i)) {
> dispc_k3_vp_write_irqstatus(dispc, i, clearmask);
> - top_clear |= BIT(i);
> }
> }
> +
> for (i = 0; i < dispc->feat->num_planes; ++i) {
> if (clearmask & DSS_IRQ_PLANE_MASK(i)) {
> dispc_k3_vid_write_irqstatus(dispc, i, clearmask);
> - top_clear |= BIT(4 + i);
> }
> }
> +
> if (dispc->feat->subrev == DISPC_K2G)
> return;
>
> - dispc_write(dispc, DISPC_IRQSTATUS, top_clear);
> -
> - /* Flush posted writes */
> - dispc_read(dispc, DISPC_IRQSTATUS);
> + /* Always clear the level 1 irqstatus (DISPC_IRQSTATUS) unconditionally
> Note I'm not sure we are confident in the reasoning outlined in this comment
> + * due to an IP bug where level 1 irq status (DISPC_IRQSTATUS)
> would get set delayed even
> + * after level 2 interrupt (DISPC_VID_IRQSTATUS,
> DISPC_VP_IRQSTATUS) is cleared.
> + */
> + dispc_write(dispc, DISPC_IRQSTATUS, dispc_read(dispc, DISPC_IRQSTATUS));
>
> I had proposed a more complete version of this patch but there hasn't
> been much discussion about it and I've mostly tested Bins version.
>
> }
>
>
> On Mon, Oct 21, 2024 at 7:15 AM Tomi Valkeinen
> <tomi.valkeinen@...asonboard.com> wrote:
> >
> > Hi,
> >
> > On 12/10/2024 18:07, Devarsh Thakkar wrote:
> > > It is possible that dispc_{k2g/k3}_set_irqenable can be called for
> > > disabling some interrupt events which were previously enabled. However
> > > instead of clearing any pending events for the interrupt events that are
> > > required to be disabled, it was instead clearing the new interrupt events
> > > which were not even enabled.
> >
> > That's on purpose. When we enable a new interrupt, we want to first
> > clear the irqstatus for that interrupt to make sure there's no old
> > status left lying around. If I'm not mistaken, enabling an interrupt
> > with an irqstatus bit set will immediately trigger the interrupt.
> >
> > > For e.g. While disabling the vsync events, dispc_k3_set_irqenable tries to
> > > clear DSS_IRQ_DEVICE_OCP_ERR which was not enabled per the old_mask at all
> > > as shown below :
> > >
> > > "dispc_k3_set_irqenable : irqenabled - mask = 91, old = f0, clr = 1" where
> > > clr = (mask ^ old_mask) & old_mask
> >
> > That's a bit odd... Why was the DSS_IRQ_DEVICE_OCP_ERR not already
> > enabled? It is enabled in the tidss_irq_install().
> >
> > Or maybe it had been enabled by the driver, but as the HW doesn't
> > support that bit, it reads always as 0. I have an unsent patch to drop
> > DSS_IRQ_DEVICE_OCP_ERR.
> >
> > > This corrects the bit mask to make sure that it always clears any pending
> > > interrupt events that are requested to be disabled before disabling them
> > > actually.
> >
> > I think the point here makes sense: if we disable interrupts with
> > dispc_set_irqenable(), we don't want to see interrupt handling for the
> > disabled interrupts after the call.
> >
> > However, if you clear the irqstatus for an interrupt that will be
> > disabled, but clear it _before_ disabling the interrupt, the interrupt
> > might trigger right after clearing the irqstatus but before disabling it.
> >
> > Tomi
> >
> > > Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem")
> > > Reported-by: Jonathan Cormier <jcormier@...ticallink.com>
> > > Signed-off-by: Devarsh Thakkar <devarsht@...com>
> > > ---
> > > drivers/gpu/drm/tidss/tidss_dispc.c | 8 ++++----
> > > 1 file changed, 4 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c
> > > index 1ad711f8d2a8..b04419b24863 100644
> > > --- a/drivers/gpu/drm/tidss/tidss_dispc.c
> > > +++ b/drivers/gpu/drm/tidss/tidss_dispc.c
> > > @@ -700,8 +700,8 @@ void dispc_k2g_set_irqenable(struct dispc_device *dispc, dispc_irq_t mask)
> > > {
> > > dispc_irq_t old_mask = dispc_k2g_read_irqenable(dispc);
> > >
> > > - /* clear the irqstatus for newly enabled irqs */
> > > - dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & mask);
> > > + /* clear the irqstatus for irqs that are being disabled now */
> > > + dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & old_mask);
> > >
> > > dispc_k2g_vp_set_irqenable(dispc, 0, mask);
> > > dispc_k2g_vid_set_irqenable(dispc, 0, mask);
> > > @@ -843,8 +843,8 @@ static void dispc_k3_set_irqenable(struct dispc_device *dispc,
> > >
> > > old_mask = dispc_k3_read_irqenable(dispc);
> > >
> > > - /* clear the irqstatus for newly enabled irqs */
> > > - dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & mask);
> > > + /* clear the irqstatus for irqs that are being disabled now */
> > > + dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & old_mask);
> > >
> > > for (i = 0; i < dispc->feat->num_vps; ++i) {
> > > dispc_k3_vp_set_irqenable(dispc, i, mask);
> >
>
>
> --
> Jonathan Cormier
> Software Engineer
>
> Voice: 315.425.4045 x222
>
>
>
> http://www.CriticalLink.com
> 6712 Brooklawn Parkway, Syracuse, NY 13211
--
Jonathan Cormier
Software Engineer
Voice: 315.425.4045 x222
http://www.CriticalLink.com
6712 Brooklawn Parkway, Syracuse, NY 13211
Powered by blists - more mailing lists