Message-ID: <2af8a6cb2d2b4606abf48f0d2d0048e06f97fe51.camel@ew.tq-group.com>
Date: Thu, 26 Sep 2024 11:57:34 +0200
From: Matthias Schiffer <matthias.schiffer@...tq-group.com>
To: Marc Kleine-Budde <mkl@...gutronix.de>
Cc: Markus Schneider-Pargmann <msp@...libre.com>, Chandrasekar Ramakrishnan
<rcsekar@...sung.com>, Vincent Mailhol <mailhol.vincent@...adoo.fr>, "David
S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jakub
Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Martin
Hundebøll <martin@...nix.com>, "Felipe Balbi (Intel)"
<balbi@...nel.org>, Raymond Tan <raymond.tan@...el.com>, Jarkko Nikula
<jarkko.nikula@...ux.intel.com>, linux-can@...r.kernel.org,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org, linux@...tq-group.com
Subject: Re: [PATCH v3 2/2] can: m_can: fix missed interrupts with m_can_pci
On Thu, 2024-09-26 at 11:43 +0200, Marc Kleine-Budde wrote:
> On 26.09.2024 11:19:53, Matthias Schiffer wrote:
> > On Tue, 2024-09-24 at 08:08 +0200, Markus Schneider-Pargmann wrote:
> > >
> > > On Mon, Sep 23, 2024 at 05:32:16PM GMT, Matthias Schiffer wrote:
> > > > The interrupt line of PCI devices is interpreted as edge-triggered,
> > > > however the interrupt signal of the m_can controller integrated in Intel
> > > > Elkhart Lake CPUs appears to be generated level-triggered.
> > > >
> > > > Consider the following sequence of events:
> > > >
> > > > - IR register is read, interrupt X is set
> > > > - A new interrupt Y is triggered in the m_can controller
> > > > - IR register is written to acknowledge interrupt X. Y remains set in IR
> > > >
> > > > As at least one interrupt flag is set in IR at every point in this sequence, the
> > > > m_can interrupt line will never become deasserted, and no edge will ever
> > > > be observed to trigger another run of the ISR. This was observed to
> > > > result in the TX queue of the EHL m_can to get stuck under high load,
> > > > because frames were queued to the hardware in m_can_start_xmit(), but
> > > > m_can_finish_tx() was never run to account for their successful
> > > > transmission.
> > > >
> > > > To fix the issue, repeatedly read and acknowledge interrupts at the
> > > > start of the ISR until no interrupt flags are set, so the next incoming
> > > > interrupt will also result in an edge on the interrupt line.
> > > >
> > > > Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel Elkhart Lake")
> > > > Signed-off-by: Matthias Schiffer <matthias.schiffer@...tq-group.com>
> > >
> > > Just a few comment nitpicks below. Otherwise:
> > >
> > > Reviewed-by: Markus Schneider-Pargmann <msp@...libre.com>
> >
> >
> > We have received a report that while this patch fixes a stuck queue issue reproducible with cangen,
> > the problem has not disappeared with our customer's application. I will hold off sending a new
> > version of the patch while we're investigating whether there is a separate issue with the same
> > symptoms or the patch is insufficient.
> >
> > Patch 1/2 should be good to go and could be applied independently.
>
> Can you post the reproducer here, too, so that we can add it to the
> patch or at least reference it?
>
> regards,
> Marc
Something like the following results in a stuck queue after a few minutes without this patch, but
ran without issue for 2.5 h with the patch (with can0 and can1 of the Elkhart Lake connected to each
other):
---
ip link set can0 up type can bitrate 1000000
ip link set can1 up type can bitrate 1000000
cangen can1 -g 2 -I 100 -L 8 &
cangen can1 -g 2 -I 101 -L 8 &
cangen can1 -g 2 -I 102 -L 8 &
cangen can1 -g 2 -I 103 -L 8 &
cangen can1 -g 2 -I 104 -L 8 &
cangen can1 -g 2 -I 105 -L 8 &
cangen can1 -g 2 -I 106 -L 8 &
cangen can1 -g 2 -I 107 -L 8 &
cangen can0 -g 2 -I 000 -L 8 &
cangen can0 -g 2 -I 001 -L 8 &
cangen can0 -g 2 -I 002 -L 8 &
cangen can0 -g 2 -I 003 -L 8 &
cangen can0 -g 2 -I 004 -L 8 &
cangen can0 -g 2 -I 005 -L 8 &
cangen can0 -g 2 -I 006 -L 8 &
cangen can0 -g 2 -I 007 -L 8 &
stress-ng --matrix 0 &
---
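The sixteen cangen invocations above follow a regular pattern (CAN IDs 100 to 107 on can1, 000 to 007 on can0), so they could also be generated with a loop. The following is only a minimal sketch, assuming a POSIX shell; the function name reproducer_cmds is made up for illustration and is not part of can-utils. It only prints the command lines and does not cover the ip link setup or the stress-ng call:

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not part of can-utils):
# print one cangen command line per (interface, CAN ID) pair used in the
# reproducer, i.e. IDs 100..107 on can1 and 000..007 on can0.
reproducer_cmds() {
    for pair in can1:10 can0:00; do
        iface=${pair%%:*}    # part before the colon: interface name
        prefix=${pair##*:}   # part after the colon: first two ID digits
        for n in 0 1 2 3 4 5 6 7; do
            # Same options as in the reproducer: 2 ms gap, fixed ID, 8 data bytes
            echo "cangen $iface -g 2 -I $prefix$n -L 8 &"
        done
    done
}

# Dry run: only print the generated command lines.
reproducer_cmds
```

Piping the output into a shell (for example `reproducer_cmds | sh`) should start the generators once both interfaces are up, but the explicit list above remains the form that was actually tested.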
I will add the reproducer to the commit message in v4. I'm also not sure whether stress-ng actually
has any effect; I'll verify that before the next version of the patch.
Matthias
--
TQ-Systems GmbH | Mühlstraße 2, Gut Delling | 82229 Seefeld, Germany
Amtsgericht München, HRB 105018
Geschäftsführer: Detlef Schneider, Rüdiger Stahl, Stefan Schneider
https://www.tq-group.com/