Message-ID: <2af8a6cb2d2b4606abf48f0d2d0048e06f97fe51.camel@ew.tq-group.com>
Date: Thu, 26 Sep 2024 11:57:34 +0200
From: Matthias Schiffer <matthias.schiffer@...tq-group.com>
To: Marc Kleine-Budde <mkl@...gutronix.de>
Cc: Markus Schneider-Pargmann <msp@...libre.com>, Chandrasekar Ramakrishnan
 <rcsekar@...sung.com>, Vincent Mailhol <mailhol.vincent@...adoo.fr>, "David
 S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jakub
 Kicinski <kuba@...nel.org>,  Paolo Abeni <pabeni@...hat.com>, Martin
 Hundebøll <martin@...nix.com>, "Felipe Balbi (Intel)"
 <balbi@...nel.org>, Raymond Tan <raymond.tan@...el.com>, Jarkko Nikula
 <jarkko.nikula@...ux.intel.com>, linux-can@...r.kernel.org, 
 netdev@...r.kernel.org, linux-kernel@...r.kernel.org, linux@...tq-group.com
Subject: Re: [PATCH v3 2/2] can: m_can: fix missed interrupts with m_can_pci

On Thu, 2024-09-26 at 11:43 +0200, Marc Kleine-Budde wrote:
> On 26.09.2024 11:19:53, Matthias Schiffer wrote:
> > On Tue, 2024-09-24 at 08:08 +0200, Markus Schneider-Pargmann wrote:
> > > 
> > > On Mon, Sep 23, 2024 at 05:32:16PM GMT, Matthias Schiffer wrote:
> > > > The interrupt line of PCI devices is interpreted as edge-triggered,
> > > > however the interrupt signal of the m_can controller integrated in Intel
> > > > Elkhart Lake CPUs appears to be generated level-triggered.
> > > > 
> > > > Consider the following sequence of events:
> > > > 
> > > > - IR register is read, interrupt X is set
> > > > - A new interrupt Y is triggered in the m_can controller
> > > > - IR register is written to acknowledge interrupt X. Y remains set in IR
> > > > 
> > > > As at least one interrupt flag is set in IR at every point in this
> > > > sequence, the m_can interrupt line will never become deasserted, and no
> > > > edge will ever be observed to trigger another run of the ISR. This was
> > > > observed to cause the TX queue of the EHL m_can to get stuck under high
> > > > load, because frames were queued to the hardware in m_can_start_xmit(),
> > > > but m_can_finish_tx() was never run to account for their successful
> > > > transmission.
> > > > 
> > > > To fix the issue, repeatedly read and acknowledge interrupts at the
> > > > start of the ISR until no interrupt flags are set, so the next incoming
> > > > interrupt will also result in an edge on the interrupt line.
> > > > 
> > > > Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel Elkhart Lake")
> > > > Signed-off-by: Matthias Schiffer <matthias.schiffer@...tq-group.com>
> > > 
> > > Just a few comment nitpicks below. Otherwise:
> > > 
> > > Reviewed-by: Markus Schneider-Pargmann <msp@...libre.com>
> > 
> > 
> > We have received a report that while this patch fixes a stuck queue issue reproducible with cangen,
> > the problem has not disappeared with our customer's application. I will hold off sending a new
> > version of the patch while we're investigating whether there is a separate issue with the same
> > symptoms or the patch is insufficient.
> > 
> > Patch 1/2 should be good to go and could be applied independently.
> 
> Can you post the reproducer here, too, so that we can add it to the
> patch or at least reference it?
> 
> regards,
> Marc

Something like the following resulted in a stuck queue after a few minutes without this patch, and
ran without issue for 2.5 h with the patch (with can0 and can1 of the Elkhart Lake connected to each
other):

---
ip link set can0 up type can bitrate 1000000
ip link set can1 up type can bitrate 1000000

cangen can1 -g 2 -I 100 -L 8 &
cangen can1 -g 2 -I 101 -L 8 &
cangen can1 -g 2 -I 102 -L 8 &
cangen can1 -g 2 -I 103 -L 8 &
cangen can1 -g 2 -I 104 -L 8 &
cangen can1 -g 2 -I 105 -L 8 &
cangen can1 -g 2 -I 106 -L 8 &
cangen can1 -g 2 -I 107 -L 8 &

cangen can0 -g 2 -I 000 -L 8 &
cangen can0 -g 2 -I 001 -L 8 &
cangen can0 -g 2 -I 002 -L 8 &
cangen can0 -g 2 -I 003 -L 8 &
cangen can0 -g 2 -I 004 -L 8 &
cangen can0 -g 2 -I 005 -L 8 &
cangen can0 -g 2 -I 006 -L 8 &
cangen can0 -g 2 -I 007 -L 8 &

stress-ng --matrix 0 &
---

I will add the reproducer to the commit message in v4. I'm also not sure whether stress-ng actually
has any effect; I'll verify that before the next version of the patch.
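
For reference, the same commands could also be generated with a loop. This is only a minimal
sketch, assuming a POSIX shell; print_reproducer is a hypothetical helper name, and it prints the
commands rather than running them:

```shell
#!/bin/sh
# Sketch only: print_reproducer is a hypothetical helper name, not part of
# the patch. It prints the reproducer commands instead of executing them.
print_reproducer() {
    # Bring up both interfaces at 1 Mbit/s.
    for dev in can0 can1; do
        echo "ip link set $dev up type can bitrate 1000000"
    done
    # Eight cangen senders per interface: IDs 100-107 on can1, 000-007 on can0.
    for dev in can1 can0; do
        case "$dev" in
            can1) prefix=10 ;;
            *)    prefix=00 ;;
        esac
        for i in 0 1 2 3 4 5 6 7; do
            echo "cangen $dev -g 2 -I $prefix$i -L 8 &"
        done
    done
    # Additional CPU load; whether this matters is still unverified.
    echo "stress-ng --matrix 0 &"
}
```

Piping the output into a root shell (print_reproducer | sh) should start the same load as the
list above, provided can-utils and stress-ng are installed.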

Matthias



-- 
TQ-Systems GmbH | Mühlstraße 2, Gut Delling | 82229 Seefeld, Germany
Amtsgericht München, HRB 105018
Geschäftsführer: Detlef Schneider, Rüdiger Stahl, Stefan Schneider
https://www.tq-group.com/
