Message-ID: <1a4ed0696cbe222e50b5abdff08a5ce7f8223aae.camel@ew.tq-group.com>
Date: Thu, 26 Sep 2024 11:19:53 +0200
From: Matthias Schiffer <matthias.schiffer@...tq-group.com>
To: Markus Schneider-Pargmann <msp@...libre.com>, Marc Kleine-Budde
	 <mkl@...gutronix.de>
Cc: Chandrasekar Ramakrishnan <rcsekar@...sung.com>, Vincent Mailhol
 <mailhol.vincent@...adoo.fr>, "David S. Miller" <davem@...emloft.net>, Eric
 Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo
 Abeni <pabeni@...hat.com>,  Martin Hundebøll
 <martin@...nix.com>, "Felipe Balbi (Intel)" <balbi@...nel.org>, Raymond Tan
 <raymond.tan@...el.com>, Jarkko Nikula <jarkko.nikula@...ux.intel.com>, 
 linux-can@...r.kernel.org, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org,  linux@...tq-group.com
Subject: Re: [PATCH v3 2/2] can: m_can: fix missed interrupts with m_can_pci

On Tue, 2024-09-24 at 08:08 +0200, Markus Schneider-Pargmann wrote:
> 
> On Mon, Sep 23, 2024 at 05:32:16PM GMT, Matthias Schiffer wrote:
> > The interrupt line of PCI devices is interpreted as edge-triggered,
> > however the interrupt signal of the m_can controller integrated in Intel
> > Elkhart Lake CPUs appears to be generated level-triggered.
> > 
> > Consider the following sequence of events:
> > 
> > - IR register is read, interrupt X is set
> > - A new interrupt Y is triggered in the m_can controller
> > - IR register is written to acknowledge interrupt X. Y remains set in IR
> > 
> > As IR has at least one interrupt flag set at every point in this
> > sequence, the m_can interrupt line will never become deasserted, and no
> > edge will ever be observed to trigger another run of the ISR. This was
> > observed to result in the TX queue of the EHL m_can getting stuck under
> > high load, because frames were queued to the hardware in
> > m_can_start_xmit(), but m_can_finish_tx() was never run to account for
> > their successful transmission.
> > 
> > To fix the issue, repeatedly read and acknowledge interrupts at the
> > start of the ISR until no interrupt flags are set, so the next incoming
> > interrupt will also result in an edge on the interrupt line.
> > 
> > Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel Elkhart Lake")
> > Signed-off-by: Matthias Schiffer <matthias.schiffer@...tq-group.com>
> 
> Just a few comment nitpicks below. Otherwise:
> 
> Reviewed-by: Markus Schneider-Pargmann <msp@...libre.com>


We have received a report that while this patch fixes a stuck queue issue reproducible with cangen,
the problem has not disappeared with our customer's application. I will hold off sending a new
version of the patch while we're investigating whether there is a separate issue with the same
symptoms or the patch is insufficient.

Patch 1/2 should be good to go and could be applied independently.
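
For anyone who wants to reason about the race from the commit message without
the hardware, here is a rough userspace model of it. This is only a sketch and
not driver code: struct model, model_raise(), model_ack(), isr_single(),
isr_drain(), scenario() and the FLAG_* values are made up for illustration. It
assumes IR is write-1-to-clear and that the interrupt line is simply asserted
while IR is non-zero (IE/ILE masking is ignored).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FLAG_X 0x1u
#define FLAG_Y 0x2u
#define FLAG_Z 0x4u

/* Toy model: every name in this file is hypothetical. */
struct model {
	uint32_t ir;        /* pending flags, modeled as write-1-to-clear */
	bool line;          /* last level of the interrupt line */
	unsigned int edges; /* rising edges seen by the edge-triggered side */
};

/* The line is asserted while any flag is set; only 0 -> 1 counts as an edge. */
static void model_update_line(struct model *m)
{
	bool level = m->ir != 0;

	if (level && !m->line)
		m->edges++;
	m->line = level;
}

static void model_raise(struct model *m, uint32_t flags)
{
	m->ir |= flags;
	model_update_line(m);
}

static void model_ack(struct model *m, uint32_t flags)
{
	m->ir &= ~flags;
	model_update_line(m);
}

/* One read and one ACK; 'racing' is raised between the two. */
static uint32_t isr_single(struct model *m, uint32_t racing)
{
	uint32_t ir = m->ir;

	model_raise(m, racing);
	model_ack(m, ir);
	return ir;
}

/* Read and ACK until IR reads 0, mirroring the loop in the patch. */
static uint32_t isr_drain(struct model *m, uint32_t racing)
{
	uint32_t ir = 0, ir_read;

	while ((ir_read = m->ir) != 0) {
		ir |= ir_read;
		model_raise(m, racing);
		racing = 0; /* the racing interrupt fires only once */
		model_ack(m, ir);
	}
	assert(m->ir == 0); /* IR was observed as 0, so the line is low */
	return ir;
}

/* X fires, Y races with the ACK of X, Z fires later. Returns edges seen. */
unsigned int scenario(bool drain)
{
	struct model m = { 0 };

	model_raise(&m, FLAG_X);
	if (drain)
		isr_drain(&m, FLAG_Y);
	else
		isr_single(&m, FLAG_Y);
	model_raise(&m, FLAG_Z);
	return m.edges;
}
```

In this model scenario(false) sees a single edge: Y keeps the line asserted
across the ACK of X, so neither Y nor Z ever produces a new edge. With
scenario(true) the line drops once IR reads 0, and Z produces a second edge.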

Matthias


> 
> > ---
> > 
> > v2: introduce flag is_edge_triggered, so we can avoid the loop on !m_can_pci
> > v3:
> > - rename flag to irq_edge_triggered
> > - update comment to describe the issue more generically as one of systems with
> >   edge-triggered interrupt line. m_can_pci is mentioned as an example, as it
> >   is the only m_can variant that currently sets the irq_edge_triggered flag.
> > 
> >  drivers/net/can/m_can/m_can.c     | 22 +++++++++++++++++-----
> >  drivers/net/can/m_can/m_can.h     |  1 +
> >  drivers/net/can/m_can/m_can_pci.c |  1 +
> >  3 files changed, 19 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
> > index c85ac1b15f723..24e348f677714 100644
> > --- a/drivers/net/can/m_can/m_can.c
> > +++ b/drivers/net/can/m_can/m_can.c
> > @@ -1207,20 +1207,32 @@ static void m_can_coalescing_update(struct m_can_classdev *cdev, u32 ir)
> >  static int m_can_interrupt_handler(struct m_can_classdev *cdev)
> >  {
> >  	struct net_device *dev = cdev->net;
> > -	u32 ir;
> > +	u32 ir = 0, ir_read;
> >  	int ret;
> >  
> >  	if (pm_runtime_suspended(cdev->dev))
> >  		return IRQ_NONE;
> >  
> > -	ir = m_can_read(cdev, M_CAN_IR);
> > +	/* The m_can controller signals its interrupt status as a level, but
> > +	 * depending in the integration the CPU may interpret the signal as
>                  ^ on?
> 
> > +	 * edge-triggered (for example with m_can_pci).
> > +	 * We must observe that IR is 0 at least once to be sure that the next
> 
> As the loop has a break for non edge-triggered chips, I think you should
> include that in the comment, like 'For these edge-triggered
> integrations, we must observe...' or something similar.
> 
> Best
> Markus
> 
> > +	 * interrupt will generate an edge.
> > +	 */
> > +	while ((ir_read = m_can_read(cdev, M_CAN_IR)) != 0) {
> > +		ir |= ir_read;
> > +
> > +		/* ACK all irqs */
> > +		m_can_write(cdev, M_CAN_IR, ir);
> > +
> > +		if (!cdev->irq_edge_triggered)
> > +			break;
> > +	}
> > +
> >  	m_can_coalescing_update(cdev, ir);
> >  	if (!ir)
> >  		return IRQ_NONE;
> >  
> > -	/* ACK all irqs */
> > -	m_can_write(cdev, M_CAN_IR, ir);
> > -
> >  	if (cdev->ops->clear_interrupts)
> >  		cdev->ops->clear_interrupts(cdev);
> >  
> > diff --git a/drivers/net/can/m_can/m_can.h b/drivers/net/can/m_can/m_can.h
> > index 92b2bd8628e6b..ef39e8e527ab6 100644
> > --- a/drivers/net/can/m_can/m_can.h
> > +++ b/drivers/net/can/m_can/m_can.h
> > @@ -99,6 +99,7 @@ struct m_can_classdev {
> >  	int pm_clock_support;
> >  	int pm_wake_source;
> >  	int is_peripheral;
> > +	bool irq_edge_triggered;
> >  
> >  	// Cached M_CAN_IE register content
> >  	u32 active_interrupts;
> > diff --git a/drivers/net/can/m_can/m_can_pci.c b/drivers/net/can/m_can/m_can_pci.c
> > index d72fe771dfc7a..9ad7419f88f83 100644
> > --- a/drivers/net/can/m_can/m_can_pci.c
> > +++ b/drivers/net/can/m_can/m_can_pci.c
> > @@ -127,6 +127,7 @@ static int m_can_pci_probe(struct pci_dev *pci, const struct pci_device_id *id)
> >  	mcan_class->pm_clock_support = 1;
> >  	mcan_class->pm_wake_source = 0;
> >  	mcan_class->can.clock.freq = id->driver_data;
> > +	mcan_class->irq_edge_triggered = true;
> >  	mcan_class->ops = &m_can_pci_ops;
> >  
> >  	pci_set_drvdata(pci, mcan_class);
> > -- 
> > TQ-Systems GmbH | Mühlstraße 2, Gut Delling | 82229 Seefeld, Germany
> > Amtsgericht München, HRB 105018
> > Geschäftsführer: Detlef Schneider, Rüdiger Stahl, Stefan Schneider
> > https://www.tq-group.com/

-- 
TQ-Systems GmbH | Mühlstraße 2, Gut Delling | 82229 Seefeld, Germany
Amtsgericht München, HRB 105018
Geschäftsführer: Detlef Schneider, Rüdiger Stahl, Stefan Schneider
https://www.tq-group.com/
