Message-ID: <cc14312b391c17443a04129ae7871ae6aba43c20.camel@ew.tq-group.com>
Date: Mon, 23 Sep 2024 11:32:49 +0200
From: Matthias Schiffer <matthias.schiffer@...tq-group.com>
To: Markus Schneider-Pargmann <msp@...libre.com>
Cc: Chandrasekar Ramakrishnan <rcsekar@...sung.com>, Marc Kleine-Budde
 <mkl@...gutronix.de>, Vincent Mailhol <mailhol.vincent@...adoo.fr>, "David
 S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jakub
 Kicinski <kuba@...nel.org>,  Paolo Abeni <pabeni@...hat.com>, Martin
 Hundebøll <martin@...nix.com>, "Felipe Balbi (Intel)"
 <balbi@...nel.org>, Raymond Tan <raymond.tan@...el.com>, Jarkko Nikula
 <jarkko.nikula@...ux.intel.com>, linux-can@...r.kernel.org, 
 netdev@...r.kernel.org, linux-kernel@...r.kernel.org, linux@...tq-group.com
Subject: Re: [PATCH v2 2/2] can: m_can: fix missed interrupts with m_can_pci

On Mon, 2024-09-23 at 10:03 +0200, Markus Schneider-Pargmann wrote:
> Hi Matthias,
> 
> On Thu, Sep 19, 2024 at 01:27:28PM GMT, Matthias Schiffer wrote:
> > The interrupt line of PCI devices is interpreted as edge-triggered,
> > however the interrupt signal of the m_can controller integrated in Intel
> 
> I have a similar patch though for a different setup (I didn't send it
> yet). I have a tcan chip wired to a pin that is only capable of edge
> interrupts.

Should I also change the Fixes tag to something else then?

> 
> > Elkhart Lake CPUs appears to be generated level-triggered.
> > 
> > Consider the following sequence of events:
> > 
> > - IR register is read, interrupt X is set
> > - A new interrupt Y is triggered in the m_can controller
> > - IR register is written to acknowledge interrupt X. Y remains set in IR
> > 
> > As at least one interrupt flag is set in IR at every point in this sequence, the
> > m_can interrupt line will never become deasserted, and no edge will ever
> > be observed to trigger another run of the ISR. This was observed to
> > result in the TX queue of the EHL m_can getting stuck under high load,
> > because frames were queued to the hardware in m_can_start_xmit(), but
> > m_can_finish_tx() was never run to account for their successful
> > transmission.
> > 
> > To fix the issue, repeatedly read and acknowledge interrupts at the
> > start of the ISR until no interrupt flags are set, so the next incoming
> > interrupt will also result in an edge on the interrupt line.
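
The sequence above can be reproduced in a small userspace model. This is only a sketch under stated assumptions, not driver code: struct model, model_raise(), model_ack(), model_isr() and model_run() are hypothetical names, the interrupt line is modelled as "IR != 0", and the interrupt controller is modelled as a counter of rising edges. Flag values 0x1, 0x2 and 0x4 stand in for interrupts X, Y and a later interrupt Z.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: IR register, level output, edge-latching receiver */
struct model {
	uint32_t ir;        /* pending flags, stands in for M_CAN_IR */
	bool line;          /* level driven by the controller: ir != 0 */
	unsigned int edges; /* rising edges seen by the interrupt controller */
};

struct result {
	unsigned int edges; /* total rising edges after the scenario */
	uint32_t handled;   /* flags returned by the first ISR run */
};

/* Recompute the level and count a 0 -> 1 transition as one edge */
static void model_update_line(struct model *m)
{
	bool level = m->ir != 0;

	if (level && !m->line)
		m->edges++;
	m->line = level;
}

/* The controller sets new interrupt flags */
static void model_raise(struct model *m, uint32_t flags)
{
	m->ir |= flags;
	model_update_line(m);
}

/* Acknowledging clears the given flags, as a write to IR does */
static void model_ack(struct model *m, uint32_t flags)
{
	m->ir &= ~flags;
	model_update_line(m);
}

/* ISR body; `racing` arrives once, between the IR read and the ack.
 * With loop == false this is the old read-once/ack-once behaviour.
 */
static uint32_t model_isr(struct model *m, uint32_t racing, bool loop)
{
	uint32_t ir = 0, ir_read;

	while ((ir_read = m->ir) != 0) {
		ir |= ir_read;
		model_raise(m, racing); /* Y is triggered after the read */
		racing = 0;
		model_ack(m, ir);
		if (!loop)
			break;
	}
	return ir;
}

/* X fires, the ISR runs while Y races in, then Z fires later */
struct result model_run(bool loop)
{
	struct model m = { 0 };
	struct result r;

	model_raise(&m, 0x1);                /* X: first edge */
	r.handled = model_isr(&m, 0x2, loop);
	model_raise(&m, 0x4);                /* Z: edge only if line went low */
	r.edges = m.edges;
	return r;
}
```

In this model, model_run(false) ends with a single edge and only X handled, so neither Y nor Z would ever cause another ISR run. model_run(true) handles X and Y in the first run and sees a second edge for Z.
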
> > 
> > Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel Elkhart Lake")
> > Signed-off-by: Matthias Schiffer <matthias.schiffer@...tq-group.com>
> > ---
> > 
> > v2: introduce flag is_edge_triggered, so we can avoid the loop on !m_can_pci
> > 
> >  drivers/net/can/m_can/m_can.c     | 21 ++++++++++++++++-----
> >  drivers/net/can/m_can/m_can.h     |  1 +
> >  drivers/net/can/m_can/m_can_pci.c |  1 +
> >  3 files changed, 18 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
> > index 47481afb9add3..2e182c3c98fed 100644
> > --- a/drivers/net/can/m_can/m_can.c
> > +++ b/drivers/net/can/m_can/m_can.c
> > @@ -1207,20 +1207,31 @@ static void m_can_coalescing_update(struct m_can_classdev *cdev, u32 ir)
> >  static int m_can_interrupt_handler(struct m_can_classdev *cdev)
> >  {
> >  	struct net_device *dev = cdev->net;
> > -	u32 ir;
> > +	u32 ir = 0, ir_read;
> >  	int ret;
> >  
> >  	if (pm_runtime_suspended(cdev->dev))
> >  		return IRQ_NONE;
> >  
> > -	ir = m_can_read(cdev, M_CAN_IR);
> > +	/* For m_can_pci, the interrupt line is interpreted as edge-triggered,
> > +	 * but the m_can controller generates them as level-triggered. We must
> > +	 * observe that IR is 0 at least once to be sure that the next
> > +	 * interrupt will generate an edge.
> > +	 */
> 
> Could you please remove this hardware specific comment? As mentioned
> above this will be independent of any specific hardware.

Ok.


> 
> > +	while ((ir_read = m_can_read(cdev, M_CAN_IR)) != 0) {
> > +		ir |= ir_read;
> > +
> > +		/* ACK all irqs */
> > +		m_can_write(cdev, M_CAN_IR, ir);
> > +
> > +		if (!cdev->is_edge_triggered)
> > +			break;
> > +	}
> > +
> >  	m_can_coalescing_update(cdev, ir);
> >  	if (!ir)
> >  		return IRQ_NONE;
> >  
> > -	/* ACK all irqs */
> > -	m_can_write(cdev, M_CAN_IR, ir);
> > -
> >  	if (cdev->ops->clear_interrupts)
> >  		cdev->ops->clear_interrupts(cdev);
> >  
> > diff --git a/drivers/net/can/m_can/m_can.h b/drivers/net/can/m_can/m_can.h
> > index 92b2bd8628e6b..8c17eb94d2f98 100644
> > --- a/drivers/net/can/m_can/m_can.h
> > +++ b/drivers/net/can/m_can/m_can.h
> > @@ -99,6 +99,7 @@ struct m_can_classdev {
> >  	int pm_clock_support;
> >  	int pm_wake_source;
> >  	int is_peripheral;
> > +	bool is_edge_triggered;
> 
> To avoid confusion could you rename it to irq_edge_triggered or
> something similar, to make clear that it is not about the chip but the
> way the interrupt line is connected?

Will do.

> 
> Also I am not sure it is possible, but could you use
> irq_get_trigger_type() to see if it is a level or edge based interrupt?
> Then we wouldn't need this additional parameter at all and could just
> detect it in m_can.c.

Unfortunately that doesn't seem to work. irq_get_trigger_type() only returns a meaningful value
after the IRQ has been requested. I thought about requesting the IRQ with IRQF_NO_AUTOEN and then
filling in the irq_edge_triggered field before enabling the IRQ, but IRQF_NO_AUTOEN is incompatible
with IRQF_SHARED.

Of course there are ways around this - checking irq_get_trigger_type() from the ISR itself, or
adding more locking, but neither seems quite worthwhile to me. Would you agree with this?
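
For illustration, the check itself would be small. The following is a userspace sketch, not a tested patch: irq_type_is_edge() is a hypothetical helper, and the IRQ_TYPE_* values are copied here from include/linux/irq.h so that the fragment compiles on its own. In the driver, the argument would come from irq_get_trigger_type() on the device's IRQ number after the IRQ has been requested.

```c
#include <stdbool.h>
#include <stdint.h>

/* Trigger type bits, copied from include/linux/irq.h for this sketch */
#define IRQ_TYPE_EDGE_RISING	0x00000001
#define IRQ_TYPE_EDGE_FALLING	0x00000002
#define IRQ_TYPE_EDGE_BOTH	(IRQ_TYPE_EDGE_FALLING | IRQ_TYPE_EDGE_RISING)
#define IRQ_TYPE_LEVEL_HIGH	0x00000004
#define IRQ_TYPE_LEVEL_LOW	0x00000008

/* Hypothetical helper: true if any edge trigger bit is set in `type`.
 * In the ISR this could replace the static flag, roughly as
 *   irq_type_is_edge(irq_get_trigger_type(irq))
 * where `irq` is assumed to be the IRQ number the handler was
 * registered for.
 */
bool irq_type_is_edge(uint32_t type)
{
	return (type & IRQ_TYPE_EDGE_BOTH) != 0;
}
```

This evaluates the trigger type on every interrupt, which is part of why it does not seem worthwhile compared to a flag set once at probe time.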

Maybe there is some other way to find out the trigger type that would be set when the IRQ is
requested? I don't know what that would be however - so I'd just keep setting the flag statically
for m_can_pci and leave a dynamic solution for future improvement.

Matthias



> 
> Best
> Markus
> 
> >  
> >  	// Cached M_CAN_IE register content
> >  	u32 active_interrupts;
> > diff --git a/drivers/net/can/m_can/m_can_pci.c b/drivers/net/can/m_can/m_can_pci.c
> > index d72fe771dfc7a..f98527981402a 100644
> > --- a/drivers/net/can/m_can/m_can_pci.c
> > +++ b/drivers/net/can/m_can/m_can_pci.c
> > @@ -127,6 +127,7 @@ static int m_can_pci_probe(struct pci_dev *pci, const struct pci_device_id *id)
> >  	mcan_class->pm_clock_support = 1;
> >  	mcan_class->pm_wake_source = 0;
> >  	mcan_class->can.clock.freq = id->driver_data;
> > +	mcan_class->is_edge_triggered = true;
> >  	mcan_class->ops = &m_can_pci_ops;
> >  
> >  	pci_set_drvdata(pci, mcan_class);
> > -- 
> > TQ-Systems GmbH | Mühlstraße 2, Gut Delling | 82229 Seefeld, Germany
> > Amtsgericht München, HRB 105018
> > Geschäftsführer: Detlef Schneider, Rüdiger Stahl, Stefan Schneider
> > https://www.tq-group.com/
> > 

-- 
TQ-Systems GmbH | Mühlstraße 2, Gut Delling | 82229 Seefeld, Germany
Amtsgericht München, HRB 105018
Geschäftsführer: Detlef Schneider, Rüdiger Stahl, Stefan Schneider
https://www.tq-group.com/
