Message-ID: <20240919-colorful-gorilla-of-defense-3c2da3-mkl@pengutronix.de>
Date: Thu, 19 Sep 2024 11:17:04 +0200
From: Marc Kleine-Budde <mkl@...gutronix.de>
To: Matthias Schiffer <matthias.schiffer@...tq-group.com>
Cc: Chandrasekar Ramakrishnan <rcsekar@...sung.com>,
Vincent Mailhol <mailhol.vincent@...adoo.fr>, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Martin Hundebøll <martin@...nix.com>,
Markus Schneider-Pargmann <msp@...libre.com>, "Felipe Balbi (Intel)" <balbi@...nel.org>,
Raymond Tan <raymond.tan@...el.com>, Jarkko Nikula <jarkko.nikula@...ux.intel.com>,
linux-can@...r.kernel.org, netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
linux@...tq-group.com
Subject: Re: [PATCH 2/2] can: m_can: fix missed interrupts with m_can_pci
On 19.09.2024 10:58:46, Matthias Schiffer wrote:
> On Thu, 2024-09-19 at 10:47 +0200, Marc Kleine-Budde wrote:
> > On 18.09.2024 16:21:54, Matthias Schiffer wrote:
> > > The interrupt line of PCI devices is interpreted as edge-triggered,
> > > however the interrupt signal of the m_can controller integrated in Intel
> > > Elkhart Lake CPUs appears to be generated level-triggered.
> > >
> > > Consider the following sequence of events:
> > >
> > > - IR register is read, interrupt X is set
> > > - A new interrupt Y is triggered in the m_can controller
> > > - IR register is written to acknowledge interrupt X. Y remains set in IR
> > >
> > > As at least one interrupt flag is set in IR at every point in this
> > > sequence, the m_can interrupt line will never become deasserted, and no
> > > edge will ever be observed to trigger another run of the ISR. This was
> > > observed to result in the TX queue of the EHL m_can getting stuck under
> > > high load, because frames were queued to the hardware in
> > > m_can_start_xmit(), but m_can_finish_tx() was never run to account for
> > > their successful transmission.
> > >
> > > To fix the issue, repeatedly read and acknowledge interrupts at the
> > > start of the ISR until no interrupt flags are set, so the next incoming
> > > interrupt will also result in an edge on the interrupt line.
> > >
> > > Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel Elkhart Lake")
> > > Signed-off-by: Matthias Schiffer <matthias.schiffer@...tq-group.com>
> > > ---
> > > drivers/net/can/m_can/m_can.c | 18 +++++++++++++-----
> > > 1 file changed, 13 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
> > > index 47481afb9add3..363732517c3c5 100644
> > > --- a/drivers/net/can/m_can/m_can.c
> > > +++ b/drivers/net/can/m_can/m_can.c
> > > @@ -1207,20 +1207,28 @@ static void m_can_coalescing_update(struct m_can_classdev *cdev, u32 ir)
> > >  static int m_can_interrupt_handler(struct m_can_classdev *cdev)
> > >  {
> > >  	struct net_device *dev = cdev->net;
> > > -	u32 ir;
> > > +	u32 ir = 0, ir_read;
> > >  	int ret;
> > >
> > >  	if (pm_runtime_suspended(cdev->dev))
> > >  		return IRQ_NONE;
> > >
> > > -	ir = m_can_read(cdev, M_CAN_IR);
> > > +	/* For m_can_pci, the interrupt line is interpreted as edge-triggered,
> > > +	 * but the m_can controller generates them as level-triggered. We must
> > > +	 * observe that IR is 0 at least once to be sure that the next
> > > +	 * interrupt will generate an edge.
> > > +	 */
> > > +	while ((ir_read = m_can_read(cdev, M_CAN_IR)) != 0) {
> > > +		ir |= ir_read;
> > > +
> > > +		/* ACK all irqs */
> > > +		m_can_write(cdev, M_CAN_IR, ir);
> > > +	}
> >
> > This probably causes a measurable overhead on peripheral devices, think
> > about limiting this to !peripheral devices or introduce a new quirk that
> > is only set for the PCI devices.
> I did consider introducing a flag like that, but is the overhead
> really significant? In the regular case (where no new interrupt comes
> in between reading, writing and re-reading IR), the only added
> overhead is one additional register read. On m_can_pci, I've seen the
> race condition that causes a second loop iteration to be taken only
> once in several 100k frames on average.
A register read via SPI is quite costly compared to MMIO. Markus has
optimized the peripheral case quite well, and I don't want any
performance regressions.
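
To make the cost argument concrete, here is a minimal user-space sketch of the quirk idea, not a patch against m_can.c. Everything in it is hypothetical: `struct fake_cdev`, the `irq_edge_triggered` flag, and the `fake_*` helpers are stand-ins invented for illustration, and IR is modelled as write-1-to-clear. The `late` field simulates an interrupt that arrives between reading and acknowledging IR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the device state; not struct m_can_classdev. */
struct fake_cdev {
	uint32_t ir;             /* simulated interrupt register */
	uint32_t late;           /* flags that get set right after the next read */
	bool irq_edge_triggered; /* hypothetical quirk, set only by the PCI glue */
	unsigned int reads;      /* number of simulated IR reads */
};

static uint32_t fake_read_ir(struct fake_cdev *cdev)
{
	uint32_t val = cdev->ir;

	cdev->reads++;
	/* Simulate a new interrupt arriving right after this read. */
	cdev->ir |= cdev->late;
	cdev->late = 0;
	return val;
}

static void fake_ack_ir(struct fake_cdev *cdev, uint32_t mask)
{
	cdev->ir &= ~mask; /* modelled as write-1-to-clear */
}

/* Returns the interrupt flags the rest of the handler should process. */
static uint32_t fake_collect_irqs(struct fake_cdev *cdev)
{
	uint32_t ir = 0, ir_read;

	if (!cdev->irq_edge_triggered) {
		/* Level-triggered case: a single read, as before the patch. */
		ir = fake_read_ir(cdev);
		if (ir)
			fake_ack_ir(cdev, ir);
		return ir;
	}

	/* Edge-triggered case: loop until IR has been observed as 0 once. */
	while ((ir_read = fake_read_ir(cdev)) != 0) {
		ir |= ir_read;
		fake_ack_ir(cdev, ir);
	}
	assert(cdev->ir == 0);
	return ir;
}
```

In this model a device without the flag pays for exactly one IR read per handler run, while a device with the flag pays for one extra read in the regular case and a third read only when the race actually happens.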
> Or are register reads and writes that much slower on peripheral
> devices that it is more likely to receive a new interrupt in between?
> If that is the case, it would indeed make sense to limit this to
> instances with edge-triggered IRQ.
The mcp251xfd driver actually loops [1] the whole handling until there
are no IRQs pending:
https://elixir.bootlin.com/linux/v6.11/source/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c#L1466
But the m_can driver doesn't.
[1] I don't have measurements of how often the driver actually loops.
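
For illustration only, the "loop the whole handling until nothing is pending" pattern can be sketched in user space as below. This is not the mcp251xfd code: `struct fake_chip`, `fake_handle()` and `fake_irq_thread()` are hypothetical names, and the `late` array merely simulates events that arrive while a pass is running. The `passes` counter shows one way the number of loop iterations per interrupt could be measured.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LATE 4

/* Hypothetical model of a chip; not taken from any real driver. */
struct fake_chip {
	uint32_t pending;        /* simulated pending-interrupt flags */
	uint32_t late[MAX_LATE]; /* flags arriving during successive passes */
	size_t late_len;         /* number of valid entries in late[] */
	size_t late_pos;         /* next late[] entry to deliver */
	uint32_t handled;        /* union of all flags handled so far */
	unsigned int passes;     /* handling passes in the last IRQ run */
};

/* Handle one batch of flags; a new event may arrive while we are busy. */
static void fake_handle(struct fake_chip *chip, uint32_t flags)
{
	chip->handled |= flags;
	chip->pending &= ~flags;
	if (chip->late_pos < chip->late_len)
		chip->pending |= chip->late[chip->late_pos++];
	chip->passes++;
}

/* Repeat the whole handling until nothing is pending any more. */
static unsigned int fake_irq_thread(struct fake_chip *chip)
{
	uint32_t flags;

	chip->passes = 0;
	while ((flags = chip->pending) != 0)
		fake_handle(chip, flags);

	return chip->passes;
}
```

The handler returns only after the pending flags have been seen as 0, which is the same exit condition the patch adds for IR, applied to the complete handling instead of just the read-and-acknowledge step.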
regards,
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Embedded Linux | https://www.pengutronix.de |
Vertretung Nürnberg | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-9 |