[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230927-fantasize-refuse-7fef75242672-mkl@pengutronix.de>
Date: Wed, 27 Sep 2023 11:30:16 +0200
From: Marc Kleine-Budde <mkl@...gutronix.de>
To: Miquel Raynal <miquel.raynal@...tlin.com>
Cc: Wolfgang Grandegger <wg@...ndegger.com>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Eric Dumazet <edumazet@...gle.com>, netdev@...r.kernel.org,
linux-can@...r.kernel.org,
Jérémie Dautheribes <jeremie.dautheribes@...tlin.com>,
Thomas Petazzoni <thomas.petazzoni@...tlin.com>,
sylvain.girard@...com, pascal.eberhard@...com,
stable@...r.kernel.org
Subject: Re: [PATCH net] can: sja1000: Always restart the Tx queue after an
overrun
On 22.09.2023 17:47:27, Miquel Raynal wrote:
> Upstream commit 717c6ec241b5 ("can: sja1000: Prevent overrun stalls with
> a soft reset on Renesas SoCs") fixes an issue with Renesas own SJA1000
> CAN controller reception: the Rx buffer is only 5 messages long, so when
> the bus loaded (eg. a message every 50us), overrun may easily
> happen. Upon an overrun situation, due to a possible internal crosstalk
> situation, the controller enters a frozen state which only can be
> unlocked with a soft reset (experimentally). The solution was to offload
> a call to sja1000_start() in a threaded handler. This needs to happen in
> process context as this operation requires to sleep. sja1000_start()
> basically enters "reset mode", performs a proper software reset and
> returns back into "normal mode".
>
> Since this fix was introduced, we no longer observe any stalls in
> reception. However it was sporadically observed that the transmit path
> would now freeze. Further investigation blamed the fix mentioned above,
> and especially the reset operation. Reproducing the reset in a loop
> helped identifying what could possibly go wrong. The sja1000 is a single
> Tx queue device, which leverages the netdev helpers to process one Tx
> message at a time. The logic is: the queue is stopped, the message sent
> to the transceiver, once properly transmitted the controller sets a
> status bit which triggers an interrupt, in the interrupt handler the
> transmission status is checked and the queue woken up. Unfortunately, if
> an overrun happens, we might perform the soft reset precisely between
> the transmission of the buffer to the transceiver and the advent of the
> transmission status bit. We would then stop the transmission operation
> without re-enabling the queue, leading to all further transmissions to
> be ignored.
>
> The reset interrupt can only happen while the device is "open", and
> after a reset we anyway want to resume normal operations, no matter if a
> packet to transmit got dropped in the process, so we shall wake up the
> queue. Restarting the device and waking-up the queue is exactly what
> sja1000_set_mode(CAN_MODE_START) does. In order to be consistent about
> the queue state, we must acquire a lock both in the reset handler and in
> the transmit path to ensure serialization of both operations. As the
> reset handler might still be called after the transmission of a frame to
> the transceiver but before it actually gets transmitted, we must ensure
> we don't leak the skb, so we free it (the behavior is consistent, no
> matter if there was an skb on the stack or not).
Can you make use of netif_tx_disable() and netif_wake_queue() in
sja1000_reset_interrupt() instead of the lock?
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Embedded Linux | https://www.pengutronix.de |
Vertretung Nürnberg | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-9 |
Download attachment "signature.asc" of type "application/pgp-signature" (489 bytes)
Powered by blists - more mailing lists