linux-kernel - Re: FlexCAN on i.MX28 interrupt flooding retrying send

Date:	Fri, 07 Mar 2014 09:46:11 +0100
From:	Marc Kleine-Budde <mkl@...gutronix.de>
To:	Stanislav Meduna <stano@...una.org>, wg@...ndegger.com,
	linux-can@...r.kernel.org,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-rt-users@...r.kernel.org" <linux-rt-users@...r.kernel.org>
Subject: Re: FlexCAN on i.MX28 interrupt flooding retrying send

On 03/07/2014 09:08 AM, Stanislav Meduna wrote:
> Hi,
> 
> I am using a FlexCAN CAN controller on a Freescale i.MX28 platform [1].
> If a packet is being sent when the bus is disconnected, I am getting
> an interrupt flood that basically kills the machine.
> 
> This is _not_ the same problem as [2] - my kernel already has
> the fix.
> 
> The first interrupt comes with ESR 0x00028652, i.e.
> 
> TXWRN_INT
> BIT1_ERR
> STF_ERR
> TX_WRN
> TXRX
> FLT_CONF error passive
> ERR_INT
> 
> Subsequent interrupts show the same bits, except for the already
> acked TXWRN_INT.
> Reading the ESR again immediately after acking gives
> 0x00000250, i.e.
> 
> TX_WRN
> TXRX
> FLT_CONF error passive
> 
> so everything ackable has actually been acked.
> 
> I think that the problem is that the FlexCAN tries to retransmit
> the frame indefinitely. Each retry senses the bus in the invalid
> state (BIT1_ERR) and immediately fires a new ERR_INT. To verify
> this, I aborted the pending frame in the interrupt handler
> whenever a transmit error was reported:
> 
> #define FLEXCAN_ESR_ERR_TRANSMIT \
> 	(FLEXCAN_ESR_BIT1_ERR | FLEXCAN_ESR_BIT0_ERR | FLEXCAN_ESR_ACK_ERR)
> 
> if (reg_esr & FLEXCAN_ESR_ERR_TRANSMIT) {
> 
> 	/* In case of a transmission error the packet is retried and
> 	 * if the error persists, we will get another interrupt right
> 	 * away. Abort the transmission - a lost packet is better than
> 	 * an irq storm.
> 	 */
> 	if(printk_ratelimit())
> 		netdev_err(dev, "Aborted transmission, ESR %08x\n", reg_esr);
> 
> 	can_get_echo_skb(dev, 0);
> 	flexcan_write(FLEXCAN_MB_CNT_CODE(0x4),
> 		&regs->cantxfg[FLEXCAN_TX_BUF_ID].can_ctrl);
> 	netif_wake_queue(dev);
> }
> 
> and the problem disappeared as expected. However, the correct
> way is probably to retry during some reasonable (configurable?)
> time interval.
> 
> What puzzles me is that I did not find any other instance
> of this problem in the relevant mailing lists, only the original [2].
> 
> I am using the 3.4.77 kernel with the realtime patches, but the
> code in the latest mainline looks the same in this respect.
> Maybe the realtime patches change some behaviour, but I don't
> think they affect the core problem. I am not really an expert
> in network devices, NAPI etc., so maybe in this case the error
> interrupt should be disabled and re-enabled only once the
> error condition goes away? I don't know...

Your kernel is missing the patch:

e358784 can: flexcan: fix mx28 detection by rearanging OF match table

With this patch, the CAN core is properly detected as an mx28, so bus
errors stay disabled (unless you enable them). If you need bus errors to
detect unconnected CAN buses, you need another patch set, berr_limit,
which is not yet in mainline. I can repost it here if you need it.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |

