Message-ID: <7717e4470f6881bbc92645c72ad7f6ec71360796.camel@redhat.com>
Date: Thu, 09 Jan 2020 13:51:03 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Ahmad Fatoum <a.fatoum@...gutronix.de>, linux-can@...r.kernel.org,
netdev@...r.kernel.org
Cc: Sascha Hauer <kernel@...gutronix.de>
Subject: Re: [BUG] pfifo_fast may cause out-of-order CAN frame transmission
On Wed, 2020-01-08 at 15:55 +0100, Ahmad Fatoum wrote:
> I've run into an issue of CAN frames being sent out-of-order on an i.MX6 Dual
> with Linux v5.5-rc5. Bisecting has led me down to this commit:
>
> ba27b4cdaaa ("net: dev: introduce support for sch BYPASS for lockless qdisc")
>
> With it, using pfifo_fast, every few hundred frames, FlexCAN's .ndo_start_xmit is
> passed frames in an order different from how userspace stuffed them into the same
> socket.
>
> Reverting it fixes the issue as does booting with maxcpus=1 or using pfifo
> instead of pfifo_fast.
>
> According to [1], such reordering shouldn't be happening.
>
> Details on my setup:
> Kernel version: v5.5-rc5, (occurs much more often with LOCKDEP turned on)
> CAN-Bitrate: 250 kbit/s
> CAN frames are generated with:
> cangen canX -I2 -L1 -Di -i -g0.12 -p 100
> which keeps polling after ENOBUFS until the socket is writable, sends out a CAN
> frame with one incrementing payload byte and then waits 120 usec before repeating.
>
> Please let me know if any additional info is needed.
Thank you for the report.

I think there is a possible race condition in the 'empty' flag update
scheme:
CPU 0                                   CPU 1
(running e.g. net_tx_action)            (can xmit)

qdisc_run()                             __dev_xmit_skb()
pfifo_fast_dequeue
// queue is empty, returns NULL
WRITE_ONCE(qdisc->empty, true);
                                        pfifo_fast_enqueue
                                        qdisc_run_begin()
                                        // still locked by CPU 0,
                                        // return false and do nothing,
                                        // qdisc->empty is still true

                                        (next can xmit)
                                        // BYPASS code path
                                        sch_direct_xmit()
                                        // send pkt 2

__qdisc_run()
// send pkt 1
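
For illustration, here is a minimal single-threaded user-space model of
that interleaving. This is only a sketch, not kernel code: the
qdisc/'empty'/seqlock names mirror the kernel's, while everything else
(the step ordering written out inline, the wire() log) is invented to
make the effect visible. Compiled and run, it prints pkt 2 before pkt 1:

#include <stdbool.h>
#include <stdio.h>

struct qdisc_model {
	bool seqlock_held;	/* models qdisc->seqlock (spin_trylock) */
	bool empty;		/* models qdisc->empty */
	int queued;		/* packets sitting in pfifo_fast */
};

static void wire(const char *pkt)
{
	printf("on wire: %s\n", pkt);
}

int main(void)
{
	struct qdisc_model q = { .seqlock_held = false, .empty = false,
				 .queued = 0 };

	/* CPU 0, net_tx_action: qdisc_run_begin() takes the seqlock */
	q.seqlock_held = true;
	/* pfifo_fast_dequeue() returns NULL, so the qdisc is flagged empty */
	q.empty = true;

	/* CPU 1, first can xmit: pfifo_fast_enqueue() queues pkt 1 ... */
	q.queued++;
	/* ... but qdisc_run_begin() fails spin_trylock() (CPU 0 still holds
	 * the lock) and returns without touching q.empty, which stays true */

	/* CPU 1, next can xmit: empty still reads true, so the
	 * TCQ_F_CAN_BYPASS path sends pkt 2 directly via sch_direct_xmit() */
	if (q.empty)
		wire("pkt 2");	/* overtakes the queued pkt 1 */

	/* CPU 0 resumes __qdisc_run() and only now drains pkt 1 */
	while (q.queued--)
		wire("pkt 1");
	q.seqlock_held = false;	/* qdisc_run_end() */

	return 0;
}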
The following patch tries to address the above by clearing the 'empty'
flag on the enqueue path instead, so a stale 'empty == true' can no
longer hide a just-queued packet when qdisc_run_begin() loses the
trylock. We can end up skipping the bypass optimization more often than
strictly needed in some contended scenarios, but I guess that is
preferable to the current issue.

The code is only build-tested; could you please try it in your setup?
Thanks,
Paolo
---
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index fceddf89592a..df460fe0773a 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -158,7 +158,6 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
 	if (qdisc->flags & TCQ_F_NOLOCK) {
 		if (!spin_trylock(&qdisc->seqlock))
 			return false;
-		WRITE_ONCE(qdisc->empty, false);
 	} else if (qdisc_is_running(qdisc)) {
 		return false;
 	}
diff --git a/net/core/dev.c b/net/core/dev.c
index 0ad39c87b7fd..3c46575a5af5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3625,6 +3625,8 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 		qdisc_run_end(q);
 	} else {
 		rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
+		if (rc != NET_XMIT_DROP && READ_ONCE(q->empty))
+			WRITE_ONCE(q->empty, false);
 		qdisc_run(q);
 	}