netdev - RE: rawip: delayed and mis-sequenced transmits

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Date:   Thu, 7 Jul 2022 09:34:36 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Jakub Kicinski' <kuba@...nel.org>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Vladimir Oltean <vladimir.oltean@....com>,
        "'linyunsheng@...wei.com'" <linyunsheng@...wei.com>
Subject: RE: rawip: delayed and mis-sequenced transmits

From: Jakub Kicinski
> Sent: 07 July 2022 02:54
> 
> On Wed, 6 Jul 2022 15:54:18 +0000 David Laight wrote:
> > Anyone any ideas before I start digging through the kernel code?
> 
> If the qdisc is pfifo_fast and kernel is old there could be races.
> But I don't think that's likely given you probably run something
> recent and next packet tx would usually flush the stuck packet.
> In any case - switching qdisc could be a useful test, also bpftrace
> is your friend for catching patckets with long sojourn time.

Reading the sources I think I've found something:
In core/dev.c line 3818 there is:

static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
				 struct net_device *dev,
				 struct netdev_queue *txq)
{
	spinlock_t *root_lock = qdisc_lock(q);
	struct sk_buff *to_free = NULL;
	bool contended;
	int rc;

	qdisc_calculate_pkt_len(skb, q);

	if (q->flags & TCQ_F_NOLOCK) {
		if (q->flags & TCQ_F_CAN_BYPASS && nolock_qdisc_is_empty(q) &&
		    qdisc_run_begin(q)) {
			/* Retest nolock_qdisc_is_empty() within the protection
			 * of q->seqlock to protect from racing with requeuing.
			 */
			if (unlikely(!nolock_qdisc_is_empty(q))) {
				rc = dev_qdisc_enqueue(skb, q, &to_free, txq);
				__qdisc_run(q);
				qdisc_run_end(q);

				goto no_lock_out;
			}

I think I'm getting into the code below with a packet queued.
Unlike the code above this sends the current packet before the
queued one - which is exactly what I'm seeing.
Which must mean that the global flags are out of sync with
the per-cpu flags and a transmit from the cpu that queued
the packet is needed to unblock things.

This seems to have been added by c4fef01ba4793

			qdisc_bstats_cpu_update(q, skb);
			if (sch_direct_xmit(skb, q, dev, txq, NULL, true) &&
			    !nolock_qdisc_is_empty(q))
				__qdisc_run(q);

			qdisc_run_end(q);
			return NET_XMIT_SUCCESS;
		}

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)