netdev - Re: [PATCH net] net/sched: Fix mirred deadlock on device recursion

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <127148e766b177a470a397d9c1615fae19934141.camel@sipsolutions.net>
Date: Fri, 07 Jun 2024 16:40:36 +0200
From: Johannes Berg <johannes@...solutions.net>
To: Victor Nogueira <victor@...atatu.com>, edumazet@...gle.com, 
	davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com
Cc: jhs@...atatu.com, jiri@...nulli.us, xiyou.wangcong@...il.com, 
	netdev@...r.kernel.org, renmingshuai@...wei.com, pctammela@...atatu.com
Subject: Re: [PATCH net] net/sched: Fix mirred deadlock on device recursion

Hi all,

I noticed today that this causes a userspace visible change in behaviour
(and a regression in some of our tests) for transmitting to a device
when it has no carrier, when noop_qdisc is assigned to it. Instead of
silently dropping the packets, -ENOBUFS will be returned if the socket
opted in to RECVERR.

The reason for this is that the static noop_qdisc:

struct Qdisc noop_qdisc = {
        .enqueue        =       noop_enqueue,
        .dequeue        =       noop_dequeue,
        .flags          =       TCQ_F_BUILTIN,
        .ops            =       &noop_qdisc_ops,
        .q.lock         =       __SPIN_LOCK_UNLOCKED(noop_qdisc.q.lock),
        .dev_queue      =       &noop_netdev_queue,
        .busylock       =       __SPIN_LOCK_UNLOCKED(noop_qdisc.busylock),
        .gso_skb = {
                .next = (struct sk_buff *)&noop_qdisc.gso_skb,
                .prev = (struct sk_buff *)&noop_qdisc.gso_skb,
                .qlen = 0,
                .lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.gso_skb.lock),
        },
        .skb_bad_txq = {
                .next = (struct sk_buff *)&noop_qdisc.skb_bad_txq,
                .prev = (struct sk_buff *)&noop_qdisc.skb_bad_txq,
                .qlen = 0,
                .lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.skb_bad_txq.lock),
        },
};

doesn't have an owner set, and it's obviously not allocated via
qdisc_alloc(). Thus, it defaults to 0, so if you get to it on CPU 0 (I
was using ARCH=um which isn't even SMP) then it will just always run
into the 

> +	if (unlikely(READ_ONCE(q->owner) == smp_processor_id())) {
> +		kfree_skb_reason(skb, SKB_DROP_REASON_TC_RECLASSIFY_LOOP);
> +		return NET_XMIT_DROP;
> +	}

case.

I'm not sure I understand the busylock logic well enough, so almost
seems to me we shouldn't do this whole thing on the noop_qdisc at all,
e.g. via tagging owner with -2 to say don't do it:

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3865,9 +3865,11 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 		qdisc_run_end(q);
 		rc = NET_XMIT_SUCCESS;
 	} else {
-		WRITE_ONCE(q->owner, smp_processor_id());
+		if (q->owner != -2)
+			WRITE_ONCE(q->owner, smp_processor_id());
 		rc = dev_qdisc_enqueue(skb, q, &to_free, txq);
-		WRITE_ONCE(q->owner, -1);
+		if (q->owner != -2)
+			WRITE_ONCE(q->owner, -1);
 		if (qdisc_run_begin(q)) {
 			if (unlikely(contended)) {
 				spin_unlock(&q->busylock);
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 2a637a17061b..e857e4638671 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -657,6 +657,7 @@ static struct netdev_queue noop_netdev_queue = {
 };
 
 struct Qdisc noop_qdisc = {
+	.owner		=	-2,
 	.enqueue	=	noop_enqueue,
 	.dequeue	=	noop_dequeue,
 	.flags		=	TCQ_F_BUILTIN,


(and yes, I believe it doesn't need to be READ_ONCE for the check
against -2 since that's mutually exclusive with all other values)

Or maybe simply ignoring the value for the noop_qdisc:

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3822,7 +3822,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 		return rc;
 	}
 
-	if (unlikely(READ_ONCE(q->owner) == smp_processor_id())) {
+	if (unlikely(q != &noop_qdisc && READ_ONCE(q->owner) == smp_processor_id())) {
 		kfree_skb_reason(skb, SKB_DROP_REASON_TC_RECLASSIFY_LOOP);
 		return NET_XMIT_DROP;
 	}

That's shorter, but I'm not sure if there might be other special
cases...

Or maybe someone can think of an even better fix?

Thanks,
johannes