[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <127148e766b177a470a397d9c1615fae19934141.camel@sipsolutions.net>
Date: Fri, 07 Jun 2024 16:40:36 +0200
From: Johannes Berg <johannes@...solutions.net>
To: Victor Nogueira <victor@...atatu.com>, edumazet@...gle.com,
davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com
Cc: jhs@...atatu.com, jiri@...nulli.us, xiyou.wangcong@...il.com,
netdev@...r.kernel.org, renmingshuai@...wei.com, pctammela@...atatu.com
Subject: Re: [PATCH net] net/sched: Fix mirred deadlock on device recursion
Hi all,
I noticed today that this causes a userspace visible change in behaviour
(and a regression in some of our tests) for transmitting to a device
when it has no carrier, when noop_qdisc is assigned to it. Instead of
silently dropping the packets, -ENOBUFS will be returned if the socket
opted in to RECVERR.
The reason for this is that the static noop_qdisc:
struct Qdisc noop_qdisc = {
.enqueue = noop_enqueue,
.dequeue = noop_dequeue,
.flags = TCQ_F_BUILTIN,
.ops = &noop_qdisc_ops,
.q.lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.q.lock),
.dev_queue = &noop_netdev_queue,
.busylock = __SPIN_LOCK_UNLOCKED(noop_qdisc.busylock),
.gso_skb = {
.next = (struct sk_buff *)&noop_qdisc.gso_skb,
.prev = (struct sk_buff *)&noop_qdisc.gso_skb,
.qlen = 0,
.lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.gso_skb.lock),
},
.skb_bad_txq = {
.next = (struct sk_buff *)&noop_qdisc.skb_bad_txq,
.prev = (struct sk_buff *)&noop_qdisc.skb_bad_txq,
.qlen = 0,
.lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.skb_bad_txq.lock),
},
};
doesn't have an owner set, and it's obviously not allocated via
qdisc_alloc(). Thus, it defaults to 0, so if you get to it on CPU 0 (I
was using ARCH=um which isn't even SMP) then it will just always run
into the
> + if (unlikely(READ_ONCE(q->owner) == smp_processor_id())) {
> + kfree_skb_reason(skb, SKB_DROP_REASON_TC_RECLASSIFY_LOOP);
> + return NET_XMIT_DROP;
> + }
case.
I'm not sure I understand the busylock logic well enough, so almost
seems to me we shouldn't do this whole thing on the noop_qdisc at all,
e.g. via tagging owner with -2 to say don't do it:
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3865,9 +3865,11 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
qdisc_run_end(q);
rc = NET_XMIT_SUCCESS;
} else {
- WRITE_ONCE(q->owner, smp_processor_id());
+ if (q->owner != -2)
+ WRITE_ONCE(q->owner, smp_processor_id());
rc = dev_qdisc_enqueue(skb, q, &to_free, txq);
- WRITE_ONCE(q->owner, -1);
+ if (q->owner != -2)
+ WRITE_ONCE(q->owner, -1);
if (qdisc_run_begin(q)) {
if (unlikely(contended)) {
spin_unlock(&q->busylock);
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 2a637a17061b..e857e4638671 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -657,6 +657,7 @@ static struct netdev_queue noop_netdev_queue = {
};
struct Qdisc noop_qdisc = {
+ .owner = -2,
.enqueue = noop_enqueue,
.dequeue = noop_dequeue,
.flags = TCQ_F_BUILTIN,
(and yes, I believe it doesn't need to be READ_ONCE for the check
against -2 since that's mutually exclusive with all other values)
Or maybe simply ignoring the value for the noop_qdisc:
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3822,7 +3822,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
return rc;
}
- if (unlikely(READ_ONCE(q->owner) == smp_processor_id())) {
+ if (unlikely(q != &noop_qdisc && READ_ONCE(q->owner) == smp_processor_id())) {
kfree_skb_reason(skb, SKB_DROP_REASON_TC_RECLASSIFY_LOOP);
return NET_XMIT_DROP;
}
That's shorter, but I'm not sure if there might be other special
cases...
Or maybe someone can think of an even better fix?
Thanks,
johannes
Powered by blists - more mailing lists