lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 7 Jun 2024 16:54:10 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Johannes Berg <johannes@...solutions.net>
Cc: Victor Nogueira <victor@...atatu.com>, davem@...emloft.net, kuba@...nel.org, 
	pabeni@...hat.com, jhs@...atatu.com, jiri@...nulli.us, 
	xiyou.wangcong@...il.com, netdev@...r.kernel.org, renmingshuai@...wei.com, 
	pctammela@...atatu.com
Subject: Re: [PATCH net] net/sched: Fix mirred deadlock on device recursion

On Fri, Jun 7, 2024 at 4:40 PM Johannes Berg <johannes@...solutions.net> wrote:
>
> Hi all,
>
> I noticed today that this causes a userspace visible change in behaviour
> (and a regression in some of our tests) for transmitting to a device
> when it has no carrier, when noop_qdisc is assigned to it. Instead of
> silently dropping the packets, -ENOBUFS will be returned if the socket
> opted in to RECVERR.
>
> The reason for this is that the static noop_qdisc:
>
> struct Qdisc noop_qdisc = {
>         .enqueue        =       noop_enqueue,
>         .dequeue        =       noop_dequeue,
>         .flags          =       TCQ_F_BUILTIN,
>         .ops            =       &noop_qdisc_ops,
>         .q.lock         =       __SPIN_LOCK_UNLOCKED(noop_qdisc.q.lock),
>         .dev_queue      =       &noop_netdev_queue,
>         .busylock       =       __SPIN_LOCK_UNLOCKED(noop_qdisc.busylock),
>         .gso_skb = {
>                 .next = (struct sk_buff *)&noop_qdisc.gso_skb,
>                 .prev = (struct sk_buff *)&noop_qdisc.gso_skb,
>                 .qlen = 0,
>                 .lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.gso_skb.lock),
>         },
>         .skb_bad_txq = {
>                 .next = (struct sk_buff *)&noop_qdisc.skb_bad_txq,
>                 .prev = (struct sk_buff *)&noop_qdisc.skb_bad_txq,
>                 .qlen = 0,
>                 .lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.skb_bad_txq.lock),
>         },
> };
>
> doesn't have an owner set, and it's obviously not allocated via
> qdisc_alloc(). Thus, it defaults to 0, so if you get to it on CPU 0 (I
> was using ARCH=um which isn't even SMP) then it will just always run
> into the
>
> > +     if (unlikely(READ_ONCE(q->owner) == smp_processor_id())) {
> > +             kfree_skb_reason(skb, SKB_DROP_REASON_TC_RECLASSIFY_LOOP);
> > +             return NET_XMIT_DROP;
> > +     }
>
> case.
>
> I'm not sure I understand the busylock logic well enough, so almost
> seems to me we shouldn't do this whole thing on the noop_qdisc at all,
> e.g. via tagging owner with -2 to say don't do it:
>
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3865,9 +3865,11 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>                 qdisc_run_end(q);
>                 rc = NET_XMIT_SUCCESS;
>         } else {
> -               WRITE_ONCE(q->owner, smp_processor_id());
> +               if (q->owner != -2)
> +                       WRITE_ONCE(q->owner, smp_processor_id());
>                 rc = dev_qdisc_enqueue(skb, q, &to_free, txq);
> -               WRITE_ONCE(q->owner, -1);
> +               if (q->owner != -2)
> +                       WRITE_ONCE(q->owner, -1);
>                 if (qdisc_run_begin(q)) {
>                         if (unlikely(contended)) {
>                                 spin_unlock(&q->busylock);
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 2a637a17061b..e857e4638671 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -657,6 +657,7 @@ static struct netdev_queue noop_netdev_queue = {
>  };
>
>  struct Qdisc noop_qdisc = {
> +       .owner          =       -2,
>         .enqueue        =       noop_enqueue,
>         .dequeue        =       noop_dequeue,
>         .flags          =       TCQ_F_BUILTIN,
>
>
> (and yes, I believe it doesn't need to be READ_ONCE for the check
> against -2 since that's mutually exclusive with all other values)
>
> Or maybe simply ignoring the value for the noop_qdisc:
>
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3822,7 +3822,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>                 return rc;
>         }
>
> -       if (unlikely(READ_ONCE(q->owner) == smp_processor_id())) {
> +       if (unlikely(q != &noop_qdisc && READ_ONCE(q->owner) == smp_processor_id())) {
>                 kfree_skb_reason(skb, SKB_DROP_REASON_TC_RECLASSIFY_LOOP);
>                 return NET_XMIT_DROP;
>         }
>
> That's shorter, but I'm not sure if there might be other special
> cases...
>
> Or maybe someone can think of an even better fix?

Why not simply initialize noop_qdisc.owner to -1 ?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ