netdev - Re: [PATCH RFC net 1/1] net/sched: Fix mirred to self recursion

Message-ID: <CANn89iLhd4iD-pDVJHzKqWbf16u9KyNtgV41X3sd=iy15jDQtQ@mail.gmail.com>
Date: Wed, 27 Mar 2024 14:23:20 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Jamal Hadi Salim <jhs@...atatu.com>
Cc: davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, jiri@...nulli.us, 
	xiyou.wangcong@...il.com, netdev@...r.kernel.org, renmingshuai@...wei.com, 
	Victor Nogueira <victor@...atatu.com>
Subject: Re: [PATCH RFC net 1/1] net/sched: Fix mirred to self recursion

On Wed, Mar 27, 2024 at 12:03 AM Jamal Hadi Salim <jhs@...atatu.com> wrote:
>
> When the mirred action is used on a classful egress qdisc and a packet is
> mirrored or redirected to self we hit a qdisc lock deadlock.
> See trace below.
>
> [..... other info removed for brevity....]
> [   82.890906]
> [   82.890906] ============================================
> [   82.890906] WARNING: possible recursive locking detected
> [   82.890906] 6.8.0-05205-g77fadd89fe2d-dirty #213 Tainted: G        W
> [   82.890906] --------------------------------------------
> [   82.890906] ping/418 is trying to acquire lock:
> [   82.890906] ffff888006994110 (&sch->q.lock){+.-.}-{3:3}, at: __dev_queue_xmit+0x1778/0x3550
> [   82.890906]
> [   82.890906] but task is already holding lock:
> [   82.890906] ffff888006994110 (&sch->q.lock){+.-.}-{3:3}, at: __dev_queue_xmit+0x1778/0x3550
> [   82.890906]
> [   82.890906] other info that might help us debug this:
> [   82.890906]  Possible unsafe locking scenario:
> [   82.890906]
> [   82.890906]        CPU0
> [   82.890906]        ----
> [   82.890906]   lock(&sch->q.lock);
> [   82.890906]   lock(&sch->q.lock);
> [   82.890906]
> [   82.890906]  *** DEADLOCK ***
> [   82.890906]
> [..... other info removed for brevity....]
>
> Example setup (eth0->eth0) to recreate
> tc qdisc add dev eth0 root handle 1: htb default 30
> tc filter add dev eth0 handle 1: protocol ip prio 2 matchall \
>      action mirred egress redirect dev eth0
>
> Another example (eth0->eth1->eth0) to recreate
> tc qdisc add dev eth0 root handle 1: htb default 30
> tc filter add dev eth0 handle 1: protocol ip prio 2 matchall \
>      action mirred egress redirect dev eth1
>
> tc qdisc add dev eth1 root handle 1: htb default 30
> tc filter add dev eth1 handle 1: protocol ip prio 2 matchall \
>      action mirred egress redirect dev eth0
>
> We fix this by adding a per-cpu, per-qdisc recursion counter which is
> incremented the first time a root qdisc is entered. On a second attempt
> to enter the same root qdisc from the top, the packet is dropped to break
> the loop.
>
> Reported-by: renmingshuai@...wei.com
> Closes: https://lore.kernel.org/netdev/20240314111713.5979-1-renmingshuai@huawei.com/
> Fixes: 3bcb846ca4cf ("net: get rid of spin_trylock() in net_tx_action()")
> Fixes: e578d9c02587 ("net: sched: use counter to break reclassify loops")
> Co-developed-by: Victor Nogueira <victor@...atatu.com>
> Signed-off-by: Victor Nogueira <victor@...atatu.com>
> Signed-off-by: Jamal Hadi Salim <jhs@...atatu.com>
> ---
>  include/net/sch_generic.h |  2 ++
>  net/core/dev.c            |  9 +++++++++
>  net/sched/sch_api.c       | 12 ++++++++++++
>  net/sched/sch_generic.c   |  2 ++
>  4 files changed, 25 insertions(+)
>
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index cefe0c4bdae3..f9f99df037ed 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -125,6 +125,8 @@ struct Qdisc {
>         spinlock_t              busylock ____cacheline_aligned_in_smp;
>         spinlock_t              seqlock;
>
> +       u16 __percpu            *xmit_recursion;
> +
>         struct rcu_head         rcu;
>         netdevice_tracker       dev_tracker;
>         /* private data */
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 9a67003e49db..2b712388c06f 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3789,6 +3789,13 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>         if (unlikely(contended))
>                 spin_lock(&q->busylock);

This could hang here (busylock)


>
> +       if (__this_cpu_read(*q->xmit_recursion) > 0) {
> +               __qdisc_drop(skb, &to_free);
> +               rc = NET_XMIT_DROP;
> +               goto free_skb_list;
> +       }


I do not think we want to add yet another cache line miss and more
complexity to the tx fast path.

I think that mirred should use a separate queue to kick a transmit
from the top level.

(Like netif_rx() does)

Using a softnet.xmit_qdisc_recursion (not a qdisc-per-cpu thing),
would allow mirred to bypass this additional queue
in most cases.
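
To make the control flow concrete, here is a minimal user-space model of
the idea. It is not kernel code and not part of either patch. The
deferred array, the hops field (which only keeps the model finite) and
every model_* name are hypothetical; qdisc_recursion stands in for the
proposed softnet_data.xmit.qdisc_recursion, and the lock flag stands in
for the root qdisc lock.

```c
#include <assert.h>
#include <stddef.h>

#define DEFERRED_MAX 16

/* hops: number of mirred redirects still to apply (hypothetical bound). */
struct pkt { int hops; };

/* Stand-in for the proposed per-CPU softnet_data.xmit.qdisc_recursion. */
static unsigned char qdisc_recursion;

/* Hypothetical deferred queue, drained from the top level. */
static struct pkt deferred[DEFERRED_MAX];
static size_t deferred_len;

static int qdisc_lock_held;	/* models the root qdisc lock */
static int delivered, deferred_total, dropped;

static void model_dev_queue_xmit(struct pkt p);

/* Models the egress branch of tcf_mirred_forward(). */
static void model_mirred_forward(struct pkt p)
{
	if (qdisc_recursion) {
		/* Already inside a qdisc: queue to top level, or drop. */
		if (deferred_len < DEFERRED_MAX) {
			deferred[deferred_len++] = p;
			deferred_total++;
		} else {
			dropped++;
		}
	} else {
		model_dev_queue_xmit(p);
	}
}

/* Models __dev_xmit_skb(): classification runs under the root lock. */
static void model_enqueue(struct pkt p)
{
	/* Taking the lock twice is the deadlock from the report. */
	assert(!qdisc_lock_held);
	qdisc_lock_held = 1;
	if (p.hops > 0) {
		p.hops--;		/* matchall + mirred egress redirect */
		model_mirred_forward(p);
	} else {
		delivered++;
	}
	qdisc_lock_held = 0;
}

/* Models the q->enqueue path of __dev_queue_xmit(). */
static void model_dev_queue_xmit(struct pkt p)
{
	qdisc_recursion++;
	model_enqueue(p);
	qdisc_recursion--;
}

/* Top level: re-inject deferred packets with no qdisc lock held. */
static void model_drain_deferred(void)
{
	while (deferred_len > 0) {
		struct pkt p = deferred[--deferred_len];

		model_dev_queue_xmit(p);
	}
}
```

Calling model_dev_queue_xmit() with hops = 3 and then
model_drain_deferred() defers the packet three times and delivers it
once, and the lock assertion never fires. A real implementation would
still need a policy for how long a looping packet may keep being
re-queued.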

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cb37817d6382c29117afd8ce54db6dba94f8c930..62ba5ef554860496ee928f7ed6b7c3ea46b8ee1d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3217,7 +3217,8 @@ struct softnet_data {
 #endif
        /* written and read only by owning cpu: */
        struct {
-               u16 recursion;
+               u8 recursion;
+               u8 qdisc_recursion;
                u8  more;
 #ifdef CONFIG_NET_EGRESS
                u8  skip_txqueue;
diff --git a/net/core/dev.c b/net/core/dev.c
index 9a67003e49db87f3f92b6c6296b3e7a5ca9d9171..7ac59835edef657e9558d4d4fc0a76b171aace93 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4298,7 +4298,9 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)

        trace_net_dev_queue(skb);
        if (q->enqueue) {
+               __this_cpu_inc(softnet_data.xmit.qdisc_recursion);
                rc = __dev_xmit_skb(skb, q, dev, txq);
+               __this_cpu_dec(softnet_data.xmit.qdisc_recursion);
                goto out;
        }

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 5b38143659249e66718348e0ec4ed3c7bc21c13d..0f5f02e6744397d33ae2a72670ba7131aaa6942e 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -237,8 +237,13 @@ tcf_mirred_forward(bool at_ingress, bool want_ingress, struct sk_buff *skb)
 {
        int err;

-       if (!want_ingress)
-               err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
+       if (!want_ingress) {
+               if (__this_cpu_read(softnet_data.xmit.qdisc_recursion)) {
+                       // Queue to top level, or drop
+               } else {
+                       err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
+               }
+       }
        else if (!at_ingress)
                err = netif_rx(skb);
        else