netdev - Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred ingress

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YzCZMHYmk53mQ+HK@pop-os.localdomain>
Date:   Sun, 25 Sep 2022 11:08:48 -0700
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     Davide Caratti <dcaratti@...hat.com>
Cc:     Jamal Hadi Salim <jhs@...atatu.com>, Jiri Pirko <jiri@...nulli.us>,
        Paolo Abeni <pabeni@...hat.com>,
        Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
        wizhao@...hat.com, netdev@...r.kernel.org
Subject: Re: [PATCH net] net/sched: act_mirred: use the backlog for mirred
 ingress

On Fri, Sep 23, 2022 at 05:11:12PM +0200, Davide Caratti wrote:
> William reports kernel soft-lockups on some OVS topologies when TC mirred
> "egress-to-ingress" action is hit by local TCP traffic. Indeed, using the
> mirred action in egress-to-ingress can easily produce a dmesg splat like:
> 
>  ============================================
>  WARNING: possible recursive locking detected
>  6.0.0-rc4+ #511 Not tainted
>  --------------------------------------------
>  nc/1037 is trying to acquire lock:
>  ffff950687843cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
> 
>  but task is already holding lock:
>  ffff950687846cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
> 
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
> 
>         CPU0
>         ----
>    lock(slock-AF_INET/1);
>    lock(slock-AF_INET/1);
> 
>   *** DEADLOCK ***
> 
>   May be due to missing lock nesting notation
> 
>  12 locks held by nc/1037:
>   #0: ffff950687843d40 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x19/0x40
>   #1: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5/0x610
>   #2: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0xaa/0xa10
>   #3: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x72/0x11b0
>   #4: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0x181/0x400
>   #5: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x54/0x160
>   #6: ffff950687846cb0 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1023/0x1160
>   #7: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5/0x610
>   #8: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: ip_finish_output2+0xaa/0xa10
>   #9: ffffffff9be072e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x72/0x11b0
>   #10: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0x181/0x400
>   #11: ffffffff9be07320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x54/0x160
> 
>  stack backtrace:
>  CPU: 1 PID: 1037 Comm: nc Not tainted 6.0.0-rc4+ #511
>  Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7353+9de0a3cc 04/01/2014
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x44/0x5b
>   __lock_acquire.cold.76+0x121/0x2a7
>   lock_acquire+0xd5/0x310
>   _raw_spin_lock_nested+0x39/0x70
>   tcp_v4_rcv+0x1023/0x1160
>   ip_protocol_deliver_rcu+0x4d/0x280
>   ip_local_deliver_finish+0xac/0x160
>   ip_local_deliver+0x71/0x220
>   ip_rcv+0x5a/0x200
>   __netif_receive_skb_one_core+0x89/0xa0
>   netif_receive_skb+0x1c1/0x400
>   tcf_mirred_act+0x2a5/0x610 [act_mirred]
>   tcf_action_exec+0xb3/0x210
>   fl_classify+0x1f7/0x240 [cls_flower]
>   tcf_classify+0x7b/0x320
>   __dev_queue_xmit+0x3a4/0x11b0
>   ip_finish_output2+0x3b8/0xa10
>   ip_output+0x7f/0x260
>   __ip_queue_xmit+0x1ce/0x610
>   __tcp_transmit_skb+0xabc/0xc80
>   tcp_rcv_state_process+0x669/0x1290
>   tcp_v4_do_rcv+0xd7/0x370
>   tcp_v4_rcv+0x10bc/0x1160
>   ip_protocol_deliver_rcu+0x4d/0x280
>   ip_local_deliver_finish+0xac/0x160
>   ip_local_deliver+0x71/0x220
>   ip_rcv+0x5a/0x200
>   __netif_receive_skb_one_core+0x89/0xa0
>   netif_receive_skb+0x1c1/0x400
>   tcf_mirred_act+0x2a5/0x610 [act_mirred]
>   tcf_action_exec+0xb3/0x210
>   fl_classify+0x1f7/0x240 [cls_flower]
>   tcf_classify+0x7b/0x320
>   __dev_queue_xmit+0x3a4/0x11b0
>   ip_finish_output2+0x3b8/0xa10
>   ip_output+0x7f/0x260
>   __ip_queue_xmit+0x1ce/0x610
>   __tcp_transmit_skb+0xabc/0xc80
>   tcp_write_xmit+0x229/0x12c0
>   __tcp_push_pending_frames+0x32/0xf0
>   tcp_sendmsg_locked+0x297/0xe10
>   tcp_sendmsg+0x27/0x40
>   sock_sendmsg+0x58/0x70
>   __sys_sendto+0xfd/0x170
>   __x64_sys_sendto+0x24/0x30
>   do_syscall_64+0x3a/0x90
>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
>  RIP: 0033:0x7f11a06fd281
>  Code: 00 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 e5 43 2c 00 41 89 ca 8b 00 85 c0 75 1c 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 67 c3 66 0f 1f 44 00 00 41 56 41 89 ce 41 55
>  RSP: 002b:00007ffd17958358 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
>  RAX: ffffffffffffffda RBX: 0000555c6e671610 RCX: 00007f11a06fd281
>  RDX: 0000000000002000 RSI: 0000555c6e73a9f0 RDI: 0000000000000003
>  RBP: 0000555c6e6433b0 R08: 0000000000000000 R09: 0000000000000000
>  R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000002000
>  R13: 0000555c6e671410 R14: 0000555c6e671410 R15: 0000555c6e6433f8
>   </TASK>
> 
> that is very similar to those observed by William in his setup.
> By using netif_rx() for mirred ingress packets, packets are queued in the
> backlog, like it's done in the receive path of "loopback" and "veth", and
> the deadlock is not visible anymore. Also add a selftest that can be used
> to reproduce the problem / verify the fix.

Which also means we can no longer know the RX path status any more,
right? I mean if we have filters on ingress, we can't know whether they
drop this packet or not, after this patch? To me, this at least breaks
users' expectation.

BTW, have you thought about solving the above lockdep warning in TCP
layer?

Thanks.