netdev - Re: BUG_ON in skb_segment, after bpf_skb_change

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <94cd6f4d-09d4-11c0-64f4-bdc544bb3dcb@gmail.com>
Date:   Mon, 26 Aug 2019 19:47:40 +0200
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Shmulik Ladkani <shmulik.ladkani@...il.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Eric Dumazet <eric.dumazet@...il.com>,
        netdev <netdev@...r.kernel.org>,
        Alexander Duyck <alexander.duyck@...il.com>
Cc:     Alexei Starovoitov <ast@...nel.org>, Yonghong Song <yhs@...com>,
        Steffen Klassert <steffen.klassert@...unet.com>,
        shmulik@...anetworks.com, eyal@...anetworks.com
Subject: Re: BUG_ON in skb_segment, after bpf_skb_change_proto was applied



On 8/26/19 4:07 PM, Shmulik Ladkani wrote:
> Hi,
> 
> In our production systems, running v4.19.y longterm kernels, we hit a
> BUG_ON in 'skb_segment()'. It occurs rarely and although tried, couldn't
> synthetically reproduce.
> 
> In v4.19.41 it crashes at net/core/skbuff.c:3711
> 
> 		while (pos < offset + len) {
> 			if (i >= nfrags) {
> 				i = 0;
> 				nfrags = skb_shinfo(list_skb)->nr_frags;
> 				frag = skb_shinfo(list_skb)->frags;
> 				frag_skb = list_skb;
> 				if (!skb_headlen(list_skb)) {
> 					BUG_ON(!nfrags);
> 				} else {
> 3711:					BUG_ON(!list_skb->head_frag);
> 
> With the accompanying dump:
> 
>  kernel BUG at net/core/skbuff.c:3711!
>  invalid opcode: 0000 [#1] SMP PTI
>  CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Not tainted 4.19.41-041941-generic #201905080231
>  Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
>  RIP: 0010:skb_segment+0xb65/0xda9
>  Code: 89 44 24 60 48 89 4c 24 70 e8 87 b3 ff ff 48 8b 4c 24 70 44 8b 44 24 60 85 c0 44 8b 54 24 4c 0f 84 fc fb ff ff e9 16 fd ff ff <0f> 0b 29 c1 89 ce 09 ca e9 61 ff ff ff 0f 0b 41 8b bf 84 00 00 00
>  RSP: 0018:ffff9e4d79b037c0 EFLAGS: 00010246
>  RAX: ffff9e4d75012ec0 RBX: ffff9e4d74067500 RCX: 0000000000000000
>  RDX: 0000000000480020 RSI: 0000000000000000 RDI: ffff9e4d74e3a200
>  RBP: ffff9e4d79b03898 R08: 0000000000000564 R09: f69d84ecbfe8b972
>  R10: 0000000000000571 R11: a6b66a32f69d84ec R12: 0000000000000564
>  R13: ffff9e4c18d03ef0 R14: 0000000000000000 R15: ffff9e4d74e3a200
>  FS:  0000000000000000(0000) GS:ffff9e4d79b00000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00000000007f50d8 CR3: 000000009420a003 CR4: 00000000001606e0
>  Call Trace:
>   <IRQ>
>   tcp_gso_segment+0xf9/0x4e0
>   tcp6_gso_segment+0x5e/0x100
>   ipv6_gso_segment+0x112/0x340
>   skb_mac_gso_segment+0xb9/0x130
>   __skb_gso_segment+0x84/0x190
>   validate_xmit_skb+0x14a/0x2f0
>   validate_xmit_skb_list+0x4b/0x70
>   sch_direct_xmit+0x154/0x390
>   __dev_queue_xmit+0x808/0x920
>   dev_queue_xmit+0x10/0x20
>   neigh_direct_output+0x11/0x20
>   ip6_finish_output2+0x1b9/0x5b0
>   ip6_finish_output+0x13a/0x1b0
>   ip6_output+0x6c/0x110
>   ? ip6_fragment+0xa40/0xa40
>   ip6_forward+0x501/0x810
>   ip6_rcv_finish+0x7a/0x90
>   ipv6_rcv+0x69/0xe0
>   ? nf_hook.part.24+0x10/0x10
>   __netif_receive_skb_core+0x4fa/0xc80
>   ? netif_receive_skb_core+0x20/0x20
>   ? netif_receive_skb_internal+0x45/0xf0
>   ? tcp4_gro_complete+0x86/0x90
>   ? napi_gro_complete+0x53/0x90
>   __netif_receive_skb_one_core+0x3b/0x80
>   __netif_receive_skb+0x18/0x60
>   process_backlog+0xb3/0x170
>   net_rx_action+0x130/0x350
>   __do_softirq+0xdc/0x2d4
> 
> To our best knowledge, the packet flow leading to this BUG_ON is:
> 
>   - ingress on eth0 (veth, gro:on), ipv4 udp encapsulated esp
>   - re-ingresss on eth0, after xfrm, decapsulated ipv4 tcp
>   - the skb was GROed (skb_is_gso:true)
>   - ipv4 forwarding to dummy1, where eBPF nat4-to-6 program is attached
>     at TC Egress (calls 'bpf_skb_change_proto()'), then redirect to ingress
>     on same device.
>     NOTE: 'bpf_skb_proto_4_to_6()' mangles 'shinfo->gso_size'

Doing this on an skb with a frag_list is doomed, in current gso_segment() state.

A rewrite  would be needed (I believe I did so at some point, but Herbert Xu fought hard against it)


>   - ingress on dummy1, ipv6 tcp
>   - ipv6 forwarding
>   - egress on tun2 (tun device) that calls:
>     validate_xmit_skb -> ... -> skb_segment BUG_ON
> 
> A similar issue was reported and fixed by Yonghong Song in commit
> 13acc94eff12 ("net: permit skb_segment on head_frag frag_list skb").
> 
> However 13acc94eff12 added "BUG_ON(!list_skb->head_frag)" to line 3711,
> and patchwork states:
> 
>     This patch addressed the issue by handling skb_headlen(list_skb) != 0
>     case properly if list_skb->head_frag is true, which is expected in
>     most cases. [1]
> 
> meaning, 13acc94eff12 does not support list_skb->head_frag=0 case.
> 
> Historically, it is claimed that skb_segment is rather intolerant to
> gso_size changes, quote:
> 
>     Eric suggested to shrink gso_size instead to avoid segmentation+fragments.
>     I think its nice idea, but skb_gso_segment makes certain assumptions about
>     nr_frags and gso_size (it can't handle frag size > desired mss). [2]
> 
> Any suggestions how to debug and fix this?
> 
> Could it be that 'bpf_skb_change_proto()' isn't really allowed to
> mangle 'gso_size', and we should somehow enforce a 'skb_segment()' call
> PRIOR translation?
> 
> Appreciate any input and assistance,
> Shmulik
> 
> [1] https://patchwork.ozlabs.org/patch/889166/
> [2] https://patchwork.ozlabs.org/patch/314327/
>