Date:   Mon, 26 Aug 2019 17:07:24 +0300
From:   Shmulik Ladkani <>
To:     Daniel Borkmann <>,
        Eric Dumazet <>,
        netdev <>,
        Alexander Duyck <>
Cc:     Alexei Starovoitov <>, Yonghong Song <>,
        Steffen Klassert <>,,
Subject: BUG_ON in skb_segment, after bpf_skb_change_proto was applied


In our production systems, running v4.19.y longterm kernels, we hit a
BUG_ON in 'skb_segment()'. It occurs rarely, and although we tried, we
could not reproduce it synthetically.

In v4.19.41 it crashes at net/core/skbuff.c:3711

		while (pos < offset + len) {
			if (i >= nfrags) {
				i = 0;
				nfrags = skb_shinfo(list_skb)->nr_frags;
				frag = skb_shinfo(list_skb)->frags;
				frag_skb = list_skb;
				if (!skb_headlen(list_skb)) {
				} else {
3711:					BUG_ON(!list_skb->head_frag);

With the accompanying dump:

 kernel BUG at net/core/skbuff.c:3711!
 invalid opcode: 0000 [#1] SMP PTI
 CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Not tainted 4.19.41-041941-generic #201905080231
 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
 RIP: 0010:skb_segment+0xb65/0xda9
 Code: 89 44 24 60 48 89 4c 24 70 e8 87 b3 ff ff 48 8b 4c 24 70 44 8b 44 24 60 85 c0 44 8b 54 24 4c 0f 84 fc fb ff ff e9 16 fd ff ff <0f> 0b 29 c1 89 ce 09 ca e9 61 ff ff ff 0f 0b 41 8b bf 84 00 00 00
 RSP: 0018:ffff9e4d79b037c0 EFLAGS: 00010246
 RAX: ffff9e4d75012ec0 RBX: ffff9e4d74067500 RCX: 0000000000000000
 RDX: 0000000000480020 RSI: 0000000000000000 RDI: ffff9e4d74e3a200
 RBP: ffff9e4d79b03898 R08: 0000000000000564 R09: f69d84ecbfe8b972
 R10: 0000000000000571 R11: a6b66a32f69d84ec R12: 0000000000000564
 R13: ffff9e4c18d03ef0 R14: 0000000000000000 R15: ffff9e4d74e3a200
 FS:  0000000000000000(0000) GS:ffff9e4d79b00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00000000007f50d8 CR3: 000000009420a003 CR4: 00000000001606e0
 Call Trace:
  ? ip6_fragment+0xa40/0xa40
  ? nf_hook.part.24+0x10/0x10
  ? netif_receive_skb_core+0x20/0x20
  ? netif_receive_skb_internal+0x45/0xf0
  ? tcp4_gro_complete+0x86/0x90
  ? napi_gro_complete+0x53/0x90

To the best of our knowledge, the packet flow leading to this BUG_ON is:

  - ingress on eth0 (veth, gro:on), ipv4 udp encapsulated esp
  - re-ingress on eth0, after xfrm, decapsulated ipv4 tcp
  - the skb was GROed (skb_is_gso:true)
  - ipv4 forwarding to dummy1, where eBPF nat4-to-6 program is attached
    at TC Egress (calls 'bpf_skb_change_proto()'), then redirect to ingress
    on same device.
    NOTE: 'bpf_skb_proto_4_to_6()' mangles 'shinfo->gso_size'
  - ingress on dummy1, ipv6 tcp
  - ipv6 forwarding
  - egress on tun2 (tun device) that calls:
    validate_xmit_skb -> ... -> skb_segment BUG_ON
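
To make the NOTE about 'gso_size' concrete, here is a minimal userspace
sketch of the arithmetic. It assumes a 20-byte IPv4 header without options,
a 40-byte fixed IPv6 header, and that the 4-to-6 translation lowers gso_size
by exactly the difference, which is my reading of the v4.19 helper and not
something verified against the crashing system. The function name is
hypothetical.

```c
#include <assert.h>

/* Assumed header sizes: IPv4 without options, IPv6 fixed header. */
enum { IPV4_HDR_LEN = 20, IPV6_HDR_LEN = 40 };

/*
 * Hypothetical model of the gso_size adjustment on 4-to-6 translation:
 * the L3 header grows by len_diff, so the per-segment payload budget
 * shrinks by the same amount.
 */
static unsigned int gso_size_after_4_to_6(unsigned int gso_size)
{
	const unsigned int len_diff = IPV6_HDR_LEN - IPV4_HDR_LEN;

	/* Precondition of this sketch: the result must stay positive. */
	assert(gso_size > len_diff);
	return gso_size - len_diff;
}
```

For an illustrative gso_size of 1400, this yields 1380. The point is that
the skb's data layout was built by GRO for the old value, while
'skb_segment()' later runs with the new one.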

A similar issue was reported and fixed by Yonghong Song in commit
13acc94eff12 ("net: permit skb_segment on head_frag frag_list skb").

However 13acc94eff12 added "BUG_ON(!list_skb->head_frag)" to line 3711,
and patchwork states:

    This patch addressed the issue by handling skb_headlen(list_skb) != 0
    case properly if list_skb->head_frag is true, which is expected in
    most cases. [1]

meaning that 13acc94eff12 does not support the list_skb->head_frag=0 case.
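
The three cases can be spelled out with a small userspace sketch. This is
not kernel code: 'struct fake_skb', the enum and 'classify_list_skb()' are
hypothetical names, and only the two fields read by the quoted branch are
modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the sk_buff fields the quoted branch reads. */
struct fake_skb {
	unsigned int headlen;	/* what skb_headlen() would return */
	bool head_frag;		/* mirrors skb->head_frag */
};

enum list_skb_verdict {
	LIST_SKB_NO_LINEAR_DATA,	/* !skb_headlen(list_skb) branch */
	LIST_SKB_HEAD_FRAG_OK,		/* linear data, head_frag set */
	LIST_SKB_WOULD_BUG,		/* linear data, head_frag clear: line 3711 */
};

/* Classify a frag_list member the way the quoted branch tests it. */
static enum list_skb_verdict classify_list_skb(const struct fake_skb *list_skb)
{
	if (!list_skb->headlen)
		return LIST_SKB_NO_LINEAR_DATA;
	if (!list_skb->head_frag)
		return LIST_SKB_WOULD_BUG;
	return LIST_SKB_HEAD_FRAG_OK;
}
```

Our crash corresponds to LIST_SKB_WOULD_BUG: a frag_list member that
carries linear data but whose head is not page-backed.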

Historically, skb_segment has been described as rather intolerant of
gso_size changes. Quote:

    Eric suggested to shrink gso_size instead to avoid segmentation+fragments.
    I think its nice idea, but skb_gso_segment makes certain assumptions about
    nr_frags and gso_size (it can't handle frag size > desired mss). [2]
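
The constraint as worded in that quote (no frag larger than the desired mss)
can be checked with a few lines of C. This is only a sketch of the stated
assumption, not of what 'skb_segment()' actually verifies. The function name
and the plain array of lengths are hypothetical.

```c
#include <stddef.h>

/*
 * Return the index of the first frag whose length exceeds the desired
 * mss, or -1 if every frag fits. Models only the constraint quoted above.
 */
static long first_oversized_frag(const unsigned int *frag_len, size_t n,
				 unsigned int mss)
{
	for (size_t i = 0; i < n; i++) {
		if (frag_len[i] > mss)
			return (long)i;
	}
	return -1;
}
```

With illustrative frag lengths of 1400, 1400 and 700 bytes, an mss of 1400
passes the check, while an mss shrunk to 1380 fails at index 0. This would
be one way a gso_size change made after GRO could violate the assumption.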

Any suggestions on how to debug and fix this?

Could it be that 'bpf_skb_change_proto()' isn't really allowed to
mangle 'gso_size', and that we should somehow enforce a 'skb_segment()'
call PRIOR to translation?

Appreciate any input and assistance,

