linux-kernel

Open Source and information security mailing list archives

Date:	Mon, 2 Mar 2015 11:45:11 +1100
From:	Chris Dunlop <chris@...he.net.au>
To:	dev@...nvswitch.org, linux-kernel@...r.kernel.org,
	stable@...r.kernel.org
Cc:	"Xu (Simon) Chen" <xchenum@...il.com>, Thomas Graf <tgraf@...g.ch>,
	Pravin Shelar <pshelar@...ira.com>, discuss@...nvswitch.org
Subject: pskb_expand_head: skb_shared BUG

Heads up...

We've hit this BUG() in v3.10.70, v3.14.27 and v3.18.7:

net/core/skbuff.c:
1027 int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
1028                      gfp_t gfp_mask)
1029 {
1030         int i;
1031         u8 *data;
1032         int size = nhead + skb_end_offset(skb) + ntail;
1033         long off;
1034 
1035         BUG_ON(nhead < 0);
1036 
1037         if (skb_shared(skb))
1038                 BUG();             <<< BOOM!!!

This appears to be a regression in the 3.10.x stable series:
we've been running for 11 months on v3.10.33 without problem, we
upgraded to v3.14.27 and hit the BUG(), than again on upgrading
to v3.18.7, then again after downgrading to v3.10.70. 

We're now back on v3.10.33.

The commit identified below (bda56f1) looks to be innocent as it hasn't
made it into v3.10.70.

Simon's email at the bottom has a proper stack trace.

Sorry, my stack traces are pretty rubbish as nothing shows up in
kern.log (nor rsyslog/omrelp), it appears openvswitch doesn't play
nicely with netconsole, and the KVM/IP (Avocent) we're using for the
consoles on these machines doesn't do scrollback, sometimes chops off
the first 3 characters or so, and is very blurry. FWIW, screenshots
available:

https://www.dropbox.com/sh/e9vjhi5i33r9dhk/AAAYJsYKBxaFEb6STCfAB-rpa?dl=0

Hand transcribed:

--- screen-1.png, v3.14.33, machine 1 ---
<<< beginning of trace cut off >>>
(EOI)
[(ffffffff 810fcd15)] do_softirg+0x45/0x80
[(ffffffff 8136fdf4)] netif_rx_ni+0x34/0x70
[(ffffffff a0108953)] tun_get_user+0x433/0x8f0 [tun]
[(ffffffff a010928d)] tun_sendmsg+0x5c/0x80 [tun]
[(ffffffff a01b3e42)] handle_tx+0x382/0x400 [vhost_net]
[(ffffffff a01b3ef5)] handle_tx_kick+0x15/0x20 [vhost_net]
[(ffffffff a01a3652)] vhost_worker+0x103/0x1a0 [vhost]
[(ffffffff a01a8550)] ? vhost_get_vq_desc+0x480/0x480 [vhost]
[(ffffffff 8106be24)] kthread+0xe4/0x100
[(ffffffff 8106bd40)] ? kthread_create_on_node+0x240/0x240
[(ffffffff 8146a66c)] ret_from_fork+0x7c/0xb0
[(ffffffff 8106bd40)] ? kthread_create_on_node+0x240/0x240
Code: 55 c0 e8 80 be dc ff 64 c0 48 8b 55 b0 0f 85 0b ff ff ff e9 02 ff ff ff 0f 1f 80 00 00 00 00 41 81 bf 00 20 00 00 e9 1f fe ff ff <0f> 0b 0f 0b 44 89 fe 4c 89 ef e8 ac ec ff ff 85 b0 74 12 48 89
RIP [(ffffffff31360fa4)] pskb_expand_head+0x234/0x370
 RSP (ffff88031f003968)
---[ end trace d2c9127b007ab2a5 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000+0xffffffff9fffffff]

--- screen-2.png, v3.14.70, machine 1 ---
<<< beginning of trace cut off >>>
[(ffffffff813c31fc)] call_softirq+0x1c/0x30
(EOI)
[(ffffffff81003945)] do_softirq+0x55/0x90
[(ffffffff812e8fa8)] netif_rx_ni+0x28/0x30
[(ffffffffa039765e)] tun_get_user+0x43e/0x860 [tun]
[(ffffffffa0477001)] ? vhost_get_vq_desc+0x211/0x450 [vhost_net]
[(ffffffffa0397ad7)] tun_sendmsg+0x57/0x80 [tun]
[(ffffffffa0478540)] handle_tx+0x220/0x4a0 [tun]
[(ffffffffa04787f5)] handle_tx_kick+0x15/0x20 [vhost_net]
[(ffffffffa0475781)] vhost_worker+0xf1/0x190 [vhost_net]
[(ffffffffa0475690)] ? memory_access_ok.isra.9+0xc0/0xc0 [vhost_net]
[(ffffffff81060650)] kthread+0xc0/0xd0
[(ffffffff81060590)] ? kthread_create_on_node+0x130/0x130
[(ffffffff813c1f1c)] ret_from_fork+0x8c/0xb0
[(ffffffff81060590)] ? kthread_create_on_node+0x130/0x130
Code: 55 c0 e8 40 e1 e0 ff 84 c0 48 8b 55 c0 0f 85 e3 fe ff ff e? ?a fe ff ff 0f 1f 80 00 00 00 00 41 81 cf 00 20 00 00 e9 ff fd ff ff <0f> 0b 0? ?b 44 89 fe 48 89 df e8 7d eb ff ff 85 c0 74 12 4c 89
RIP  [(ffffffff812cded4)] pskb_expand_head+0x254/0x290
 RSP (ffff88103fd039a8)
---[ end trace 5f660386da152ef0 ]---
Kernel panic - not syncing: Fatal exception in interrupt

--- screen-3.png, v3.14.70, machine 2 ---
<<< beginning of trace cut off >>>
(EOI)
[(ffffffff81003945)] do_soft_irq+0x55/0x90
[(ffffffff812a9fa9)] netif_rx_ni+0x28/0x30
[(ffffffffa031865e)] tun_get_user+0x43e/0x860 [tun]
[(ffffffffa0310000)] ? vhost_get_vq_desc+0x211/0x450 [vhost_net]
[(ffffffffa0319ad7)] tun_sendmsg+0x57/0x80 [tun]
[(ffffffffa0311540)] handle_tx+0x220/0x4a0 [vhost_net]
[(ffffffffa03117f5)] handle_tx_kick+-x15/0x20 [vhost_net]
[(ffffffffa030a781)] vhost_worker+0xf1/0x190 [vhost_net]
[(ffffffffa030a690)] ? memory_access_ok.isra.9+0xc0/0xd0 [vhost_net]
[(ffffffff81080650)] kthread+0xc0/0xd0
[(ffffffff81060590)] ? kthread_create_on_node+0x130/0x130
[(ffffffff813c1f1b)] ret_from_fork+0x7c/0xb0
[(ffffffff81060590)] ? kthread_create_on_node+0x130/0x130
Code: 55 c0 e8 40 e1 e0 ff 84 c0 48 8b 55 c0 0f 85 e8 fa ff ff e9 ca fa ff ff 0c 1f 80 00 00 00 00 41 81 cf 00 20 00 00 e9 ff fd ff ff <0f> 0b 0f 0b 44 89 fe 48 89 df e8 7c eb ff ff 85 c0 74 12 4c 89
RIP  [(ffffffff312dded4)] pskb_expand_head+0x254/0x390
 RSP (ffff88103fdc39a8)
---[ end trace 39d8f481dcb017e8 ]---
Kernel panic - not syncing: Fatal exception in interrupt

Cheers,

Chris

On Fri, Feb 27, 2015 at 04:13:30PM +1100, Chris Dunlop wrote:
> Hi,
> 
> "Me too" on Simon's BUG() described below (apologies for the top post).
> Basically:
> 
> [ 7318.409796] kernel BUG at net/core/skbuff.c:1041!
> ...
> [ 7318.591562] RIP: 0010:[<ffffffff813eb634>]  [<ffffffff813eb634>] pskb_expand_head+0x234/0x270
> ...
> [ 7318.705710]  [<ffffffff813eb6fc>] __pskb_pull_tail+0x4c/0x330
> [ 7318.711571]  [<ffffffff813f8ca7>] skb_checksum_help+0x147/0x1a0
> [ 7318.717599]  [<ffffffffa07de8b0>] queue_userspace_packet+0x3f0/0x440 [openvswitch]
> 
> I've hit this BUG() several times within hours or days of running on v3.14.27
> and v3.14.33, whereas the box previously ran for months on v3.10.33 without an
> issue.
> 
> It looks like it's related to this commit which went into v3.14-rc1 and
> introduced performing skb_checksum_help() on a (non-vlan) skb passed into
> queue_userspace_packet():
> 
> commit bda56f143c9dc38ae7926ba21ebeb35359a6c051
> Author: Thomas Graf <tgraf@...g.ch>
> 
>     openvswitch: Use skb_zerocopy() for upcall
> 
> It seems in some cases the skb is shared, causing us to hit the
> BUG() in pskb_expand_head(), per:
> 
> ----------------------------------------------------------------------
> static int queue_userspace_packet(struct net *net, int dp_ifindex,
>                                   struct sk_buff *skb,
>                                   const struct dp_upcall_info *upcall_info)
> {
>         if (vlan_tx_tag_present(skb)) {
> 		nskb = skb_clone(skb, GFP_ATOMIC);
> 		...
> 		skb = nskb;
>         }
>         ...
> 	if (skb->ip_summed == CHECKSUM_PARTIAL &&
> 	    (err = skb_checksum_help(skb)))
> 		...
> }
> 
> int skb_checksum_help(struct sk_buff *skb)
> {
> 	...
> 	if (skb_has_shared_frag(skb)) {
> 		ret = __skb_linearize(skb);
> 		...
> }
> 
> static inline int __skb_linearize(struct sk_buff *skb)
> {
> 	return __pskb_pull_tail(skb, skb->data_len) ? 0 : -ENOMEM;
> }
> 
> unsigned char *__pskb_pull_tail(struct sk_buff *skb, int delta)
> {
> 	int eat = (skb->tail + delta) - skb->end;
> 	...
> 	if (eat > 0 || skb_cloned(skb)) {
> 		if (pskb_expand_head(skb, 0, eat > 0 ? eat + 128 : 0,
> 				GFP_ATOMIC))
> 			return NULL;
> 	}
> 	...
> }
> 
> pskb_expand_head(skb, 0, eat > 0 ? eat + 128 : 0, GFP_ATOMIC)
> {
> 	...
> 	if (skb_shared(skb))
> 		BUG();				<<< BOOM!!!
> 	...
> }
> ----------------------------------------------------------------------
> 
> Cheers,
> 
> Chris
> 
> On Thu, Feb 12, 2015 at 08:32:46PM -0500, Xu (Simon) Chen wrote:
> > Hi folks,
> > 
> > I can now consistently reproduce a kernel panic on my system. I am using
> > OVS 2.3.0 on 3.14.29 kernel, a sender and a receiver (two VMs) on two
> > identical hypervisors, using VXLAN tunnel connecting the two VMs. Iperf is
> > used inside of VMs for generating traffic. The sender side has no problem,
> > while the hypervisor with the receiving VM consistently crashes after
> > certain amount of time (or rather packets).
> > 
> > 
> > The kernel panic seems to be related to skb_shared check inside of
> > pskb_expand_head function:
> > 
> >  [ 7318.405112] ------------[ cut here ]------------
> > [ 7318.409796] kernel BUG at net/core/skbuff.c:1041!
> > [ 7318.414563] invalid opcode: 0000 [#1] SMP
> > [ 7318.418868] Modules linked in: ip6table_filter ip6_tables xt_mac xt_tcpudp xt_state xt_physdev xt_set xt_multiport iptable_filter iptable_nat nf_nat_ipv4 nf_nat iptable_raw ip_tables x_tables ip_set_hash_ip ip_set nfnetlink vhost_net vhost macvtap macvlan tun veth openvswitch(O) gre vxlan libcrc32c bridge 8021q garp stp llc bonding joydev hid_generic usbhid hid deflate ctr twofish_generic twofish_avx_x86_64 nfsd twofish_x86_64_3way twofish_x86_64 twofish_common auth_rpcgss oid_registry nfs_acl camellia_generic camellia_aesni_avx_x86_64 camellia_x86_64 nfs lockd serpent_avx_x86_64 fscache serpent_sse2_x86_64 xts serpent_generic sunrpc blowfish_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common des_generic cbc cmac binfmt_misc xcbc rmd160 sha512_generic sha256_generic hmac crypto_null af_key xfrm_algo iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal coretemp kvm_intel kvm crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd microcode evdev ehci_pci sb_edac ehci_hcd edac_core usbcore lpc_ich ioatdma i2c_i801 usb_common mfd_core tpm_tis wmi tpm acpi_cpufreq processor thermal_sys button nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack ipmi_devintf ipmi_si ipmi_msghandler loop tcp_scalable autofs4 ext4 cr c16 jbd2 mbcache crc32c btrfs xor raid6_pq dm_mod mlx4_en(O) sg sd_mod crc_t10dif crct10dif_common igb isci i2c_algo_bit ahci libsas i2c_core libahci dca mlx4_core(O) megaraid_sas scsi_transport_sas ptp libata pps_core compat(O) scsi_mod
> > [ 7318.568195] CPU: 14 PID: 54124 Comm: vhost-54120 Tainted: G           O 3.14.25-ts1 #1
> > [ 7318.576227] Hardware name: Supermicro SYS-F617R2-R72+/X9DRFR, BIOS 3.0b 04/24/2014
> > [ 7318.583944] task: ffff887f25dde240 ti: ffff883ef6a32000 task.ti: ffff883ef6a32000
> > [ 7318.591562] RIP: 0010:[<ffffffff813eb634>]  [<ffffffff813eb634>] pskb_expand_head+0x234/0x270
> > [ 7318.600295] RSP: 0018:ffff887f7f103978  EFLAGS: 00010202
> > [ 7318.605770] RAX: 0000000000000002 RBX: ffff887f23417700 RCX: 0000000000000020
> > [ 7318.613016] RDX: 00000000000002ee RSI: 0000000000000000 RDI: ffff887f23417700
> > [ 7318.620278] RBP: ffff887f7f1039b8 R08: 000000005ff00000 R09: ffff887f2113e040
> > [ 7318.627523] R10: 00000000ffffee43 R11: 0000000000000002 R12: 0000000000000000
> > [ 7318.634805] R13: ffff887f23417700 R14: 000000000000000d R15: ffff887f23417700
> > [ 7318.642032] FS:  0000000000000000(0000) GS:ffff887f7f100000(0000) knlGS:0000000000000000
> > [ 7318.650238] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 7318.656110] CR2: 00002ba78155e680 CR3: 0000007f1f8cd000 CR4: 00000000001427e0
> > [ 7318.663378] Stack:
> > [ 7318.665464]  ffff887f7f1039f8 ffffffff8142ae14 ffffffffa07b50f0 ffff887f23417700
> > [ 7318.673226]  ffff887f7f103b58 ffff887f7f103a70 000000000000000d ffff887f23417700
> > [ 7318.681000]  ffff887f7f103a08 ffffffff813eb6fc ffff887f23417700 000001f823417700
> > [ 7318.688790] Call Trace:
> > [ 7318.691309]  <IRQ>
> > [ 7318.693290]  [<ffffffff8142ae14>] ? nf_hook_slow+0x74/0x130
> > [ 7318.699285]  [<ffffffffa07b50f0>] ? deliver_clone+0x60/0x60 [bridge]
> > [ 7318.705710]  [<ffffffff813eb6fc>] __pskb_pull_tail+0x4c/0x330
> > [ 7318.711571]  [<ffffffff813f8ca7>] skb_checksum_help+0x147/0x1a0
> > [ 7318.717599]  [<ffffffffa07de8b0>] queue_userspace_packet+0x3f0/0x440 [openvswitch]
> > [ 7318.725289]  [<ffffffffa07dfcd5>] ovs_dp_upcall+0x65/0x70 [openvswitch]
> > [ 7318.732037]  [<ffffffffa07dc7b6>] do_execute_actions+0x366/0xc00 [openvswitch]
> > [ 7318.739403]  [<ffffffff8142ae14>] ? nf_hook_slow+0x74/0x130
> > [ 7318.745072]  [<ffffffff812a7c9a>] ? arch_fast_hash2+0xa/0x10
> > [ 7318.750883]  [<ffffffffa07dc7ec>] do_execute_actions+0x39c/0xc00 [openvswitch]
> > [ 7318.758221]  [<ffffffffa07b570d>] ? br_forward+0x5d/0x70 [bridge]
> > [ 7318.764419]  [<ffffffffa07dd0c6>] ovs_execute_actions+0x76/0x110 [openvswitch]
> > [ 7318.771773]  [<ffffffffa07dfd6f>] ovs_dp_process_packet_with_key+0x8f/0xf0 [openvswitch]
> > [ 7318.779988]  [<ffffffffa07e0efa>] ? ovs_flow_extract+0x89a/0xab0 [openvswitch]
> > [ 7318.787355]  [<ffffffffa07dfe10>] ovs_dp_process_received_packet+0x40/0x60 [openvswitch]
> > [ 7318.795535]  [<ffffffffa07e616a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
> > [ 7318.802634]  [<ffffffffa07e7cf5>] netdev_frame_hook+0xc5/0x120 [openvswitch]
> > [ 7318.809773]  [<ffffffff813f9f42>] __netif_receive_skb_core+0x332/0x7f0
> > [ 7318.816418]  [<ffffffffa07e7c30>] ? netdev_create+0x150/0x150 [openvswitch]
> > [ 7318.823475]  [<ffffffff813fa426>] __netif_receive_skb+0x26/0x70
> > [ 7318.829472]  [<ffffffff813fa514>] process_backlog+0xa4/0x180
> > [ 7318.835223]  [<ffffffff813fa979>] net_rx_action+0x139/0x220
> > [ 7318.840894]  [<ffffffff81053218>] __do_softirq+0xf8/0x280
> > [ 7318.846391]  [<ffffffff81504b5c>] do_softirq_own_stack+0x1c/0x30
> > [ 7318.852517]  <EOI>
> > [ 7318.854504]  [<ffffffff81053425>] do_softirq+0x45/0x50
> > [ 7318.860084]  [<ffffffff813f9759>] netif_rx_ni+0x39/0x70
> > [ 7318.865416]  [<ffffffffa07f1ab3>] tun_get_user+0x413/0x840 [tun]
> > [ 7318.871506]  [<ffffffffa07f1f3a>] tun_sendmsg+0x5a/0x80 [tun]
> > [ 7318.877357]  [<ffffffffa0819e32>] handle_tx+0x382/0x400 [vhost_net]
> > [ 7318.883712]  [<ffffffffa0819ee5>] handle_tx_kick+0x15/0x20 [vhost_net]
> > [ 7318.890333]  [<ffffffffa080d4f6>] vhost_worker+0xf6/0x190 [vhost]
> > [ 7318.896528]  [<ffffffffa080d400>] ? vhost_log_access_ok+0x30/0x30 [vhost]
> > [ 7318.903454]  [<ffffffff81070c69>] kthread+0xc9/0xe0
> > [ 7318.908412]  [<ffffffff81070ba0>] ? flush_kthread_worker+0x80/0x80
> > [ 7318.914674]  [<ffffffff8150342c>] ret_from_fork+0x7c/0xb0
> > [ 7318.920163]  [<ffffffff81070ba0>] ? flush_kthread_worker+0x80/0x80
> > [ 7318.926426] Code: 55 c0 e8 f0 38 d4 ff 48 8b 55 c0 84 c0 0f 85 0b ff ff ff e9 02 ff ff ff 0f 1f 80 00 00 00 00 41 81 cf 00 20 00 00 e9 1f fe ff ff <0f> 0b 0f 0b 44 89 fe 4c 89 ef e8 ad e8 ff ff 85 c0 74 12 48 89
> > [ 7318.950040] RIP  [<ffffffff813eb634>] pskb_expand_head+0x234/0x270
> > [ 7318.956385]  RSP <ffff887f7f103978>
> > [ 7318.959988] ---[ end trace 221c17dcc65b8372 ]---
> > [ 7319.076935] Kernel panic - not syncing: Fatal exception in interrupt
> > [ 7319.086993] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
> > [ 7319.204560] Rebooting in 10 seconds..
> >
> > _______________________________________________
> > discuss mailing list
> > discuss@...nvswitch.org
> > http://openvswitch.org/mailman/listinfo/discuss
> _______________________________________________
> dev mailing list
> dev@...nvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives