lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 5 Nov 2019 15:32:53 +0100
From:   Thibaut Sautereau <thibaut.sautereau@...p-os.org>
To:     Vlastimil Babka <vbabka@...e.cz>
Cc:     netdev@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        "David S. Miller" <davem@...emloft.net>,
        Laura Abbott <labbott@...hat.com>,
        Kees Cook <keescook@...omium.org>,
        Alexander Potapenko <glider@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>, clipos@....gouv.fr
Subject: Re: Double free of struct sk_buff reported by
 SLAB_CONSISTENCY_CHECKS with init_on_free

On Tue, Nov 05, 2019 at 10:00:39AM +0100, Vlastimil Babka wrote:
> On 11/4/19 6:03 PM, Thibaut Sautereau wrote:
> > The BUG only happens when using `slub_debug=F` on the command-line (to
> > enable SLAB_CONSISTENCY_CHECKS), otherwise the double free is not
> > reported and the system keeps running.
> 
> You could change slub_debug parameter to:
> slub_debug=FU,skbuff_head_cache
> 
> That will also print out who previously allocated and freed the double
> freed object. And limit all the tracking just to the affected cache.

Thanks, I did not know about that.

However, as kind of expected, I get a BUG due to a NULL pointer
dereference in print_track():

	=============================================================================
	BUG skbuff_head_cache (Tainted: G                T): Object already free
	-----------------------------------------------------------------------------

	BUG: kernel NULL pointer dereference, address: 00000000000000e0
	#PF: supervisor read access in kernel mode
	#PF: error_code(0x0000) - not-present page
	PGD 0 P4D 0 
	Oops: 0000 [#1] PREEMPT SMP PTI
	CPU: 0 PID: 0 Comm: swapper/0 Tainted: G    B           T 5.3.8 #1
	Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
	RIP: 0010:print_track+0xd/0x67
	Code: 24 20 89 ee 48 c7 c7 70 33 d5 8c 49 8b 4c 24 28 49 89 c0 e8 77 f2 ef ff 83 c5 01 eb b9 41 54 49 89 fc 55 48 89 d5 53 48 89 f3 <48> 8b 13 48 85 d2 74 4d 48 89 e9 44 8b 8b 8c 00 00 00 4c 89 e6 48
	RSP: 0018:ffff99d1c0003d18 EFLAGS: 00010002
	RAX: 0000000000000000 RBX: 00000000000000e0 RCX: 0000000000000000
	RDX: 00000000fffc0d26 RSI: 00000000000000e0 RDI: ffffffff8cd538a0
	RBP: 00000000fffc0d26 R08: 0000000000000240 R09: 0000000000000004
	R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8cd538a0
	R13: ffffd1cd00c88700 R14: ffff93613e077e00 R15: 0000000000000002
	FS:  0000000000000000(0000) GS:ffff93613e600000(0000) knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 00000000000000e0 CR3: 00000000010c0000 CR4: 00000000000006f0
	Call Trace:
	 <IRQ>
	 print_tracking+0x38/0x6b
	 print_trailer+0x33/0x1d4
	 free_debug_processing.cold.38+0xc9/0x149
	 ? __kfree_skb_flush+0x30/0x40
	 ? __kfree_skb_flush+0x30/0x40
	 ? __kfree_skb_flush+0x30/0x40
	 __slab_free+0x22a/0x3d0
	 ? sock_wfree+0x5d/0x70
	 ? __kfree_skb_flush+0x30/0x40
	 kmem_cache_free_bulk+0x354/0x420
	 __kfree_skb_flush+0x30/0x40
	 net_rx_action+0x2be/0x460
	 __do_softirq+0xf3/0x249
	 irq_exit+0x93/0xb0
	 do_IRQ+0xa0/0x110
	 common_interrupt+0xf/0xf
	 </IRQ>
	RIP: 0010:default_idle+0x13/0x20
	Code: ff eb 9f e8 8f eb 8c ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 e8 0b 4b bd ff e9 07 00 00 00 0f 00 2d 21 5a 42 00 fb f4 <e9> f8 4a bd ff 0f 1f 84 00 00 00 00 00 53 65 48 8b 1c 25 00 5d 01
	RSP: 0018:ffffffff8d003eb8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffffd6
	RAX: 0000000000000000 RBX: 0000000000000000 RCX: 7ffffff6501e23c1
	RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8cda0130
	RBP: ffffffff8d088438 R08: 0000000000000000 R09: 00000009b00fa2fe
	R10: 00000000000001fd R11: 0000000000000001 R12: 0000000000000000
	R13: 0000000000000000 R14: ffffffff8d502920 R15: 000000007efbbd98
	 ? default_idle+0x5/0x20
	 do_idle+0x1a8/0x220
	 cpu_startup_entry+0x14/0x20
	 start_kernel+0x615/0x654
	 secondary_startup_64+0xa4/0xb0
	Modules linked in: virtio_scsi virtio_net net_failover failover virtio_mmio virtio_input virtio_gpu virtio_crypto crypto_engine virtio_console virtio_balloon uhci_hcd uas usb_storage snd_hda_codec_generic snd_hda_codec snd_hda_core ledtrig_audio rtc_cmos qxl qemu_fw_cfg psmouse mousedev ehci_pci ehci_hcd button bochs_drm drm_vram_helper ttm virtio_rng
	CR2: 00000000000000e0
	---[ end trace 386b05c18ef99c32 ]---
	RIP: 0010:print_track+0xd/0x67
	Code: 24 20 89 ee 48 c7 c7 70 33 d5 8c 49 8b 4c 24 28 49 89 c0 e8 77 f2 ef ff 83 c5 01 eb b9 41 54 49 89 fc 55 48 89 d5 53 48 89 f3 <48> 8b 13 48 85 d2 74 4d 48 89 e9 44 8b 8b 8c 00 00 00 4c 89 e6 48
	RSP: 0018:ffff99d1c0003d18 EFLAGS: 00010002
	RAX: 0000000000000000 RBX: 00000000000000e0 RCX: 0000000000000000
	RDX: 00000000fffc0d26 RSI: 00000000000000e0 RDI: ffffffff8cd538a0
	RBP: 00000000fffc0d26 R08: 0000000000000240 R09: 0000000000000004
	R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8cd538a0
	R13: ffffd1cd00c88700 R14: ffff93613e077e00 R15: 0000000000000002
	FS:  0000000000000000(0000) GS:ffff93613e600000(0000) knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 00000000000000e0 CR3: 00000000010c0000 CR4: 00000000000006f0
	Kernel panic - not syncing: Fatal exception in interrupt
	Shutting down cpus with NMI
	Kernel Offset: 0xb000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
	---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

I tried reverting 1b7e816fc80e ("mm: slub: Fix slab walking for
init_on_free") or disabling init_on_free, but then again I cannot
reproduce the bug and nothing is logged.

> 
> > The code path is:
> > 	net_rx_action
> > 	  __kfree_skb_flush
> > 		kmem_cache_free_bulk()  # skbuff_head_cache
> > 		  slab_free()
> > 			do_slab_free()
> > 			  __slab_free()
> > 				free_debug_processing()
> > 				  free_consistency_check()
> > 					object_err()    # "Object already free"
> > 					  print_trailer()
> > 						print_tracking()    # !(s->flags & SLAB_STORE_USER) => return;
> > 						print_page_info()   # "INFO: Slab ..."
> > 						pr_err("INFO: Object ...", ..., get_freepointer(s, p))
> > 						  get_freepointer()
> > 						    freelist_dereference() # NULL pointer dereference
> > 
> > Enabling KASAN shows less info because the NULL pointer dereference then
> > apparently happens before reaching free_debug_processing().
> > 
> > Bisection points to the following commit: 1b7e816fc80e ("mm: slub: Fix
> > slab walking for init_on_free"), and indeed the BUG is not triggered
> > when init_on_free is disabled.
> 
> That could be either buggy SLUB code, or the commit somehow exposed a
> real bug in skbuff users.

Right. At first I thought about some incompatibility between
init_on_free and SLAB_CONSISTENCY_CHECKS, but in that case why would it
only happen with skbuff_head_cache? On the other hand, if it's a bug in
skbuff users, why is the on_freelist() check in free_consistency_check()
not detecting anything when init_on_free is disabled? 

-- 
Thibaut Sautereau
CLIP OS developer

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ