lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100406225925.GA3446@liondog.tnic>
Date:	Wed, 7 Apr 2010 00:59:25 +0200
From:	Borislav Petkov <bp@...en8.de>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>,
	Minchan Kim <minchan.kim@...il.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Lee Schermerhorn <Lee.Schermerhorn@...com>,
	Nick Piggin <npiggin@...e.de>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Hugh Dickins <hugh.dickins@...cali.co.uk>,
	sgunderson@...foot.com
Subject: Re: Ugly rmap NULL ptr deref oopsie on hibernate (was Linux
 2.6.34-rc3)

From: Linus Torvalds <torvalds@...ux-foundation.org>
Date: Tue, Apr 06, 2010 at 02:27:37PM -0700

> On Tue, 6 Apr 2010, Borislav Petkov wrote:
> > > So again, it's actually anon_vma.head.next that is NULL, not any of the 
> > > entries on the list itself.
> > > 
> > > Now, I can see several cases for this:
> > > 
> > >  - the obvious one: anon_vma just wasn't correctly initialized, and is 
> > >    missing a INIT_LIST_HEAD(&anon_vma->head). That's either a slab bug (we 
> > >    don't have a whole lot of coverage of constructors), or somebody 
> > >    allocated an anon_vma without using the anon_vma_cachep.
> > 
> > I've added code to verify this and am suspend/resuming now... Wait a
> > minute, Linus, you're good! :) :
> > 
> > [  873.083074] PM: Preallocating image memory... 
> > [  873.254359] NULL anon_vma->head.next, page 2182681
> 
> Yeah, I was pretty sure of that thing.
> 
> I still don't see _how_ it happens, though. That 'struct anon_vma' is very 
> simple, and contains literally just the lock and that list_head.
> 
> Now, 'head.next' is kind of magical, because it contains that magic 
> low-bit "have I been locked" thing (see "vm_lock_anon_vma()" in 
> mm/mmap.c). But I'm not seeing anything else touching it.
> 
> And if you allocate a anon_vma the proper way, the SLUB constructor should 
> have made sure that the head is initialized. And no normal list operation 
> ever sets any list pointer to zero, although a "list_del()" on the first 
> list entry could do it if that first list entry had a NULL next pointer. 
> 
> > Now, how do we track back to the place which is missing anon_vma->head
> > init? Can we use the struct page *page arg to page_referenced_anon()
> > somehow?
> 
> You might enable SLUB debugging (both SLUB_DEBUG _and_ SLUB_DEBUG_ON), and 
> then make the "object_err()" function in mm/slub.c be non-static. You 
> could call it when you see the problem, perhaps.
> 
> Or you could just add tests to both alloc_anon_vma() and free_anon_vma() 
> to check that 'list_empty(&anon_vma->head)' is true. I dunno.

Ok, I tried doing all you suggested and here's what came out. Please,
take this with a grain of salt because I'm almost falling asleep - even
the coffee is not working anymore so it could be just as well that I've
made a mistake somewhere (the new OOPS is a #GP, by the way), just
watch:

Source changes locally:

--
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 4884462..0c11dfb 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -108,6 +108,8 @@ unsigned int kmem_cache_size(struct kmem_cache *);
 const char *kmem_cache_name(struct kmem_cache *);
 int kmem_ptr_validate(struct kmem_cache *cachep, const void *ptr);
 
+void object_err(struct kmem_cache *s, struct page *page, u8 *object, char *reason);
+
 /*
  * Please use this macro to create slab caches. Simply specify the
  * name of the structure and maybe some flags that are listed above.
diff --git a/mm/rmap.c b/mm/rmap.c
index eaa7a09..7b35b3f 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -66,11 +66,24 @@ static struct kmem_cache *anon_vma_chain_cachep;
 
 static inline struct anon_vma *anon_vma_alloc(void)
 {
-	return kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL);
+	struct anon_vma *ret;
+	ret = kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL);
+
+	if (!ret->head.next) {
+		printk("%s NULL anon_vma->head.next\n", __func__);
+		dump_stack();
+	}
+
+	return ret;
 }
 
 void anon_vma_free(struct anon_vma *anon_vma)
 {
+	if (!anon_vma->head.next) {
+		printk("%s NULL anon_vma->head.next\n", __func__);
+		dump_stack();
+	}
+
 	kmem_cache_free(anon_vma_cachep, anon_vma);
 }
 
@@ -494,6 +507,18 @@ static int page_referenced_anon(struct page *page,
 		return referenced;
 
 	mapcount = page_mapcount(page);
+
+	if (!anon_vma->head.next) {
+		printk(KERN_ERR "NULL anon_vma->head.next, page %lu\n",
+				page_to_pfn(page));
+
+		object_err(anon_vma_cachep, page, (u8 *)anon_vma, "NULL next");
+
+		dump_stack();
+
+		return referenced;
+	}
+
 	list_for_each_entry(avc, &anon_vma->head, same_anon_vma) {
 		struct vm_area_struct *vma = avc->vma;
 		unsigned long address = vma_address(page, vma);
diff --git a/mm/slub.c b/mm/slub.c
index b364844..bcf5416 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -477,7 +477,7 @@ static void print_trailer(struct kmem_cache *s, struct page *page, u8 *p)
 	dump_stack();
 }
 
-static void object_err(struct kmem_cache *s, struct page *page,
+void object_err(struct kmem_cache *s, struct page *page,
 			u8 *object, char *reason)
 {
 	slab_bug(s, "%s", reason);

---

do the same exercise of starting several guests and then shutting them
down, and hibernating at the same time. After having shutdown the
guests, start firefox and let it load a big html page and hibernate
while doing so, boom!

[  269.104940] Freezing user space processes ... (elapsed 0.03 seconds) done.
[  269.141953] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
[  269.155115] PM: Preallocating image memory... 
[  269.423811] general protection fault: 0000 [#1] PREEMPT SMP 
[  269.424003] last sysfs file: /sys/power/state
[  269.424003] CPU 0 
[  269.424003] Modules linked in: powernow_k8 cpufreq_ondemand cpufreq_powersave cpufreq_userspace freq_table cpufreq_co
nservative binfmt_misc kvm_amd kvm ipv6 vfat fat dm_crypt dm_mod ohci_hcd pcspkr edac_core k10temp 8250_pnp 8250 serial_
core
[  269.424003] 
[  269.424003] Pid: 2617, comm: hib.sh Tainted: G        W  2.6.34-rc3-00288-gab195c5-dirty #4 M3A78 PRO/System Product 
Name
[  269.424003] RIP: 0010:[<ffffffff810c0cb4>]  [<ffffffff810c0cb4>] page_referenced+0x147/0x232
[  269.424003] RSP: 0018:ffff88022a1218b8  EFLAGS: 00010246
[  269.424003] RAX: ffff8802126fa468 RBX: ffffea000700b210 RCX: 0000000000000000
[  269.424003] RDX: ffff8802126fa429 RSI: ffff8802126fa440 RDI: ffff88022dc3cb80
[  269.424003] RBP: ffff88022a121938 R08: 0000000000000002 R09: 0000000000000000
[  269.424003] R10: 0000000000000246 R11: ffff88021a030478 R12: 0000000000000000
[  269.424003] R13: 002e2e2e002e2e0e R14: ffff8802126fa428 R15: ffff88022a121a00
[  269.424003] FS:  00007fe2799796f0(0000) GS:ffff88000a000000(0000) knlGS:0000000000000000
[  269.424003] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  269.424003] CR2: 00007fffdefb3880 CR3: 00000002171c0000 CR4: 00000000000006f0
[  269.424003] DR0: 0000000000000090 DR1: 00000000000000a4 DR2: 00000000000000ff
[  269.424003] DR3: 000000000000000f DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  269.424003] Process hib.sh (pid: 2617, threadinfo ffff88022a120000, task ffff88022dc3cb80)
[  269.424003] Stack:
[  269.424003]  ffff8802126fa468 00000000813f8cfc ffffffff8165ae28 00000000000042e7
[  269.424003] <0> ffff88022a1218f8 ffffffff810c6051 ffffea0006f968c8 ffffea0006f968c8
[  269.424003] <0> ffff88022a121938 00000002810ab275 0000000006f96890 ffffea000700b238
[  269.424003] Call Trace:
[  269.424003]  [<ffffffff810c6051>] ? swapcache_free+0x37/0x3c
[  269.424003]  [<ffffffff810ab79a>] shrink_page_list+0x14a/0x477
[  269.424003]  [<ffffffff813f8c36>] ? _raw_spin_unlock_irq+0x30/0x58
[  269.424003]  [<ffffffff810abe1e>] shrink_inactive_list+0x357/0x5e5
[  269.424003]  [<ffffffff810ac3b6>] shrink_zone+0x30a/0x3d4
[  269.424003]  [<ffffffff810acf91>] do_try_to_free_pages+0x176/0x27f
[  269.424003]  [<ffffffff810ad12f>] shrink_all_memory+0x95/0xc4
[  269.424003]  [<ffffffff810aa634>] ? isolate_pages_global+0x0/0x1f0
[  269.424003]  [<ffffffff81076e64>] ? count_data_pages+0x65/0x79
[  269.424003]  [<ffffffff810770cb>] hibernate_preallocate_memory+0x1aa/0x2cb
[  269.424003]  [<ffffffff813f5135>] ? printk+0x41/0x44
[  269.424003]  [<ffffffff81075a6b>] hibernation_snapshot+0x36/0x1e1
[  269.424003]  [<ffffffff81075ce4>] hibernate+0xce/0x172
[  269.424003]  [<ffffffff81074a51>] state_store+0x5c/0xd3
[  269.424003]  [<ffffffff81185097>] kobj_attr_store+0x17/0x19
[  269.424003]  [<ffffffff81125edb>] sysfs_write_file+0x108/0x144
[  269.424003]  [<ffffffff810d57a7>] vfs_write+0xb2/0x153
[  269.424003]  [<ffffffff81063bf1>] ? trace_hardirqs_on_caller+0x1f/0x14b
[  269.424003]  [<ffffffff810d590b>] sys_write+0x4a/0x71
[  269.424003]  [<ffffffff810021db>] system_call_fastpath+0x16/0x1b
[  269.424003] Code: 3b 56 10 73 1e 48 83 fa f2 74 18 48 8d 4d cc 4d 89 f8 48 89 df e8 1e f2 ff ff 41 01 c4 83 7d cc 00 
74 19 4d 8b 6d 20 49 83 ed 20 <49> 8b 45 20 0f 18 08 49 8d 45 20 48 39 45 80 75 aa 4c 89 f7 e8 
[  269.424003] RIP  [<ffffffff810c0cb4>] page_referenced+0x147/0x232
[  269.424003]  RSP <ffff88022a1218b8>
[  269.438405] ---[ end trace ad5b4172ee94398e ]---
[  269.438553] note: hib.sh[2617] exited with preempt_count 2
[  269.438709] BUG: scheduling while atomic: hib.sh/2617/0x10000003
[  269.438858] INFO: lockdep is turned off.
[  269.439075] Modules linked in: powernow_k8 cpufreq_ondemand cpufreq_powersave cpufreq_userspace freq_table cpufreq_co
nservative binfmt_misc kvm_amd kvm ipv6 vfat fat dm_crypt dm_mod ohci_hcd pcspkr edac_core k10temp 8250_pnp 8250 serial_core
[  269.440875] Pid: 2617, comm: hib.sh Tainted: G      D W  2.6.34-rc3-00288-gab195c5-dirty #4
[  269.441137] Call Trace:
[  269.441288]  [<ffffffff81063107>] ? __debug_show_held_locks+0x1b/0x24
[  269.441440]  [<ffffffff8102d3c0>] __schedule_bug+0x72/0x77
[  269.441590]  [<ffffffff813f553e>] schedule+0xd9/0x730
[  269.441741]  [<ffffffff8103022c>] __cond_resched+0x18/0x24
[  269.441891]  [<ffffffff813f5c62>] _cond_resched+0x2c/0x37
[  269.442045]  [<ffffffff810b7d7d>] unmap_vmas+0x6ce/0x893
[  269.442205]  [<ffffffff810bc42f>] exit_mmap+0xd7/0x182
[  269.442352]  [<ffffffff81035951>] mmput+0x48/0xb9
[  269.442502]  [<ffffffff81039c21>] exit_mm+0x110/0x11d
[  269.442652]  [<ffffffff8103b663>] do_exit+0x1c5/0x691
[  269.442802]  [<ffffffff81038d0d>] ? kmsg_dump+0x13b/0x155
[  269.442953]  [<ffffffff810060db>] ? oops_end+0x47/0x93
[  269.443107]  [<ffffffff81006122>] oops_end+0x8e/0x93
[  269.443262]  [<ffffffff81006313>] die+0x5a/0x63
[  269.443414]  [<ffffffff81003eaf>] do_general_protection+0x134/0x13c
[  269.443566]  [<ffffffff813f90f0>] ? irq_return+0x0/0x2
[  269.443716]  [<ffffffff813f92cf>] general_protection+0x1f/0x30
[  269.443867]  [<ffffffff810c0cb4>] ? page_referenced+0x147/0x232
[  269.444021]  [<ffffffff810c0bf0>] ? page_referenced+0x83/0x232
[  269.444176]  [<ffffffff810c6051>] ? swapcache_free+0x37/0x3c
[  269.444328]  [<ffffffff810ab79a>] shrink_page_list+0x14a/0x477
[  269.444479]  [<ffffffff813f8c36>] ? _raw_spin_unlock_irq+0x30/0x58
[  269.444630]  [<ffffffff810abe1e>] shrink_inactive_list+0x357/0x5e5
[  269.444782]  [<ffffffff810ac3b6>] shrink_zone+0x30a/0x3d4
[  269.444933]  [<ffffffff810acf91>] do_try_to_free_pages+0x176/0x27f
[  269.445087]  [<ffffffff810ad12f>] shrink_all_memory+0x95/0xc4
[  269.445243]  [<ffffffff810aa634>] ? isolate_pages_global+0x0/0x1f0
[  269.445396]  [<ffffffff81076e64>] ? count_data_pages+0x65/0x79
[  269.445547]  [<ffffffff810770cb>] hibernate_preallocate_memory+0x1aa/0x2cb
[  269.445698]  [<ffffffff813f5135>] ? printk+0x41/0x44
[  269.445848]  [<ffffffff81075a6b>] hibernation_snapshot+0x36/0x1e1
[  269.445999]  [<ffffffff81075ce4>] hibernate+0xce/0x172
[  269.446160]  [<ffffffff81074a51>] state_store+0x5c/0xd3
[  269.446307]  [<ffffffff81185097>] kobj_attr_store+0x17/0x19
[  269.446457]  [<ffffffff81125edb>] sysfs_write_file+0x108/0x144
[  269.446607]  [<ffffffff810d57a7>] vfs_write+0xb2/0x153
[  269.446757]  [<ffffffff81063bf1>] ? trace_hardirqs_on_caller+0x1f/0x14b
[  269.446908]  [<ffffffff810d590b>] sys_write+0x4a/0x71
[  269.447063]  [<ffffffff810021db>] system_call_fastpath+0x16/0x1b


This time we have

[  269.424003] RIP: 0010:[<ffffffff810c0cb4>]  [<ffffffff810c0cb4>] page_referenced+0x147/0x232 

which is offset 0x1104.

which is

    10eb:       48 89 df                mov    %rbx,%rdi
    10ee:       e8 00 00 00 00          callq  10f3 <page_referenced+0x136>
    10f3:       41 01 c4                add    %eax,%r12d
    10f6:       83 7d cc 00             cmpl   $0x0,-0x34(%rbp)
    10fa:       74 19                   je     1115 <page_referenced+0x158>
    10fc:       4d 8b 6d 20             mov    0x20(%r13),%r13
    1100:       49 83 ed 20             sub    $0x20,%r13
    1104:       49 8b 45 20             mov    0x20(%r13),%rax			<-------------------------
    1108:       0f 18 08                prefetcht0 (%rax)
    110b:       49 8d 45 20             lea    0x20(%r13),%rax
    110f:       48 39 45 80             cmp    %rax,-0x80(%rbp)
    1113:       75 aa                   jne    10bf <page_referenced+0x102>
    1115:       4c 89 f7                mov    %r14,%rdi

and asm is

	.loc 1 522 0
	movq	32(%r13), %r13	# <variable>.same_anon_vma.next, __mptr.454
.LVL295:
	subq	$32, %r13	#, avc
.LVL296:
.L186:
.LBE1224:
	movq	32(%r13), %rax	# <variable>.same_anon_vma.next, <variable>.same_anon_vma.next		<--------------
	prefetcht0	(%rax)	# <variable>.same_anon_vma.next
	leaq	32(%r13), %rax	#, tmp104
	cmpq	%rax, -128(%rbp)	# tmp104, %sfp
	jne	.L189	#,
.L188:
	.loc 1 540 0
	movq	%r14, %rdi	# anon_vma,
	call	page_unlock_anon_vma	#

and %r13 contains some funny stuff, could be some mangled SLUB debug
poison or something: R13: 002e2e2e002e2e0e. Maybe this is the reason for
the #GP.

But yes, even if the oopsing instruction is

movq	32(%r13), %rax	# <variable>.same_anon_vma.next, <variable>.same_anon_vma.next

this is not same_anon_vma.next because we've come to the above
instruction through the ".L186:" label, before which we have %r13
already loaded with anon_vma->head.next.

To be continued...

-- 
Regards/Gruss,
    Boris.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ