Date:	Thu, 22 May 2014 17:08:09 +0200
From:	Vlastimil Babka <vbabka@...e.cz>
To:	Dave Jones <davej@...hat.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	linux-mm@...ck.org, Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: 3.15.0-rc6: VM_BUG_ON_PAGE(PageTail(page), page)

On 05/22/2014 03:58 PM, Dave Jones wrote:
> Not sure if Sasha has already reported this on -next (It's getting hard
> to keep track of all the VM bugs he's been finding), but I hit this overnight
> on .15-rc6.  First time I've seen this one.
>
>
> page:ffffea0004599800 count:0 mapcount:0 mapping:          (null) index:0x2
> page flags: 0x20000000008000(tail)
> ------------[ cut here ]------------
> kernel BUG at include/linux/page-flags.h:415!
> invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> CPU: 1 PID: 6858 Comm: trinity-c42 Not tainted 3.15.0-rc6+ #216
> task: ffff88012d18e900 ti: ffff88009e87a000 task.ti: ffff88009e87a000
> RIP: 0010:[<ffffffffbb718d98>]  [<ffffffffbb718d98>] PageTransHuge.part.23+0xb/0xd
> RSP: 0000:ffff88009e87b940  EFLAGS: 00010246
> RAX: 0000000000000001 RBX: 0000000000116660 RCX: 0000000000000006
> RDX: 0000000000000000 RSI: ffffffffbb0c00f8 RDI: ffffffffbb0bfed2
> RBP: ffff88009e87b940 R08: ffffffffbc01203c R09: 00000000000003da
> R10: 00000000000003d9 R11: 0000000000000003 R12: 0000000000000001
> R13: 0000000000116800 R14: ffff88024d64ce00 R15: ffffea0004599800
> FS:  00007f4fd192e740(0000) GS:ffff88024d040000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000004c00000 CR3: 00000000a19ce000 CR4: 00000000001407e0
> DR0: 00000000024f4000 DR1: 0000000001d43000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> Stack:
>   ffff88009e87b9e8 ffffffffbb1728a3 ffff88009e87b9e8 ffff88009e87baa8
>   ffff88012d18e900 ffff88009e87ba60 0000000000000000 0000000400000016
>   0000000000000000 ffff88009e87bfd8 00000000000008b3 ffff88009e87ba50
> Call Trace:
>   [<ffffffffbb1728a3>] isolate_migratepages_range+0x7a3/0x870
>   [<ffffffffbb172d90>] compact_zone+0x370/0x560
>   [<ffffffffbb173022>] compact_zone_order+0xa2/0x110
>   [<ffffffffbb1733f1>] try_to_compact_pages+0x101/0x130
>   [<ffffffffbb71861b>] __alloc_pages_direct_compact+0xac/0x1d0
>   [<ffffffffbb15760b>] __alloc_pages_nodemask+0x6ab/0xaf0
>   [<ffffffffbb19c9ea>] alloc_pages_vma+0x9a/0x160
>   [<ffffffffbb1aef0d>] do_huge_pmd_anonymous_page+0xfd/0x3c0
>   [<ffffffffbb0a19cd>] ? get_parent_ip+0xd/0x50
>   [<ffffffffbb17ac18>] handle_mm_fault+0x158/0xcb0
>   [<ffffffffbb72594d>] ? retint_restore_args+0xe/0xe
>   [<ffffffffbb728bb6>] __do_page_fault+0x1a6/0x620
>   [<ffffffffbb11011e>] ? __acct_update_integrals+0x8e/0x120
>   [<ffffffffbb0a19cd>] ? get_parent_ip+0xd/0x50
>   [<ffffffffbb72949b>] ? preempt_count_sub+0x6b/0xf0
>   [<ffffffffbb72904e>] do_page_fault+0x1e/0x70
> Code: 75 1d 55 be 6c 00 00 00 48 c7 c7 8a 2f a2 bb 48 89 e5 e8 6c 49 95 ff 5d c6 05 74 16 65 00 01 c3 55 31 f6 48 89 e5 e8 28 bd a3 ff <0f> 0b 0f 1f 44 00 00 55 48 89 e5 41 57 45 31 ff 41 56 49 89 fe
> RIP  [<ffffffffbb718d98>]
>
> That BUG is..
>
> 413 static inline int PageTransHuge(struct page *page)
> 414 {
> 415         VM_BUG_ON_PAGE(PageTail(page), page);
> 416         return PageHead(page);
> 417 }

Any idea which of the two PageTransHuge() calls in
isolate_migratepages_range() that is? The large offset into the function
suggests it's the one where the lru lock is already held, but I'm not sure,
as the decodecode output of your dump and the objdump of my own build look
wildly different.

If it's indeed the later PageTransHuge() call, it means that somebody
else has cleared PageLRU and set PageTail (I don't think a page could
have both at once) between the checks for PageLRU() and PageTransHuge()
in isolate_migratepages_range(), while isolate_migratepages_range() was
holding lru_lock. That's quite weird...

Vlastimil

