lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LSU.2.11.1604200041460.3009@eggly.anvils>
Date:	Wed, 20 Apr 2016 01:01:12 -0700 (PDT)
From:	Hugh Dickins <hughd@...gle.com>
To:	"Shi, Yang" <yang.shi@...aro.org>
cc:	Andrew Morton <akpm@...ux-foundation.org>, sfr@...b.auug.org.au,
	Hugh Dickins <hughd@...gle.com>,
	Vlastimil Babka <vbabka@...e.cz>,
	LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org
Subject: Re: [BUG linux-next] Kernel panic found with linux-next-20160414

On Tue, 19 Apr 2016, Shi, Yang wrote:
> Hi folks,
> 
> When I ran ltp on linux-next-20160414 on my ARM64 machine, I got the below
> kernel panic:
> 
> Unable to handle kernel paging request at virtual address ffffffc007846000
> pgd = ffffffc01e21d000
> [ffffffc007846000] *pgd=0000000000000000, *pud=0000000000000000
> Internal error: Oops: 96000047 [#11] PREEMPT SMP
> Modules linked in: loop
> CPU: 7 PID: 274 Comm: systemd-journal Tainted: G      D
> 4.6.0-rc3-next-20160414-WR8.0.0.0_standard+ #9
> Hardware name: Freescale Layerscape 2085a RDB Board (DT)
> task: ffffffc01e3fcf80 ti: ffffffc01ea8c000 task.ti: ffffffc01ea8c000
> PC is at copy_page+0x38/0x120
> LR is at migrate_page_copy+0x604/0x1660
> pc : [<ffffff9008ff2318>] lr : [<ffffff900867cdac>] pstate: 20000145
> sp : ffffffc01ea8ecd0
> x29: ffffffc01ea8ecd0 x28: 0000000000000000
> x27: 1ffffff7b80240f8 x26: ffffffc018196f20
> x25: ffffffbdc01e1180 x24: ffffffbdc01e1180
> x23: 0000000000000000 x22: ffffffc01e3fcf80
> x21: ffffffc00481f000 x20: ffffff900a31d000
> x19: ffffffbdc01207c0 x18: 0000000000000f00
> x17: 0000000000000000 x16: 0000000000000000
> x15: 0000000000000000 x14: 0000000000000000
> x13: 0000000000000000 x12: 0000000000000000
> x11: 0000000000000000 x10: 0000000000000000
> x9 : 0000000000000000 x8 : 0000000000000000
> x7 : 0000000000000000 x6 : 0000000000000000
> x5 : 0000000000000000 x4 : 0000000000000000
> x3 : 0000000000000000 x2 : 0000000000000000
> x1 : ffffffc00481f080 x0 : ffffffc007846000
> 
> Call trace:
> Exception stack(0xffffffc021fc2ed0 to 0xffffffc021fc2ff0)
> 2ec0:                                   ffffffbdc00887c0 ffffff900a31d000
> 2ee0: ffffffc021fc30f0 ffffff9008ff2318 0000000020000145 0000000000000025
> 2f00: ffffffbdc025a280 ffffffc020adc4c0 0000000041b58ab3 ffffff900a085fd0
> 2f20: ffffff9008200658 0000000000000000 0000000000000000 ffffffbdc00887c0
> 2f40: ffffff900b0f1320 ffffffc021fc3078 0000000041b58ab3 ffffff900a0864f8
> 2f60: ffffff9008210010 ffffffc021fb8960 ffffff900867bacc 1ffffff8043f712d
> 2f80: ffffffc021fc2fb0 ffffff9008210564 ffffffc021fc3070 ffffffc021fb8940
> 2fa0: 0000000008221f78 ffffff900862f9c8 ffffffc021fc2fe0 ffffff9008215dc8
> 2fc0: 1ffffff8043f8602 ffffffc021fc0000 ffffffc00968a000 ffffffc00221f080
> 2fe0: f9407e11d00001f0 d61f02209103e210
> [<ffffff9008ff2318>] copy_page+0x38/0x120
> [<ffffff900867de7c>] migrate_page+0x74/0x98
> [<ffffff90089ba418>] nfs_migrate_page+0x58/0x80
> [<ffffff900867dffc>] move_to_new_page+0x15c/0x4d8
> [<ffffff900867eec8>] migrate_pages+0x7c8/0x11f0
> [<ffffff90085f8724>] compact_zone+0xdfc/0x2570
> [<ffffff90085f9f78>] compact_zone_order+0xe0/0x170
> [<ffffff90085fb688>] try_to_compact_pages+0x2e8/0x8f8
> [<ffffff90085913a0>] __alloc_pages_direct_compact+0x100/0x540
> [<ffffff9008592420>] __alloc_pages_nodemask+0xc40/0x1c58
> [<ffffff90086887e8>] khugepaged+0x468/0x19c8
> [<ffffff9008301700>] kthread+0x248/0x2c0
> [<ffffff9008206610>] ret_from_fork+0x10/0x40
> Code: d281f012 91020021 f1020252 d503201f (a8000c02)
> 
> 
> I did some initial investigation and found it is caused by DEBUG_PAGEALLOC
> and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT. And, mainline 4.6-rc3 works well.
> 
> It should be not arch specific although I got it caught on ARM64. I suspect
> this might be caused by Hugh's huge tmpfs patches.

Thanks for testing.  It might be caused by my patches, but I don't think
that's very likely.  This is page migraton for compaction, in the service
of anon THP's khugepaged; and I wonder if you were even exercising huge
tmpfs when running LTP here (it certainly can be done: I like to mount a
huge tmpfs on /opt/ltp and install there, with shmem_huge 2 so any other
tmpfs mounts are also huge).

There are compaction changes in linux-next too, but I don't see any
reason why they'd cause this.  I don't know arm64 traces enough to know
whether it's the source page or the destination page for the copy, but
it looks as if it has been freed (and DEBUG_PAGEALLOC unmapped) before
reaching migration's copy.

Needs more debugging, I'm afraid: is it reproducible?

Hugh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ