Date:   Wed, 26 Aug 2020 08:44:15 +0800
From:   Rong Chen <rong.a.chen@...el.com>
To:     Qian Cai <cai@....pw>
Cc:     Catalin Marinas <catalin.marinas@....com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        Michal Hocko <mhocko@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org
Subject: Re: [mm] c566586818:
 BUG:kernel_hang_in_early-boot_stage,last_printk:Probing_EDD(edd=off_to_disable)...ok



On 8/24/20 8:29 PM, Qian Cai wrote:
> On Mon, Aug 24, 2020 at 10:47:20AM +0800, Rong Chen wrote:
>>
>> On 8/21/20 9:01 AM, Qian Cai wrote:
>>> On Tue, Aug 18, 2020 at 08:23:51AM +0800, kernel test robot wrote:
>>>> Greeting,
>>>>
>>>> FYI, we noticed the following commit (built with gcc-9):
>>>>
>>>> commit: c5665868183fec689dbab9fb8505188b2c4f0757 ("mm: kmemleak: use the memory pool for early allocations")
>>> I might see one of those early boot failure before. In my case, the bare-metal
>>> system was reset. Can you try to narrow down to a smaller
>>> CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE (assume 0 works if your bisecting was
>>> correct) that works?
>> Hi Qian,
>>
>> Adding CONFIG_EARLY_PRINTK=y to the kconfig file, and the boot hangs in the
>> below position:
>>
>> [    0.715834] Kernel command line: root=/dev/ram0 hung_task_panic=1 debug
>> apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100
>> net.ifnames=0 printk.devkmsg=on panic=-1 softlockup_panic=1
>> nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0
>> drbd.minor_count=8
>> systemd.log_level=err ignore_loglevel console=tty0 earlyprintk=ttyS0,115200
>> console=ttyS0,115200 vga=normal rw rcuperf.shutdown=0 watchdog_thresh=60
>> [    0.719688] sysrq: sysrq always enabled.
>> [    0.801005] Dentry cache hash table entries: 2097152 (order: 12, 16777216
>> bytes, linear)
>> [    0.805588] Inode-cache hash table entries: 1048576 (order: 11, 8388608
>> bytes, linear)
>> [    0.806464] mem auto-init: stack:off, heap alloc:on, heap free:off
>> [    1.080978] Memory: 12319196K/12680692K available (10243K kernel code,
>> 2414K rwdata, 8184K rodata, 856K init, 20772K bss, 361496K reserved, 0K
>> cma-reserved)
>> qemu-system-x86_64: terminating on signal 2
>>
>> The problem disappeared if CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=400:
> Interesting. Can you paste the line as show:
>
> ./scripts/faddr2line vmlinux lookup_address_in_pgd+0xd1/0x158
>
> Also, does this happens on the latest mainline or linux-next? Looks like you
> were reproducing using v5.3.

Hi Qian,

I rebuilt the kernel at commit c566586818, but the error changed to
"RIP: 0010:clear_page_orig+0x12/0x40", and the same error can also be
reproduced on the parent commit:

[    0.539811] Memory: 12325340K/12680692K available (10243K kernel code, 2414K rwdata, 8188K rodata, 856K init, 14628K bss, 355352K reserved, 0K cma-reserved)
[    4.133400] BUG: unable to handle page fault for address: ffff88833653e000
[    4.134130] #PF: supervisor write access in kernel mode
[    4.134694] #PF: error_code(0x0002) - not-present page
[    4.135177] PGD 3800067 P4D 3800067 PUD f000e6f2f000d445 PMD 0
[    4.135730] Thread overran stack, or stack corrupted
[    4.136192] Oops: 0002 [#1] DEBUG_PAGEALLOC PTI
[    4.136609] CPU: 0 PID: 0 Comm: swapper Not tainted 5.3.0-11792-gc5665868183fe #1
[    4.137300] RIP: 0010:clear_page_orig+0x12/0x40
[    4.137732] Code: 03 00 00 00 b0 01 5b c3 b9 00 02 00 00 31 c0 f3 48 ab c3 0f 1f 44 00 00 31 c0 b9 40 00 00 00 66 0f 1f 84 00 00 00 00 00 ff c9 <48> 89 07 48 89 47 08 48 89 47 10 48 89 47 18 48 89 47 20 48 89 47
[    4.139453] RSP: 0000:ffffffff8239d8e8 EFLAGS: 00010016
[    4.139939] RAX: 0000000000000000 RBX: 0000000000000101 RCX: 000000000000003f
[    4.140602] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88833653e000
[    4.141261] RBP: ffffea000cd94f80 R08: ffffffff82427800 R09: ffffea000cd94f80
[    4.141956] R10: 0000160000000000 R11: ffff888000000000 R12: 0000000000000000
[    4.142642] R13: 0000000000000001 R14: 0000000000092000 R15: 0000000000000046
[    4.143298] FS:  0000000000000000(0000) GS:ffffffff8243d000(0000) knlGS:0000000000000000
[    4.144076] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.144661] CR2: ffff88833653e000 CR3: 0000000002420000 CR4: 00000000000006b0
[    4.145382] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    4.146121] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    4.146829] Call Trace:
[    4.147066] Modules linked in:
[    4.147359] CR2: ffff88833653e000
[    4.147757] random: get_random_bytes called from init_oops_id+0x1d/0x2c with crng_init=0

$ ./scripts/faddr2line vmlinux clear_page_orig+0x12/0x40
clear_page_orig+0x12/0x40:
clear_page_orig at arch/x86/lib/clear_page_64.S:31
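
As a reading aid for the "#PF: error_code(0x0002)" line in the oops above, the
sketch below decodes the low bits of an x86 page-fault error code. The function
name decode_pf_error_code is hypothetical and is not part of the kernel tree;
only the architectural bit layout (bit 0 present, bit 1 write, bit 2 user,
bit 3 reserved-bit violation, bit 4 instruction fetch) is relied on.

```python
def decode_pf_error_code(code: int) -> dict:
    """Decode the low five bits of an x86 page-fault error code.

    Hypothetical helper for reading oops output; not a kernel API.
    """
    return {
        "present": bool(code & 0x01),            # 0 = page not present, 1 = protection violation
        "write": bool(code & 0x02),              # 0 = read access, 1 = write access
        "user": bool(code & 0x04),               # 0 = supervisor mode, 1 = user mode
        "reserved_bit": bool(code & 0x08),       # reserved bit set in a paging-structure entry
        "instruction_fetch": bool(code & 0x10),  # fault caused by an instruction fetch
    }


if __name__ == "__main__":
    # Error code taken from the oops above.
    for name, value in decode_pf_error_code(0x0002).items():
        print(f"{name}: {value}")
```

For 0x0002 only the write bit is set, which matches the "supervisor write
access" and "not-present page" lines printed by the kernel.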


However, I can also reproduce the lookup_address_in_pgd error on v5.9-rc2
with the attached config file:

[    0.382789] Memory: 12313044K/12680692K available (10242K kernel code, 2658K rwdata, 8916K rodata, 800K init, 24540K bss, 367392K reserved, 0K cma-reserved)
[    4.027977] general protection fault, probably for non-canonical address 0xf0006f7280000d98: 0000 [#1] DEBUG_PAGEALLOC PTI
[    4.029094] CPU: 0 PID: 0 Comm: swapper Not tainted 5.9.0-rc2 #1
[    4.029741] RIP: 0010:lookup_address_in_pgd+0x7c/0xcc
[    4.030341] Code: 00 00 48 3d 81 00 00 00 74 6c 4c 89 df e8 9d f2 ff ff 48 f7 d0 4c 21 d8 a8 01 74 5a 4c 89 d6 4c 89 df e8 fd f5 ff ff 49 89 c0 <48> f7 00 9f ff ff ff 74 93 41 c7 01 02 00 00 00 48 8b 08 48 89 cf
[    4.032205] RSP: 0000:ffffffff82453a08 EFLAGS: 00010082
[    4.032716] RAX: f0006f7280000d98 RBX: 0000000000000001 RCX: f000e6f280000000
[    4.033569] RDX: ffff888000000000 RSI: ffff888000000d98 RDI: f000e6f2f000d400
[    4.034474] RBP: ffffffff82453b28 R08: f0006f7280000d98 R09: ffffffff82453a48
[    4.035125] R10: ffff88833664c000 R11: f000e6f2f000d445 R12: ffff88833664c000
[    4.035836] R13: 0000000000000001 R14: ffff888000000000 R15: ffffffff827806b8
[    4.036575] FS:  0000000000000000(0000) GS:ffffffff82641000(0000) knlGS:0000000000000000
[    4.037389] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.037961] CR2: ffff8883447ff000 CR3: 0000000002622000 CR4: 00000000000006b0
[    4.038677] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    4.039388] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    4.040243] Call Trace:
[    4.040552] Modules linked in:
[    4.041033] random: get_random_bytes called from init_oops_id+0x1d/0x2c with crng_init=0

$ ./scripts/faddr2line vmlinux lookup_address_in_pgd+0x7c/0xcc
lookup_address_in_pgd+0x7c/0xcc:
lookup_address_in_pgd at arch/x86/mm/pat/set_memory.c:604
(inlined by) lookup_address_in_pgd at arch/x86/mm/pat/set_memory.c:575
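
The "non-canonical address" wording in the general protection fault above can
be checked by hand. Assuming 4-level paging (48-bit virtual addresses), an
x86-64 address is canonical only when bits 63..47 are all zeros or all ones.
The sketch below applies that rule; is_canonical is a hypothetical helper name
and the 48-bit default is an assumption about this configuration.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if addr is a canonical x86-64 virtual address.

    Hypothetical helper; va_bits=48 assumes 4-level paging.
    """
    # Keep bit (va_bits - 1) and everything above it: bits 63..47 for 48-bit VA.
    upper = (addr & 0xFFFFFFFFFFFFFFFF) >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    # Canonical means the upper bits are a pure sign extension of bit va_bits-1.
    return upper == 0 or upper == all_ones


if __name__ == "__main__":
    # RAX from the v5.9-rc2 trace, then CR2 from the earlier v5.3-based trace.
    for value in (0xF0006F7280000D98, 0xFFFF88833653E000):
        print(hex(value), is_canonical(value))
```

Under that assumption the RAX value 0xf0006f7280000d98 fails the check, while
the direct-map address ffff88833653e000 from the first oops passes it.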


Best Regards,
Rong Chen

View attachment "config" of type "text/plain" (129504 bytes)

View attachment "reproduce" of type "text/plain" (790 bytes)
