linux-kernel - Re: [kernel-hardening] [PATCH v5 03/32] x86/cpa: In populate_pgd, don't set the pgd entry until it's populated

Message-ID: <87mvl8tn93.fsf@gmail.com>
Date:	Sat, 23 Jul 2016 16:58:16 +0200
From:	Nicolai Stange <nicstange@...il.com>
To:	Valdis.Kletnieks@...edu
Cc:	Andy Lutomirski <luto@...nel.org>,
	kernel-hardening@...ts.openwall.com, x86@...nel.org,
	linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
	Borislav Petkov <bp@...en8.de>,
	Nadav Amit <nadav.amit@...il.com>,
	Kees Cook <keescook@...omium.org>,
	Brian Gerst <brgerst@...il.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Josh Poimboeuf <jpoimboe@...hat.com>,
	Jann Horn <jann@...jh.net>,
	Heiko Carstens <heiko.carstens@...ibm.com>,
	Ingo Molnar <mingo@...nel.org>
Subject: Re: [kernel-hardening] [PATCH v5 03/32] x86/cpa: In populate_pgd, don't set the pgd entry until it's populated

Valdis.Kletnieks@...edu writes:

> On Thu, 21 Jul 2016 22:34:33 -0700, Andy Lutomirski said:
>
>> How much memory do you have and what's your config?  My code is
>> obviously buggy, but I'm wondering why neither I nor the 0day bot caught
>> this.
>
> Probably because your devel box and the 0day bot both have 4-level page
> tables and the dual-core i5 in my laptop has (presumably) 3?
>
> In any case, your patch didn't fix things, nor (as you noted in a mail
> to Ingo) does reverting the problem commit (and then the following one that
> deletes now-dead code so it will compile cleanly).


Applying the patch directly on top of 360cb4d15567 ("x86/mm/cpa: In
populate_pgd(), don't set the PGD entry until it's populated") *does*
fix things for me.

Hardware: i7-4800MQ, 8GiB RAM, Dell Latitude E6540

FYI, the kernel panic captured via console=uart,io,0x3f8,... is:

BUG: unable to handle kernel paging request at ffffb92ac0000fc0
IP: [<ffffffff8106b8d1>] native_set_pmd+0x1/0x10
PGD 0 
Oops: 0002 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0-rc6+ #190
Hardware name: Dell Inc. Latitude E6540/0725FP, BIOS A10 06/26/2014
task: ffffffff81e0d580 ti: ffffffff81e00000 task.ti: ffffffff81e00000
RIP: 0010:[<ffffffff8106b8d1>]  [<ffffffff8106b8d1>] native_set_pmd+0x1/0x10
RSP: 0000:ffffffff81e03c38  EFLAGS: 00010206
RAX: 00000000ff0000f3 RBX: 00000000ff000000 RCX: ffff880000000000
RDX: ffffb92ac0000fc0 RSI: 00000000ff0000f3 RDI: ffffb92ac0000fc0
RBP: ffffffff81e03c90 R08: ffff880000000fc0 R09: 0000000000000073
R10: ffff88022ede5000 R11: 0000000000000001 R12: ffffffff81e03e48
R13: 0000000001000000 R14: 0000000000000073 R15: ffff880000000018
FS:  0000000000000000(0000) GS:ffff88022ea00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffb92ac0000fc0 CR3: 0000000001e06000 CR4: 00000000000406b0
Stack:
 ffffffff81e03c90 ffffffff8107217f 0000000000000073 0000000100000000
 0000000000000001 0000000000001000 ffff880000000018 0000000000001000
 ffffffff81e03e48 0000000100000000 ffffffffff2018a8 ffffffff81e03d08
Call Trace:
 [<ffffffff8107217f>] ? populate_pmd+0x11f/0x2c0
 [<ffffffff81072823>] __cpa_process_fault+0x503/0x5d0
 [<ffffffff81073223>] __change_page_attr_set_clr+0x563/0xe00
 [<ffffffff81074e6f>] kernel_map_pages_in_pgd+0x8f/0xd0
 [<ffffffff81fa5e2e>] __map_region+0x3c/0x58
 [<ffffffff81fa6064>] efi_map_region+0x31/0xca
 [<ffffffff81fa5af3>] efi_enter_virtual_mode+0x215/0x4bd
 [<ffffffff814c6289>] ? acpi_os_signal_semaphore+0x2c/0x38
 [<ffffffff814f5c4a>] ? acpi_ut_initialize_interfaces+0x62/0x67
 [<ffffffff81f84f78>] start_kernel+0x3cf/0x478
 [<ffffffff81f84120>] ? early_idt_handler_array+0x120/0x120
 [<ffffffff81f842db>] x86_64_start_reservations+0x2f/0x31
 [<ffffffff81f84429>] x86_64_start_kernel+0x14c/0x16f
Code: 89 e5 48 89 47 04 5d c3 66 90 55 48 89 e5 0f 01 f8 5d c3 0f 1f 80 00 00 00 00 55 48 89 37 48 89 e5 5d c3 0f 1f 80 00 00 00 00 55 <48> 89 37 48 89 e5 5d c3 0f 1f 80 00 00 00 00 55 48 89 37 48 89 
RIP  [<ffffffff8106b8d1>] native_set_pmd+0x1/0x10
 RSP <ffffffff81e03c38>
CR2: ffffb92ac0000fc0
---[ end trace 2f8154f277751049 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
---[ end Kernel panic - not syncing: Attempted to kill the idle task!
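For readers decoding the oops above: the "PGD 0" line means the top-level page-table entry that would translate the faulting address (CR2) was empty. Below is a minimal sketch, assuming standard x86-64 4-level paging with 4 KiB pages and 48-bit virtual addresses, that splits CR2 into the per-level table indices. The helper name split_vaddr and the constants are illustrative only, not kernel code.

```python
# Illustrative sketch (not kernel code): split an x86-64 virtual address
# into its 4-level paging indices. Assumes 4 KiB pages, 48-bit virtual
# addresses and 9 index bits per table level.
PAGE_SHIFT = 12
PMD_SHIFT = 21
PUD_SHIFT = 30
PGDIR_SHIFT = 39
ENTRIES_PER_TABLE = 512  # 2**9 entries per page-table page


def split_vaddr(addr: int) -> dict:
    """Return the table index used at each paging level plus the page offset."""
    mask = ENTRIES_PER_TABLE - 1
    return {
        "pgd": (addr >> PGDIR_SHIFT) & mask,
        "pud": (addr >> PUD_SHIFT) & mask,
        "pmd": (addr >> PMD_SHIFT) & mask,
        "pte": (addr >> PAGE_SHIFT) & mask,
        "offset": addr & ((1 << PAGE_SHIFT) - 1),
    }


if __name__ == "__main__":
    fault = 0xffffb92ac0000fc0  # CR2 from the first oops
    for level, index in split_vaddr(fault).items():
        print(f"{level:>6}: {index} ({index:#x})")
```

For CR2 = ffffb92ac0000fc0 this yields PGD index 370 (0x172), PUD index 171 (0xab), PMD and PTE index 0, and page offset 0xfc0. Since the PGD entry at index 370 is 0, the page-table dump has nothing further to print, which is consistent with the oops showing only the PGD line.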


The reason the patch didn't work for Valdis might be that there is
another issue in next-20160722 with the same symptoms (unless you
watch the serial console). Valdis, did you apply the provided
patch on top of next?

The "other issue" is:

RDX: 0000000000000010 RSI: 00000000000306c3 RDI: ffff88003bdea2fc
RBP: ffffffffb6e03a70 R08: ffff88003bdea000 R09: 0000000000000000
R10: ffffffffb713d3a0 R11: 0000000000000008 R12: 0000000000000020
R13: ffff88003bdea2fc R14: ffffffffb6e03a80 R15: ffffffffb6e03ea0
FS:  0000000000000000(0000) GS:ffff9208aea00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88003bdea300 CR3: 00000001dce06000 CR4: 00000000000406b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 ffffffffb6054cea 0000000000000000 0000000100000000 0000000000000001
 0000000000000000 0000000000000000 ffffffffb705c2e0 000000003fffc000
 ffffffffb6e03e90 ffffffffb6055487 ffff88003bdea2fc ffffffffb6e0d580
Call Trace:
 [<ffffffffb6054cea>] ? find_microcode_patch+0x4a/0xa0
 [<ffffffffb6055487>] load_microcode.isra.1.constprop.12+0x37/0xa0
 [<ffffffffb6036700>] ? dump_trace+0x120/0x320
 [<ffffffffb644fee8>] ? put_dec+0x18/0xa0
 [<ffffffffb645025d>] ? number+0x2ed/0x300
 [<ffffffffb6ff3ba1>] ? serial_putc+0x1e/0x2d
 [<ffffffffb6ff3b83>] ? serial8250_early_out+0x62/0x62
 [<ffffffffb654f127>] ? uart_console_write+0x57/0x70
 [<ffffffffb61094ad>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffffb6152775>] ? __module_address+0x5/0xf0
 [<ffffffffb6152872>] ? __module_text_address+0x12/0x60
 [<ffffffffb61967e4>] ? is_ftrace_trampoline+0x44/0x70
 [<ffffffffb60d68a6>] ? __kernel_text_address+0x56/0x70
 [<ffffffffb60371bb>] ? print_context_stack+0x7b/0x100
 [<ffffffffb6109695>] ? __bfs+0x25/0x280
 [<ffffffffb61967e4>] ? is_ftrace_trampoline+0x44/0x70
 [<ffffffffb6152775>] ? __module_address+0x5/0xf0
 [<ffffffffb6152872>] ? __module_text_address+0x12/0x60
 [<ffffffffb61967e4>] ? is_ftrace_trampoline+0x44/0x70
 [<ffffffffb60d68a6>] ? __kernel_text_address+0x56/0x70
 [<ffffffffb60371bb>] ? print_context_stack+0x7b/0x100
 [<ffffffffb6036700>] ? dump_trace+0x120/0x320
 [<ffffffffb644fee8>] ? put_dec+0x18/0xa0
 [<ffffffffb645025d>] ? number+0x2ed/0x300
 [<ffffffffb6ff3ba1>] ? serial_putc+0x1e/0x2d
 [<ffffffffb6ff3b83>] ? serial8250_early_out+0x62/0x62
 [<ffffffffb654f127>] ? uart_console_write+0x57/0x70
 [<ffffffffb61094ad>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffffb61094ad>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffffb689de84>] ? _raw_spin_unlock_irqrestore+0x54/0x60
 [<ffffffffb611f16d>] ? console_unlock+0x33d/0x670
 [<ffffffffb611f7a1>] ? vprintk_emit+0x301/0x5e0
 [<ffffffffb605553f>] ? collect_cpu_info_early+0x4f/0x140
 [<ffffffffb61ea845>] ? __pr_info+0x5a/0x76
 [<ffffffffb60557cd>] load_ucode_intel_ap+0x5d/0x80
 [<ffffffffb6054924>] load_ucode_ap+0x94/0xa0
 [<ffffffffb60481a8>] cpu_init+0x58/0x3e0
 [<ffffffffb60709bc>] ? set_pte_vaddr+0x5c/0x90
 [<ffffffffb6fac06c>] trap_init+0x2b6/0x328
 [<ffffffffb6fa0dba>] start_kernel+0x224/0x47f
 [<ffffffffb6fa0120>] ? early_idt_handler_array+0x120/0x120
 [<ffffffffb6fa02cf>] x86_64_start_reservations+0x29/0x2b
 [<ffffffffb6fa041e>] x86_64_start_kernel+0x14d/0x170
Code: c1 74 04 85 c2 74 e4 b8 01 00 00 00 5d c3 41 89 ca b8 01 00 00 00 41 09 d2 74 f1 85 d1 74 98 5d c3 31 c0 5d c3 90 e8 eb b1 84 00 <39> 4f 04 77 03 31 c0 c3 55 48 89 e5 e8 6a ff ff ff 5d c3 0f 1f 
RIP  [<ffffffffb6055af5>] has_newer_microcode+0x5/0x20
 RSP <ffffffffb6e03a30>
CR2: ffff88003bdea300
---[ end trace b163fd3960fd46fb ]---
Kernel panic - not syncing: Attempted to kill the idle task!
---[ end Kernel panic - not syncing: Attempted to kill the idle task!
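The "Code:" line of this second oops marks the faulting instruction bytes with <...>: 39 4f 04. That is CMP r/m32, r32 with a ModRM byte selecting [rdi + disp8], i.e. cmp %ecx,0x4(%rdi) in AT&T syntax. Below is a minimal sketch that recomputes the effective address from RDI; the helper names are hypothetical, and it handles only this mod=01, no-SIB, no-REX form, so it is not a general x86 decoder.

```python
# Illustrative sketch: recompute the effective address of the faulting
# instruction "39 4f 04" (cmp %ecx,0x4(%rdi)) from the second oops.
# Only handles the ModRM mod=01 form ([register + disp8], no SIB byte,
# no REX prefix); it is not a general x86 decoder.

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
MASK64 = (1 << 64) - 1


def decode_modrm(modrm: int) -> tuple:
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    return (modrm >> 6) & 0b11, (modrm >> 3) & 0b111, modrm & 0b111


def effective_address(code: bytes, regs: dict) -> int:
    """Return base register + sign-extended disp8 for an opcode 0x39 instruction."""
    opcode, modrm, disp8 = code[0], code[1], code[2]
    if opcode != 0x39:
        raise ValueError("expected opcode 0x39 (CMP r/m32, r32)")
    mod, _reg, rm = decode_modrm(modrm)
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only [reg + disp8] without a SIB byte is handled")
    disp = disp8 - 0x100 if disp8 & 0x80 else disp8  # sign-extend disp8
    return (regs[REG64[rm]] + disp) & MASK64


if __name__ == "__main__":
    regs = {"rdi": 0xffff88003bdea2fc}  # RDI from the register dump
    ea = effective_address(bytes([0x39, 0x4f, 0x04]), regs)
    print(f"{ea:#018x}")  # expected to match CR2
```

With RDI = ffff88003bdea2fc from the register dump, the computed address is ffff88003bdea300, which matches CR2. So the fault is the 4-byte read at offset 4 from RDI inside has_newer_microcode().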

I bisected this one to 21ef9a5c3164 ("Merge branch 'x86/microcode'"). Neither
of its parents exhibits that behaviour. This merge's author is
Ingo Molnar, so I added him to the CC list.


Thanks,

Nicolai