lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:	Wed, 16 Jan 2013 14:30:45 +0000
From:	Matt Fleming <matt.fleming@...el.com>
To:	Nathan Zimmer <nzimmer@....com>
Cc:	x86@...nel.org, linux-efi@...r.kernel.org,
	linux-kernel@...r.kernel.org, stable@...r.kernel.org,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>, Robin Holt <holt@....com>,
	Matthew Garrett <mjg59@...f.ucam.org>,
	Jan Beulich <JBeulich@...e.com>
Subject: Re: [PATCH v2] efi, x86: Pass a proper identity mapping in
 efi_call_phys_prelog

(Sorry for the delay in replying to this)

On Tue, 2013-01-08 at 09:02 -0600, Nathan Zimmer wrote:
> Update efi_call_phys_prelog to install an identity mapping of all available
> memory.  This corrects a bug on very large systems with more then 512 GB in
> which bios would not be able to access addresses above not in the mapping.
> 
> The result is a crash that looks much like this.
> 
> BUG: unable to handle kernel paging request at 000000effd870020
> IP: [<0000000078bce331>] 0x78bce330
> PGD 0
> Oops: 0000 [#1] SMP
> Modules linked in:
> CPU 0
> Pid: 0, comm: swapper/0 Tainted: G        W    3.8.0-rc1-next-20121224-medusa_ntz+ #2 Intel Corp. Stoutland Platform
> RIP: 0010:[<0000000078bce331>]  [<0000000078bce331>] 0x78bce330
> RSP: 0000:ffffffff81601d28  EFLAGS: 00010006
> RAX: 0000000078b80e18 RBX: 0000000000000004 RCX: 0000000000000004
> RDX: 0000000078bcf958 RSI: 0000000000002400 RDI: 8000000000000000
> RBP: 0000000078bcf760 R08: 000000effd870000 R09: 0000000000000000
> R10: 0000000000000000 R11: 00000000000000c3 R12: 0000000000000030
> R13: 000000effd870000 R14: 0000000000000000 R15: ffff88effd870000
> FS:  0000000000000000(0000) GS:ffff88effe400000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000effd870020 CR3: 000000000160c000 CR4: 00000000000006b0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff81614400)
> Stack:
>  0000000078b80d18 0000000000000004 0000000078bced7b ffff880078b81fff
>  0000000000000000 0000000000000082 0000000078bce3a8 0000000000002400
>  0000000060000202 0000000078b80da0 0000000078bce45d ffffffff8107cb5a
> Call Trace:
>  [<ffffffff8107cb5a>] ? on_each_cpu+0x77/0x83
>  [<ffffffff8102f4eb>] ? change_page_attr_set_clr+0x32f/0x3ed
>  [<ffffffff81035946>] ? efi_call4+0x46/0x80
>  [<ffffffff816c5abb>] ? efi_enter_virtual_mode+0x1f5/0x305
>  [<ffffffff816aeb24>] ? start_kernel+0x34a/0x3d2
>  [<ffffffff816ae5ed>] ? repair_env_string+0x60/0x60
>  [<ffffffff816ae2be>] ? x86_64_start_reservations+0xba/0xc1
>  [<ffffffff816ae120>] ? early_idt_handlers+0x120/0x120
>  [<ffffffff816ae419>] ? x86_64_start_kernel+0x154/0x163
> Code:  Bad RIP value.
> RIP  [<0000000078bce331>] 0x78bce330
>  RSP <ffffffff81601d28>
> CR2: 000000effd870020
> ---[ end trace ead828934fef5eab ]---
> 
> v2: I noticed a typo on the one sent friday, also ccing the stable tree.
> 
> 
> Cc: stable@...r.kernel.org
> Cc: Matt Fleming <matt.fleming@...el.com>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: "H. Peter Anvin" <hpa@...or.com>
> Signed-off-by: Nathan Zimmer <nzimmer@....com>
> Signed-off-by: Robin Holt <holt@....com>
> ---
>  arch/x86/platform/efi/efi_64.c |   22 +++++++++++++++++-----
>  1 files changed, 17 insertions(+), 5 deletions(-)

The last time that the topic of introducing a proper identity pagetable
came up was here,

	https://lkml.org/lkml/2012/10/5/253

mainly in the context of some firmware requiring physical mappings for
I/O devices to persist even after we've called SetVirtualAddressMap().
It turns out that there's a bunch of places in the kernel that want this
identity mapping - the one you've pointed out being yet another.

> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> index 95fd505..2b20038 100644
> --- a/arch/x86/platform/efi/efi_64.c
> +++ b/arch/x86/platform/efi/efi_64.c
> @@ -38,7 +38,7 @@
>  #include <asm/cacheflush.h>
>  #include <asm/fixmap.h>
>  
> -static pgd_t save_pgd __initdata;
> +static pgd_t *save_pgd __initdata;
>  static unsigned long efi_flags __initdata;
>  
>  static void __init early_code_mapping_set_exec(int executable)
> @@ -61,12 +61,20 @@ static void __init early_code_mapping_set_exec(int executable)
>  void __init efi_call_phys_prelog(void)
>  {
>  	unsigned long vaddress;
> +	int pgd;
> +	int n_pgds;
>  
>  	early_code_mapping_set_exec(1);
>  	local_irq_save(efi_flags);
> -	vaddress = (unsigned long)__va(0x0UL);
> -	save_pgd = *pgd_offset_k(0x0UL);
> -	set_pgd(pgd_offset_k(0x0UL), *pgd_offset_k(vaddress));
> +
> +	n_pgds = DIV_ROUND_UP((max_pfn << PAGE_SHIFT), PGDIR_SIZE);
> +	save_pgd = kmalloc(n_pgds * sizeof(pgd_t), GFP_KERNEL);

There's potential for this to be called under a spinlock, e.g. from
phys_efi_get_time(). Maybe this should be GFP_ATOMIC?

> +	for (pgd = 0; pgd < n_pgds; pgd++) {
> +		save_pgd[pgd] = *pgd_offset_k(pgd * PGDIR_SIZE);
> +		vaddress = (unsigned long)__va(pgd * PGDIR_SIZE);
> +		set_pgd(pgd_offset_k(pgd * PGDIR_SIZE), *pgd_offset_k(vaddress));
> +	}
>  	__flush_tlb_all();
>  }
>  
> @@ -75,7 +83,11 @@ void __init efi_call_phys_epilog(void)
>  	/*
>  	 * After the lock is released, the original page table is restored.
>  	 */
> -	set_pgd(pgd_offset_k(0x0UL), save_pgd);
> +	int pgd;
> +	int n_pgds = DIV_ROUND_UP((max_pfn << PAGE_SHIFT) , PGDIR_SIZE);
> +	for (pgd = 0; pgd < n_pgds; pgd++)
> +		set_pgd(pgd_offset_k(pgd * PGDIR_SIZE), save_pgd[pgd]);
> +	kfree(save_pgd);
>  	__flush_tlb_all();
>  	local_irq_restore(efi_flags);
>  	early_code_mapping_set_exec(0);

I initially wrote a patch that was a little like this but Peter
suggested making use of trampoline_pgd on x86-64 and switching between
pagetables with CR3. While that's certainly the approach we should take
going forward, this patch does look pretty trivial (making it suitable
for stable) and it fixes a crash.

So unless anyone's got an objection I'm happy to queue this up.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ