Date:	Thu, 10 Mar 2011 10:39:15 +0800
From:	Lin Ming <ming.m.lin@...el.com>
To:	matthieu castet <castet.matthieu@...e.fr>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Andi Kleen <andi@...stfloor.org>,
	Siarhei Liakh <sliakh.lkml@...il.com>,
	Xuxian Jiang <jiang@...ncsu.edu>, Ingo Molnar <mingo@...e.hu>,
	Arjan van de Ven <arjan@...radead.org>,
	lkml <linux-kernel@...r.kernel.org>, tglx <tglx@...utronix.de>,
	"linux-security-module@...r.kernel.org" 
	<linux-security-module@...r.kernel.org>
Subject: Re: -tip tree resume fail, bisect to 5bd5a45(x86: Add NX
 protection for kernel data)

On Thu, 2011-03-10 at 07:16 +0800, matthieu castet wrote:
> Hi,
> 
> matthieu castet a écrit :
> > matthieu castet a écrit :
> >> Lin Ming a écrit :
> >>> On Tue, 2010-11-30 at 19:27 +0800, Peter Zijlstra wrote:
> >>>> On Tue, 2010-11-30 at 13:00 +0800, Lin Ming wrote:
> >>>>> echo 0 > /sys/devices/system/cpu/cpu1/online;
> >>>>> echo 1 > /sys/devices/system/cpu/cpu1/online;
> >>>>>
> >>>>> then machine just reboots...
> >>>>>
> >> I tried the same thing in qemu, and the same behavior happened
> >> (i.e. a reboot when resuming cpu1).
> >>
> >> After enabling the qemu log, I found that a triple fault occurs at
> >> the beginning of secondary_startup_64, on the instruction
> >> "addq phys_base(%rip), %rax".
> >>
> >> Why? I suppose it's because we access data marked NX before NX has
> >> been enabled in the MSR, so the CPU crashes due to the reserved-bit
> >> check.
> >>
> >> If we enable NX before reading that data, there is no more crash
> >> (patch attached).
> >>
> >> Now I am not sure this is the correct fix. I think the real problem
> >> is that having the trampoline use the kernel page tables is very
> >> dangerous: the kernel can have modified them after booting!
> >> Maybe all the paging setup should have been done in head_64.S: a
> >> first set of tables with the identity mapping, and a second one for
> >> the real kernel mappings.
> >>
> > Lin, could you try this patch on your x64 machine.
> > 
> > 
> I updated the patch to the latest tip. I tested with a 64-bit config and everything works.

I tested suspend/resume/hotplug and it works on my 64-bit notebook too.

Thanks,
Lin Ming

> 
> Any comment on it ?
> 
> 
> Thanks
> 
> 
> Matthieu
> 
> 
> From: Matthieu CASTET <castet.matthieu@...e.fr>
> Date: Thu, 10 Mar 2011 00:10:01 +0100
> Subject: [PATCH] x86 : Add NX protection for kernel data on 64 bit
> 
> This fixes CPU hotplug support by allocating a dedicated page table
> for the identity mapping used by the trampoline.
> This is needed because the kernel sets the NX flag in level3_ident_pgt
> and level3_kernel_pgt, which makes them unusable from the trampoline.
> 
> We also set the low kernel mapping to NX.
> 
> Finally, we apply NX in free_init_pages() only once we have switched to
> NX mode, in order to preserve the large page mapping.
> 
> The mapping now looks like:
> ---[ Low Kernel Mapping ]---
> 0xffff880000000000-0xffff880000200000           2M     RW             GLB NX pte
> 0xffff880000200000-0xffff880001000000          14M     RW         PSE GLB NX pmd
> 0xffff880001000000-0xffff880001200000           2M     ro         PSE GLB NX pmd
> 0xffff880001200000-0xffff8800012ae000         696K     ro             GLB NX pte
> 0xffff8800012ae000-0xffff880001400000        1352K     RW             GLB NX pte
> 0xffff880001400000-0xffff880001503000        1036K     ro             GLB NX pte
> 0xffff880001503000-0xffff880001600000        1012K     RW             GLB NX pte
> 0xffff880001600000-0xffff880007e00000         104M     RW         PSE GLB NX pmd
> 0xffff880007e00000-0xffff880007ffd000        2036K     RW             GLB NX pte
> 0xffff880007ffd000-0xffff880008000000          12K                           pte
> 0xffff880008000000-0xffff880040000000         896M                           pmd
> 0xffff880040000000-0xffff888000000000         511G                           pud
> 0xffff888000000000-0xffffc90000000000       66048G                           pgd
> ---[ vmalloc() Area ]---
> [...]
> ---[ High Kernel Mapping ]---
> 0xffffffff80000000-0xffffffff81000000          16M                           pmd
> 0xffffffff81000000-0xffffffff81400000           4M     ro         PSE GLB x  pmd
> 0xffffffff81400000-0xffffffff81600000           2M     ro         PSE GLB NX pmd
> 0xffffffff81600000-0xffffffff81800000           2M     RW         PSE GLB NX pmd
> 0xffffffff81800000-0xffffffffa0000000         488M                           pmd
> ---[ Modules ]---
> 
> Signed-off-by: Matthieu CASTET <castet.matthieu@...e.fr>
> 
> Conflicts:
> 
> 	arch/x86/kernel/head_64.S
> ---
>  arch/x86/kernel/head_64.S       |   15 +++++++++++++++
>  arch/x86/kernel/trampoline_64.S |    4 ++--
>  arch/x86/mm/init.c              |    6 ++++--
>  arch/x86/mm/init_64.c           |    6 +++++-
>  4 files changed, 26 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index e11e394..e261354 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -140,6 +140,9 @@ ident_complete:
>  	addq	%rbp, trampoline_level4_pgt + 0(%rip)
>  	addq	%rbp, trampoline_level4_pgt + (511*8)(%rip)
>  
> +	addq	%rbp, trampoline_level3_ident_pgt + 0(%rip)
> +	addq	%rbp, trampoline_level3_ident_pgt + (L3_START_KERNEL*8)(%rip)
> +
>  	/* Due to ENTRY(), sometimes the empty space gets filled with
>  	 * zeros. Better take a jmp than relying on empty space being
>  	 * filled with 0x90 (nop)
> @@ -395,6 +398,18 @@ NEXT_PAGE(level2_kernel_pgt)
>  NEXT_PAGE(level2_spare_pgt)
>  	.fill   512, 8, 0
>  
> +NEXT_PAGE(trampoline_level3_ident_pgt)
> +	.quad	trampoline_level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
> +	.fill	L3_START_KERNEL-1,8,0
> +	.quad	trampoline_level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
> +	.fill	511-L3_START_KERNEL,8,0
> +
> +
> +NEXT_PAGE(trampoline_level2_ident_pgt)
> +	/* Since I easily can, map the first 1G.
> +	 * Don't set NX because code runs from these pages.
> +	 */
> +	PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
>  #undef PMDS
>  #undef NEXT_PAGE
>  
> diff --git a/arch/x86/kernel/trampoline_64.S b/arch/x86/kernel/trampoline_64.S
> index 09ff517..8723e47 100644
> --- a/arch/x86/kernel/trampoline_64.S
> +++ b/arch/x86/kernel/trampoline_64.S
> @@ -164,8 +164,8 @@ trampoline_stack:
>  	.org 0x1000
>  trampoline_stack_end:
>  ENTRY(trampoline_level4_pgt)
> -	.quad	level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
> +	.quad	trampoline_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
>  	.fill	510,8,0
> -	.quad	level3_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
> +	.quad	trampoline_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
>  
>  ENTRY(trampoline_end)
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 286d289..98dd5fa 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -338,8 +338,10 @@ void free_init_pages(char *what, unsigned long begin, unsigned long end)
>  	 * we are going to free part of that, we need to make that
>  	 * writeable and non-executable first.
>  	 */
> -	set_memory_nx(begin, (end - begin) >> PAGE_SHIFT);
> -	set_memory_rw(begin, (end - begin) >> PAGE_SHIFT);
> +	if (kernel_set_to_readonly) {
> +		set_memory_nx(begin, (end - begin) >> PAGE_SHIFT);
> +		set_memory_rw(begin, (end - begin) >> PAGE_SHIFT);
> +	}
>  
>  	printk(KERN_INFO "Freeing %s: %luk freed\n", what, (end - begin) >> 10);
>  
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 470cc47..e9bb29d 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -832,6 +832,7 @@ void mark_rodata_ro(void)
>  	unsigned long rodata_start =
>  		((unsigned long)__start_rodata + PAGE_SIZE - 1) & PAGE_MASK;
>  	unsigned long end = (unsigned long) &__end_rodata_hpage_align;
> +	unsigned long kernel_end = (((unsigned long)&__init_end + HPAGE_SIZE) & HPAGE_MASK);
>  	unsigned long text_end = PAGE_ALIGN((unsigned long) &__stop___ex_table);
>  	unsigned long rodata_end = PAGE_ALIGN((unsigned long) &__end_rodata);
>  	unsigned long data_start = (unsigned long) &_sdata;
> @@ -842,11 +843,14 @@ void mark_rodata_ro(void)
>  
>  	kernel_set_to_readonly = 1;
>  
> +	/* make low level mapping NX */
> +	set_memory_nx(PAGE_OFFSET, (PMD_PAGE_SIZE*PTRS_PER_PMD) >> PAGE_SHIFT);
> +
>  	/*
>  	 * The rodata section (but not the kernel text!) should also be
>  	 * not-executable.
>  	 */
> -	set_memory_nx(rodata_start, (end - rodata_start) >> PAGE_SHIFT);
> +	set_memory_nx(rodata_start, (kernel_end - rodata_start) >> PAGE_SHIFT);
>  
>  	rodata_test();
>  


