Message-ID: <540B2B72.7090600@canonical.com>
Date:	Sat, 06 Sep 2014 17:42:42 +0200
From:	Stefan Bader <stefan.bader@...onical.com>
To:	David Vrabel <david.vrabel@...rix.com>,
	"xen-devel@...ts.xensource.com" <xen-devel@...ts.xensource.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
CC:	Kees Cook <keescook@...omium.org>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	Boris Ostrovsky <boris.ostrovsky@...cle.com>
Subject: Re: [Xen-devel] [PATCH] x86/xen: Fix 64bit kernel pagetable setup
 of PV guests

On 02.09.2014 13:01, David Vrabel wrote:
> On 01/09/14 18:34, David Vrabel wrote:
>> On 29/08/14 16:17, Stefan Bader wrote:
>>>
>>> This change might not be the fully correct approach, as it basically
>>> removes the pre-set page table entry for the fixmap that is set at
>>> compile time (level2_fixmap_pgt[506]->level1_fixmap_pgt). For one,
>>> the level1 page table is not yet declared in C headers (that could
>>> be fixed). But with the current bug that entry was removed as well:
>>> since the Xen mappings for level2_kernel_pgt only covered kernel +
>>> initrd and some Xen data, they never reached that far. And still,
>>> something does create entries at level2_fixmap_pgt[506..507]. So it
>>> should be ok. At least I was able to successfully boot a kernel with
>>> a 1G kernel image size without any vmalloc warnings.
>> [...]
>>> --- a/arch/x86/xen/mmu.c
>>> +++ b/arch/x86/xen/mmu.c
>>> @@ -1902,8 +1902,22 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
>>>  		/* L3_i[0] -> level2_ident_pgt */
>>>  		convert_pfn_mfn(level3_ident_pgt);
>>>  		/* L3_k[510] -> level2_kernel_pgt
>>> -		 * L3_i[511] -> level2_fixmap_pgt */
>>> +		 * L3_k[511] -> level2_fixmap_pgt */
>>>  		convert_pfn_mfn(level3_kernel_pgt);
>>> +
>>> +		/* level2_fixmap_pgt contains a single entry for the
>>> +		 * fixmap area at offset 506. The correct way would
>>> +		 * be to convert level2_fixmap_pgt to mfn and set the
>>> +		 * level1_fixmap_pgt (which is completely empty) to RO,
>>> +		 * too. But currently this page table is not declared,
>>> +		 * so it would be a bit of voodoo to get its address.
>>> +		 * And also the fixmap entry was never set due to using
>>> +		 * the wrong l2 when getting Xen's tables. So let's
>>> +		 * just nuke it.
>>> +		 * This orphans level1_fixmap_pgt, but that was basically
>>> +		 * done before the change as well.
>>> +		 */
>>> +		memset(level2_fixmap_pgt, 0, 512*sizeof(long));
>>
>> level2_fixmap_pgt etc. are defined for the benefit of Xen only so I
>> think you should add an extern for level1_fixmap_pgt and fix this up
>> properly.
>>
>> It might not matter now, but it might in the future...
> 
> I found some time to look into this and test it, including with
> KASLR disabled but reproducing the vmalloc failure with a large
> (800 MiB) module.
> 
> I've clarified the change description, can you review my edits?
> 
> Thanks for investigating and fixing this!

Sorry for not having replied earlier (I am on vacation). Without testing,
it looks about like what I had after a few iterations (minus the export of
l1). So if you had a kernel booting with that, I am happy, too. :)

-Stefan
> 
> David
> 
> 8<------------------------------
> x86/xen: don't copy junk into kernel page tables
> 
> When RANDOMIZE_BASE (KASLR) is enabled, or the sum of all loaded
> modules exceeds 512 MiB, then loading modules fails with a warning
> (and hence a vmalloc allocation failure) because the PTEs for the
> newly-allocated vmalloc address space are not zero.
> 
>   WARNING: CPU: 0 PID: 494 at linux/mm/vmalloc.c:128
>            vmap_page_range_noflush+0x2a1/0x360()
> 
> This is caused by xen_setup_kernel_pagetables() copying junk into the
> page tables (to level2_fixmap_pgt), overwriting many non-present
> entries.
> 
> Without KASLR, the normal kernel image size only covers the first half
> of level2_kernel_pgt and module space starts after that.
> 
> L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[  0..255]->kernel
>                                                   [256..511]->module
>                           [511]->level2_fixmap_pgt[  0..505]->module
> 
> This allows 512 MiB of module vmalloc space to be used before
> having to use the corrupted level2_fixmap_pgt entries.
> 
> With KASLR enabled, the kernel image uses the full PUD range of 1G and
> module space starts in the level2_fixmap_pgt. So basically:
> 
> L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[0..511]->kernel
>                           [511]->level2_fixmap_pgt[0..505]->module
> 
> And now no module vmalloc space can be used without using the corrupt
> level2_fixmap_pgt entries.
> 
> Fix this by properly converting the level2_fixmap_pgt entries to MFNs,
> and setting level1_fixmap_pgt as read-only.
> 
> Signed-off-by: Stefan Bader <stefan.bader@...onical.com>
> Cc: stable@...r.kernel.org
> Signed-off-by: David Vrabel <david.vrabel@...rix.com>
> ---
>  arch/x86/include/asm/pgtable_64.h |    1 +
>  arch/x86/xen/mmu.c                |   27 ++++++++++++---------------
>  2 files changed, 13 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
> index 5be9063..3874693 100644
> --- a/arch/x86/include/asm/pgtable_64.h
> +++ b/arch/x86/include/asm/pgtable_64.h
> @@ -19,6 +19,7 @@ extern pud_t level3_ident_pgt[512];
>  extern pmd_t level2_kernel_pgt[512];
>  extern pmd_t level2_fixmap_pgt[512];
>  extern pmd_t level2_ident_pgt[512];
> +extern pte_t level1_fixmap_pgt[512];
>  extern pgd_t init_level4_pgt[];
>  
>  #define swapper_pg_dir init_level4_pgt
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index e8a1201..16fb009 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1866,12 +1866,11 @@ static void __init check_pt_base(unsigned long *pt_base, unsigned long *pt_end,
>   *
>   * We can construct this by grafting the Xen provided pagetable into
>   * head_64.S's preconstructed pagetables.  We copy the Xen L2's into
> - * level2_ident_pgt, level2_kernel_pgt and level2_fixmap_pgt.  This
> - * means that only the kernel has a physical mapping to start with -
> - * but that's enough to get __va working.  We need to fill in the rest
> - * of the physical mapping once some sort of allocator has been set
> - * up.
> - * NOTE: for PVH, the page tables are native.
> + * level2_ident_pgt, and level2_kernel_pgt.  This means that only the
> + * kernel has a physical mapping to start with - but that's enough to
> + * get __va working.  We need to fill in the rest of the physical
> + * mapping once some sort of allocator has been set up.  NOTE: for
> + * PVH, the page tables are native.
>   */
>  void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
>  {
> @@ -1902,8 +1901,11 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
>  		/* L3_i[0] -> level2_ident_pgt */
>  		convert_pfn_mfn(level3_ident_pgt);
>  		/* L3_k[510] -> level2_kernel_pgt
> -		 * L3_i[511] -> level2_fixmap_pgt */
> +		 * L3_k[511] -> level2_fixmap_pgt */
>  		convert_pfn_mfn(level3_kernel_pgt);
> +
> +		/* L3_k[511][506] -> level1_fixmap_pgt */
> +		convert_pfn_mfn(level2_fixmap_pgt);
>  	}
>  	/* We get [511][511] and have Xen's version of level2_kernel_pgt */
>  	l3 = m2v(pgd[pgd_index(__START_KERNEL_map)].pgd);
> @@ -1913,21 +1915,15 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
>  	addr[1] = (unsigned long)l3;
>  	addr[2] = (unsigned long)l2;
>  	/* Graft it onto L4[272][0]. Note that we creating an aliasing problem:
> -	 * Both L4[272][0] and L4[511][511] have entries that point to the same
> +	 * Both L4[272][0] and L4[511][510] have entries that point to the same
>  	 * L2 (PMD) tables. Meaning that if you modify it in __va space
>  	 * it will be also modified in the __ka space! (But if you just
>  	 * modify the PMD table to point to other PTE's or none, then you
>  	 * are OK - which is what cleanup_highmap does) */
>  	copy_page(level2_ident_pgt, l2);
> -	/* Graft it onto L4[511][511] */
> +	/* Graft it onto L4[511][510] */
>  	copy_page(level2_kernel_pgt, l2);
>  
> -	/* Get [511][510] and graft that in level2_fixmap_pgt */
> -	l3 = m2v(pgd[pgd_index(__START_KERNEL_map + PMD_SIZE)].pgd);
> -	l2 = m2v(l3[pud_index(__START_KERNEL_map + PMD_SIZE)].pud);
> -	copy_page(level2_fixmap_pgt, l2);
> -	/* Note that we don't do anything with level1_fixmap_pgt which
> -	 * we don't need. */
>  	if (!xen_feature(XENFEAT_auto_translated_physmap)) {
>  		/* Make pagetable pieces RO */
>  		set_page_prot(init_level4_pgt, PAGE_KERNEL_RO);
> @@ -1937,6 +1933,7 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
>  		set_page_prot(level2_ident_pgt, PAGE_KERNEL_RO);
>  		set_page_prot(level2_kernel_pgt, PAGE_KERNEL_RO);
>  		set_page_prot(level2_fixmap_pgt, PAGE_KERNEL_RO);
> +		set_page_prot(level1_fixmap_pgt, PAGE_KERNEL_RO);
>  
>  		/* Pin down new L4 */
>  		pin_pagetable_pfn(MMUEXT_PIN_L4_TABLE,
> 


