lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.02.1207271215540.26163@kaball.uk.xensource.com>
Date:	Fri, 27 Jul 2012 12:18:46 +0100
From:	Stefano Stabellini <stefano.stabellini@...citrix.com>
To:	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
CC:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"xen-devel@...ts.xensource.com" <xen-devel@...ts.xensource.com>
Subject: Re: [Xen-devel] [PATCH 5/7] xen/p2m: Add logic to revector a P2M
 tree to use __va leafs.

On Thu, 26 Jul 2012, Konrad Rzeszutek Wilk wrote:
> During bootup Xen supplies us with a P2M array. It sticks
> it right after the ramdisk, as can be seen with a 128GB PV guest:
> 
> (certain parts removed for clarity):
> xc_dom_build_image: called
> xc_dom_alloc_segment:   kernel       : 0xffffffff81000000 -> 0xffffffff81e43000  (pfn 0x1000 + 0xe43 pages)
> xc_dom_pfn_to_ptr: domU mapping: pfn 0x1000+0xe43 at 0x7f097d8bf000
> xc_dom_alloc_segment:   ramdisk      : 0xffffffff81e43000 -> 0xffffffff925c7000  (pfn 0x1e43 + 0x10784 pages)
> xc_dom_pfn_to_ptr: domU mapping: pfn 0x1e43+0x10784 at 0x7f0952dd2000
> xc_dom_alloc_segment:   phys2mach    : 0xffffffff925c7000 -> 0xffffffffa25c7000  (pfn 0x125c7 + 0x10000 pages)
> xc_dom_pfn_to_ptr: domU mapping: pfn 0x125c7+0x10000 at 0x7f0942dd2000
> xc_dom_alloc_page   :   start info   : 0xffffffffa25c7000 (pfn 0x225c7)
> xc_dom_alloc_page   :   xenstore     : 0xffffffffa25c8000 (pfn 0x225c8)
> xc_dom_alloc_page   :   console      : 0xffffffffa25c9000 (pfn 0x225c9)
> nr_page_tables: 0x0000ffffffffffff/48: 0xffff000000000000 -> 0xffffffffffffffff, 1 table(s)
> nr_page_tables: 0x0000007fffffffff/39: 0xffffff8000000000 -> 0xffffffffffffffff, 1 table(s)
> nr_page_tables: 0x000000003fffffff/30: 0xffffffff80000000 -> 0xffffffffbfffffff, 1 table(s)
> nr_page_tables: 0x00000000001fffff/21: 0xffffffff80000000 -> 0xffffffffa27fffff, 276 table(s)
> xc_dom_alloc_segment:   page tables  : 0xffffffffa25ca000 -> 0xffffffffa26e1000  (pfn 0x225ca + 0x117 pages)
> xc_dom_pfn_to_ptr: domU mapping: pfn 0x225ca+0x117 at 0x7f097d7a8000
> xc_dom_alloc_page   :   boot stack   : 0xffffffffa26e1000 (pfn 0x226e1)
> xc_dom_build_image  : virt_alloc_end : 0xffffffffa26e2000
> xc_dom_build_image  : virt_pgtab_end : 0xffffffffa2800000
> 
> So the physical memory and virtual (using __START_KERNEL_map addresses)
> layout looks as so:
> 
>   phys                             __ka
> /------------\                   /-------------------\
> | 0          | empty             | 0xffffffff80000000|
> | ..         |                   | ..                |
> | 16MB       | <= kernel starts  | 0xffffffff81000000|
> | ..         |                   |                   |
> | 30MB       | <= kernel ends => | 0xffffffff81e43000|
> | ..         |  & ramdisk starts | ..                |
> | 293MB      | <= ramdisk ends=> | 0xffffffff925c7000|
> | ..         |  & P2M starts     | ..                |
> | ..         |                   | ..                |
> | 549MB      | <= P2M ends    => | 0xffffffffa25c7000|
> | ..         | start_info        | 0xffffffffa25c7000|
> | ..         | xenstore          | 0xffffffffa25c8000|
> | ..         | cosole            | 0xffffffffa25c9000|
> | 549MB      | <= page tables => | 0xffffffffa25ca000|
> | ..         |                   |                   |
> | 550MB      | <= PGT end     => | 0xffffffffa26e1000|
> | ..         | boot stack        |                   |
> \------------/                   \-------------------/
> 
> As can be seen, the ramdisk, P2M and pagetables are taking
> a bit of __ka addresses space. Which is a problem since the
> MODULES_VADDR starts at 0xffffffffa0000000 - and P2M sits
> right in there! This results during bootup with the inability to
> load modules, with this error:
> 
> ------------[ cut here ]------------
> WARNING: at /home/konrad/ssd/linux/mm/vmalloc.c:106 vmap_page_range_noflush+0x2d9/0x370()
> Call Trace:
>  [<ffffffff810719fa>] warn_slowpath_common+0x7a/0xb0
>  [<ffffffff81030279>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
>  [<ffffffff81071a45>] warn_slowpath_null+0x15/0x20
>  [<ffffffff81130b89>] vmap_page_range_noflush+0x2d9/0x370
>  [<ffffffff81130c4d>] map_vm_area+0x2d/0x50
>  [<ffffffff811326d0>] __vmalloc_node_range+0x160/0x250
>  [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
>  [<ffffffff810c6186>] ? load_module+0x66/0x19c0
>  [<ffffffff8105cadc>] module_alloc+0x5c/0x60
>  [<ffffffff810c5369>] ? module_alloc_update_bounds+0x19/0x80
>  [<ffffffff810c5369>] module_alloc_update_bounds+0x19/0x80
>  [<ffffffff810c70c3>] load_module+0xfa3/0x19c0
>  [<ffffffff812491f6>] ? security_file_permission+0x86/0x90
>  [<ffffffff810c7b3a>] sys_init_module+0x5a/0x220
>  [<ffffffff815ce339>] system_call_fastpath+0x16/0x1b
> ---[ end trace fd8f7704fdea0291 ]---
> vmalloc: allocation failure, allocated 16384 of 20480 bytes
> modprobe: page allocation failure: order:0, mode:0xd2
> 
> Since the __va and __ka are 1:1 up to MODULES_VADDR and
> cleanup_highmap rids __ka of the ramdisk mapping, what
> we want to do is similar - get rid of the P2M in the __ka
> address space. There are two ways of fixing this:
> 
>  1) All P2M lookups instead of using the __ka address would
>     use the __va address. This means we can safely erase from
>     __ka space the PMD pointers that point to the PFNs for
>     P2M array and be OK.
>  2). Allocate a new array, copy the existing P2M into it,
>     revector the P2M tree to use that, and return the old
>     P2M to the memory allocate. This has the advantage that
>     it sets the stage for using XEN_ELF_NOTE_INIT_P2M
>     feature. That feature allows us to set the exact virtual
>     address space we want for the P2M - and allows us to
>     boot as initial domain on large machines.
> 
> So we pick option 2).

1) looks like a decent option that requires less code.
Is the problem with 1) that we might want to access the P2M before we
have __va addresses ready?



> This patch only lays the groundwork in the P2M code. The patch
> that modifies the MMU is called "xen/mmu: Copy and revector the P2M tree."
> 
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
> ---
>  arch/x86/xen/p2m.c     |   70 ++++++++++++++++++++++++++++++++++++++++++++++++
>  arch/x86/xen/xen-ops.h |    1 +
>  2 files changed, 71 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> index 6a2bfa4..bbfd085 100644
> --- a/arch/x86/xen/p2m.c
> +++ b/arch/x86/xen/p2m.c
> @@ -394,7 +394,77 @@ void __init xen_build_dynamic_phys_to_machine(void)
>  	 * Xen provided pagetable). Do it later in xen_reserve_internals.
>  	 */
>  }
> +#ifdef CONFIG_X86_64
> +#include <linux/bootmem.h>
> +unsigned long __init xen_revector_p2m_tree(void)
> +{
> +	unsigned long va_start;
> +	unsigned long va_end;
> +	unsigned long pfn;
> +	unsigned long *mfn_list = NULL;
> +	unsigned long size;
> +
> +	va_start = xen_start_info->mfn_list;
> +	/*We copy in increments of P2M_PER_PAGE * sizeof(unsigned long),
> +	 * so make sure it is rounded up to that */
> +	size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
> +	va_end = va_start + size;
> +
> +	/* If we were revectored already, don't do it again. */
> +	if (va_start <= __START_KERNEL_map && va_start >= __PAGE_OFFSET)
> +		return 0;
> +
> +	mfn_list = alloc_bootmem_align(size, PAGE_SIZE);
> +	if (!mfn_list) {
> +		pr_warn("Could not allocate space for a new P2M tree!\n");
> +		return xen_start_info->mfn_list;
> +	}
> +	/* Fill it out with INVALID_P2M_ENTRY value */
> +	memset(mfn_list, 0xFF, size);
> +
> +	for (pfn = 0; pfn < ALIGN(MAX_DOMAIN_PAGES, P2M_PER_PAGE); pfn += P2M_PER_PAGE) {
> +		unsigned topidx = p2m_top_index(pfn);
> +		unsigned mididx;
> +		unsigned long *mid_p;
> +
> +		if (!p2m_top[topidx])
> +			continue;
> +
> +		if (p2m_top[topidx] == p2m_mid_missing)
> +			continue;
> +
> +		mididx = p2m_mid_index(pfn);
> +		mid_p = p2m_top[topidx][mididx];
> +		if (!mid_p)
> +			continue;
> +		if ((mid_p == p2m_missing) || (mid_p == p2m_identity))
> +			continue;
> +
> +		if ((unsigned long)mid_p == INVALID_P2M_ENTRY)
> +			continue;
> +
> +		/* The old va. Rebase it on mfn_list */
> +		if (mid_p >= (unsigned long *)va_start && mid_p <= (unsigned long *)va_end) {
> +			unsigned long *new;
> +
> +			new = &mfn_list[pfn];
> +
> +			copy_page(new, mid_p);
> +			p2m_top[topidx][mididx] = &mfn_list[pfn];
> +			p2m_top_mfn_p[topidx][mididx] = virt_to_mfn(&mfn_list[pfn]);
>  
> +		}
> +		/* This should be the leafs allocated for identity from _brk. */
> +	}
> +	return (unsigned long)mfn_list;
> +
> +}
> +#else
> +unsigned long __init xen_revector_p2m_tree(void)
> +{
> +	return 0;
> +}
> +#endif
>  unsigned long get_phys_to_machine(unsigned long pfn)
>  {
>  	unsigned topidx, mididx, idx;
> diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
> index 2230f57..bb5a810 100644
> --- a/arch/x86/xen/xen-ops.h
> +++ b/arch/x86/xen/xen-ops.h
> @@ -45,6 +45,7 @@ void xen_hvm_init_shared_info(void);
>  void xen_unplug_emulated_devices(void);
>  
>  void __init xen_build_dynamic_phys_to_machine(void);
> +unsigned long __init xen_revector_p2m_tree(void);
>  
>  void xen_init_irq_ops(void);
>  void xen_setup_timer(int cpu);
> -- 
> 1.7.7.6
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@...ts.xen.org
> http://lists.xen.org/xen-devel
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ