lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 24 Aug 2012 18:27:41 +0100
From:	Stefano Stabellini <stefano.stabellini@...citrix.com>
To:	Thomas Gleixner <tglx@...utronix.de>
CC:	Stefano Stabellini <Stefano.Stabellini@...citrix.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	Borislav Petkov <bp@...en8.de>,
	Attilio Rao <attilio.rao@...rix.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"mingo@...hat.com" <mingo@...hat.com>,
	"hpa@...or.com" <hpa@...or.com>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH v4 1/2] XEN/X86: Improve semantic support for
 x86_init.mapping.pagetable_reserve

On Fri, 24 Aug 2012, Thomas Gleixner wrote:
> On Fri, 24 Aug 2012, Stefano Stabellini wrote:
> > On Fri, 24 Aug 2012, Thomas Gleixner wrote:
> > > I looked at the whole pgt_buf_* mess and it's amazingly stupid. We
> > > could avoid all that dance and make all of that pgt_buf_* stuff static
> > > and provide proper accessor functions and hand start, end, top to the
> > > reserve function instead of fiddling with global variables all over
> > > the place. That'd be a real cleanup and progress.
> > > 
> > > But we can't do that easily. And why? Because XEN is making magic
> > > decisions based on those globals in mask_rw_pte().
> > > 
> > > 	/*
> > > 	 * If the new pfn is within the range of the newly allocated
> > > 	 * kernel pagetable, and it isn't being mapped into an
> > > 	 * early_ioremap fixmap slot as a freshly allocated page, make sure
> > > 	 * it is RO.
> > > 	 */
> > > 	if (((!is_early_ioremap_ptep(ptep) &&
> > > 			pfn >= pgt_buf_start && pfn < pgt_buf_top)) ||
> > > 			(is_early_ioremap_ptep(ptep) && pfn != (pgt_buf_end - 1)))
> > > 
> > > This comment along with the implementation is really a master piece of
> > > obfuscation. Let's see what this is doing. RO is enforced when:
> > > 
> > > This is not an early ioreamp AND
> > > 
> > >       pfn >= pgt_buf_start && pfn < pgt_buf_top
> > > 
> > > So why is this checking pgt_buf_top? The early stuff is installed
> > > within pgt_buf_start and pgt_buf_end. Anything which is >=
> > > pgt_buf_end at this point is completely wrong.
> > 
> > Unfortunately pgt_buf_end only marks the current end of the pagetable
> > pages (pgt_buf_end keeps increasing during
> > kernel_physical_mapping_init).  However at some point
> > kernel_physical_mapping_init is going to start mapping the pagetable
> > pages themselves, when that happens some of them are not pagetable pages
> > yet (pgt_buf_end <= page < pgt_buf_top) but they are going to be in the
> > near future.
> 
> And how exactly are they allocated between from pgt_buf w/o increasing
> pgt_buf_end ?

They do increase pgt_buf_end when they are allocated, but the entire
[pgt_buf_start - pgt_buf_top) range might already be mapped at that point
as normal memory that falls in the range of addresses passed to
init_memory_mapping as argument.

This is an extract from the commit message of
279b706bf800b5967037f492dbe4fc5081ad5d0f:

--- 

As a consequence of the commit:

commit 4b239f458c229de044d6905c2b0f9fe16ed9e01e
Author: Yinghai Lu <yinghai@...nel.org>
Date:   Fri Dec 17 16:58:28 2010 -0800

    x86-64, mm: Put early page table high

at some point init_memory_mapping is going to reach the pagetable pages
area and map those pages too (mapping them as normal memory that falls
in the range of addresses passed to init_memory_mapping as argument).
Some of those pages are already pagetable pages (they are in the range
pgt_buf_start-pgt_buf_end) therefore they are going to be mapped RO and
everything is fine.
Some of these pages are not pagetable pages yet (they fall in the range
pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
are going to be mapped RW.  When these pages become pagetable pages and
are hooked into the pagetable, xen will find that the guest has already
a RW mapping of them somewhere and fail the operation.
The reason Xen requires pagetables to be RO is that the hypervisor needs
to verify that the pagetables are valid before using them. The validation
operations are called "pinning" (more details in arch/x86/xen/mmu.c).

---

So let's suppose that we change the check in mask_rw_pte to be:

pfn >= pgt_buf_start && pfn < pgt_buf_end

as it was originally. This is what could happen:

1) pgt_buf_start - pgt_buf_end gets mapped RO;
2) pgt_buf_end - pgt_buf_top gets mapped RW;
3) a new pagetable page is allocated, pgt_buf_end is increased;
4) this new pagetable page is hooked into the pagetable;
5) since a mapping of this page already exists (it was done in
   point 2), and this mapping is RW, Linux crashes.


Thanks for taking the time to look into this issue. I know it is
difficult and not very pleasant.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ