Message-ID: <20090220165955.GA8347@elte.hu>
Date: Fri, 20 Feb 2009 17:59:55 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Steven Rostedt <rostedt@...dmis.org>,
Huang Ying <ying.huang@...el.com>,
Thomas Gleixner <tglx@...utronix.de>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Arjan van de Ven <arjan@...radead.org>,
Rusty Russell <rusty@...tcorp.com.au>,
Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>,
"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH] x86: use the right protections for split-up pagetables
* Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> On Fri, 20 Feb 2009, Ingo Molnar wrote:
> >
> > Agreed, split_large_page() was just plain confused here - there
> > was no hidden reason for this logic. It makes no sense to bring
> > any pte level protection information to the PMD level because a
> > pmd entry covers a set of 512 ptes so there's no singular
> > protection attribute that can be carried to it.
>
> Btw, I think split_large_page() is confused in another way
> too, although I'm not entirely sure that it matters. I suspect
> that it doesn't, if I read things correctly.
>
> The confusion? When it moves the 'ref_prot' bits from the
> upper level, it doesn't do the right thing for the PAT bit.
> That bit is special, and moves around depending on level. In
> the upper levels, it's bit#12, and in the final 4k pte level
> it's bit#7.
>
> So _if_ the PAT bit ever matters, it looks like
> split_large_page() does the wrong thing.
>
> Now, it looks like we avoid the PAT bit on purpose, and we
> only ever encode four PAT values (ie we use only the PCD/PWT
> bits, and leave the PAT bit clear - we don't need any more
> cases), _but_ we actually do end up looking at the PAT bit
> anyway in cache_attr(). So it looks like at least some of the
> code is _trying_ to handle the PAT bit, but I can pretty much
> guarantee that at least split_large_page() is broken if it is
> ever set.
Yeah. This is our current PAT encodings table:
	/*
	 * PTE encoding used in Linux:
	 *      PAT
	 *      |PCD
	 *      ||PWT
	 *      |||
	 *      000 WB          _PAGE_CACHE_WB
	 *      001 WC          _PAGE_CACHE_WC
	 *      010 UC-         _PAGE_CACHE_UC_MINUS
	 *      011 UC          _PAGE_CACHE_UC
	 * PAT bit unused
	 */
	pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
	      PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, UC);
it's intentionally left compressed and the extended PAT bit is
never set; we only need 4 caching types.
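
( As an illustration - a minimal user-space sketch with made-up names,
  not the kernel's code: the pte's PAT/PCD/PWT bits form a 3-bit index
  into the PAT MSR, and since slots 4-7 above mirror slots 0-3, leaving
  the extended PAT bit clear loses nothing: )

	#include <stdio.h>

	enum cache_type { WB, WC, UC_MINUS, UC };

	static const enum cache_type linux_pat[8] = {
		WB, WC, UC_MINUS, UC,	/* PAT=0: the four types we use   */
		WB, WC, UC_MINUS, UC,	/* PAT=1: identical, never needed */
	};

	static enum cache_type lookup(int pat, int pcd, int pwt)
	{
		return linux_pat[(pat << 2) | (pcd << 1) | pwt];
	}

	int main(void)
	{
		/* PCD=1, PWT=0 selects UC- whether PAT is 0 or 1: */
		printf("%d %d\n", lookup(0, 1, 0), lookup(1, 1, 0));
		return 0;
	}
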
( Speculation: in theory it would be possible for some CPU's
TLB-fill fastpath to have some small penalty on having a
non-zero extended-PAT bit. So eliminating weird bits and
compressing pte bit usage is always a good idea. )
Nevertheless you are right that there's a disconnect here and
that, were it ever set, we'd unconditionally lift the 2MB/1GB PAT
bit [bit 12] over into the 4K level.
If we ever set the PAT bit on a large page then the
split_large_page() behavior would become rather nasty: we'd
corrupt pte bit 12, which at the 4K level is part of the page
frame number, i.e. we'd lose linear mappings, map pairs of pages
onto the same (odd) page frame while leaving the even frames
unmapped, and we'd start corrupting memory and crash in
interesting ways.
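
( To make that failure mode concrete - a tiny user-space sketch, plain
  C, nothing kernel-specific: in a 4K pte the page frame number starts
  at bit 12, so OR-ing in the stray large-page PAT bit turns frame N
  into frame N|1: )

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint64_t pfn = 0x1234;			/* intended (even) frame     */
		uint64_t pte = (pfn << 12) | 0x63;	/* PRESENT|RW|ACCESSED|DIRTY */
		uint64_t bad = pte | (1ULL << 12);	/* stray large-page PAT bit  */

		printf("intended pfn: %#llx\n", (unsigned long long)(pte >> 12));
		printf("mapped   pfn: %#llx\n", (unsigned long long)(bad >> 12));

		return 0;
	}
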
There are two solutions:
- make the ref_prot opt-in and explicitly enumerate all the
bits we handle correctly today
- add a debug warning for the bits we know we don't handle
I went for the second as the first one would include basically
all the meaningful bits we have:
[_PAGE_BIT_PRESENT]
_PAGE_BIT_RW
_PAGE_BIT_USER
_PAGE_BIT_PWT
_PAGE_BIT_PCD
[_PAGE_BIT_ACCESSED]
[_PAGE_BIT_DIRTY]
_PAGE_BIT_GLOBAL
_PAGE_BIT_NX
( the ones in brackets are not important because we set/clear
them anyway, but they don't hurt either. )
And if we did not include PAT and it got used in the future, the
function could break too - just in a different way (by not
carrying over the PAT bit).
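
( For reference, roughly what the opt-in variant - the first solution
  above - would look like; an illustrative sketch only, the helper name
  is made up: )

	/*
	 * Illustrative only: whitelist the bits split_large_page() is
	 * known to handle today and drop everything else - including
	 * _PAGE_PAT_LARGE.
	 */
	static pgprot_t ref_prot_known_bits(pte_t pte)
	{
		pgprotval_t mask = _PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
				   _PAGE_PWT | _PAGE_PCD | _PAGE_ACCESSED |
				   _PAGE_DIRTY | _PAGE_GLOBAL | _PAGE_NX;

		return __pgprot(pgprot_val(pte_pgprot(pte_clrhuge(pte))) & mask);
	}
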
So i think it's safest to put in a sharp debug check for the
known-unhandled bit. I've queued up the fix below in tip:x86/mm -
do you think this approach is the best?
Ingo
-------------------->
From 7a5714e0186030676d79a7b4b9830c8e45c3b0a1 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@...e.hu>
Date: Fri, 20 Feb 2009 17:44:21 +0100
Subject: [PATCH] x86, pat: add large-PAT check to split_large_page()
Impact: future-proof the split_large_page() function
Linus noticed that split_large_page() is not safe wrt. the
PAT bit: it is bit 12 on the 1GB and 2MB page table level
(_PAGE_BIT_PAT_LARGE), and it is bit 7 on the 4K page
table level (_PAGE_BIT_PAT).
Currently it is not a problem because we never set
_PAGE_BIT_PAT_LARGE on any of the large-page mappings - but should
this happen in the future, split_large_page() would silently lift
bit 12 into the low-level 4K pte and would start corrupting the
physical page frame number. Not fun.
So add a debug warning to make sure that if something ever sets
the PAT bit then this function gets updated too.
Cc: Linus Torvalds <torvalds@...ux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@...e.hu>
---
arch/x86/mm/pageattr.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 7be47d1..8253bc9 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -482,6 +482,13 @@ static int split_large_page(pte_t *kpte, unsigned long address)
 	pbase = (pte_t *)page_address(base);
 	paravirt_alloc_pte(&init_mm, page_to_pfn(base));
 	ref_prot = pte_pgprot(pte_clrhuge(*kpte));
+	/*
+	 * If we ever want to utilize the PAT bit, we need to
+	 * update this function to make sure it's converted from
+	 * bit 12 to bit 7 when we cross from the 2MB level to
+	 * the 4K level:
+	 */
+	WARN_ON_ONCE(pgprot_val(ref_prot) & _PAGE_PAT_LARGE);
 
 #ifdef CONFIG_X86_64
 	if (level == PG_LEVEL_1G) {
--