[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080507070217.GD32195@elte.hu>
Date: Wed, 7 May 2008 09:02:17 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Venki Pallipadi <venkatesh.pallipadi@...el.com>
Cc: Frans Pop <elendil@...net.nl>,
Jesse Barnes <jesse.barnes@...el.com>,
linux-kernel@...r.kernel.org,
"Packard, Keith" <keith.packard@...el.com>,
Yinghai Lu <yhlu.kernel@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Hugh Dickins <hugh@...itas.com>,
"H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [git head] X86_PAT & mprotect
* Venki Pallipadi <venkatesh.pallipadi@...el.com> wrote:
> There is a hole in mprotect, which lets the user to change the page
> cache type bits by-passing the kernel reserve_memtype and free_memtype
> wrappers. Fix the hole by not letting mprotect change the PAT bits.
>
> Some versions of X used the mprotect hole to change caching type from
> UC to WB, so that it can then use mtrr to program WC for that region
> [1]. Change the mmap of pci space through /sys or /proc interfaces
> from UC to UC_MINUS. With this change, X will not need to use mprotect
> hole to get WC type.
>
> [1] lkml.org/lkml/2008/4/16/369
>
> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@...el.com>
> Signed-off-by: Suresh Siddha <suresh.b.siddha@...el.com>
>
> ---
> arch/x86/pci/i386.c | 4 +---
> include/asm-x86/pgtable.h | 5 ++++-
> include/linux/mm.h | 1 +
> mm/mmap.c | 13 +++++++++++++
> mm/mprotect.c | 4 +++-
> 5 files changed, 22 insertions(+), 5 deletions(-)
hm, that's one dangerous looking patch. (Cc:-ed more MM folks. I've
attached the patch below for reference.)
the purpose of the fix itself seems to make some sense - we dont want
mprotect() change the PAT bits in the pte from what they got populated
with at fault or mmap time.
the pte_modify() change looks correct at first sight.
The _PAGE_PROT_PRESERVE_BITS solution looks a bit ugly (although we do
have a couple of other similar #ifndefs in the MM already).
at minimum we should add vm_get_page_prot_preserve() as an inline
function to mm.h if _PAGE_PROT_PRESERVE_BITS is not defined, and make it
call vm_get_page_prot(). Also, vm_get_page_prot() in mm/mmap.c should
probably be marked inline so that we'll have only a single function call
[vm_get_page_prot() is trivial].
but i'm wondering why similar issues never came up on other
architectures - i thought it would be rather common to have immutable
pte details. So maybe i'm missing something here ...
Ingo
------------------------------------->
Subject: generic, x86, PAT: fix mprotect
From: Venki Pallipadi <venkatesh.pallipadi@...el.com>
Date: Tue, 6 May 2008 15:42:40 -0700
There is a hole in mprotect, which lets the user to change the page
cache type bits by-passing the kernel reserve_memtype and free_memtype
wrappers. Fix the hole by not letting mprotect change the PAT bits.
Some versions of X used the mprotect hole to change caching type from UC to WB,
so that it can then use mtrr to program WC for that region [1]. Change the
mmap of pci space through /sys or /proc interfaces from UC to UC_MINUS.
With this change, X will not need to use mprotect hole to get WC type.
[1] lkml.org/lkml/2008/4/16/369
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@...el.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@...el.com>
Signed-off-by: Ingo Molnar <mingo@...e.hu>
---
arch/x86/pci/i386.c | 4 +---
include/asm-x86/pgtable.h | 5 ++++-
include/linux/mm.h | 1 +
mm/mmap.c | 13 +++++++++++++
mm/mprotect.c | 4 +++-
5 files changed, 22 insertions(+), 5 deletions(-)
Index: linux-x86.q/arch/x86/pci/i386.c
===================================================================
--- linux-x86.q.orig/arch/x86/pci/i386.c
+++ linux-x86.q/arch/x86/pci/i386.c
@@ -301,15 +301,13 @@ int pci_mmap_page_range(struct pci_dev *
prot = pgprot_val(vma->vm_page_prot);
if (pat_wc_enabled && write_combine)
prot |= _PAGE_CACHE_WC;
- else if (pat_wc_enabled)
+ else if (pat_wc_enabled || boot_cpu_data.x86 > 3)
/*
* ioremap() and ioremap_nocache() defaults to UC MINUS for now.
* To avoid attribute conflicts, request UC MINUS here
* aswell.
*/
prot |= _PAGE_CACHE_UC_MINUS;
- else if (boot_cpu_data.x86 > 3)
- prot |= _PAGE_CACHE_UC;
vma->vm_page_prot = __pgprot(prot);
Index: linux-x86.q/include/asm-x86/pgtable.h
===================================================================
--- linux-x86.q.orig/include/asm-x86/pgtable.h
+++ linux-x86.q/include/asm-x86/pgtable.h
@@ -66,6 +66,8 @@
#define _PAGE_CACHE_UC_MINUS (_PAGE_PCD)
#define _PAGE_CACHE_UC (_PAGE_PCD | _PAGE_PWT)
+#define _PAGE_PROT_PRESERVE_BITS (_PAGE_CACHE_MASK)
+
#define PAGE_NONE __pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED)
#define PAGE_SHARED __pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
_PAGE_ACCESSED | _PAGE_NX)
@@ -289,7 +291,8 @@ static inline pte_t pte_modify(pte_t pte
* Chop off the NX bit (if present), and add the NX portion of
* the newprot (if present):
*/
- val &= _PAGE_CHG_MASK & ~_PAGE_NX;
+ /* We also preserve PAT bits from existing pte */
+ val &= (_PAGE_CHG_MASK | _PAGE_PROT_PRESERVE_BITS) & ~_PAGE_NX;
val |= pgprot_val(newprot) & __supported_pte_mask;
return __pte(val);
Index: linux-x86.q/include/linux/mm.h
===================================================================
--- linux-x86.q.orig/include/linux/mm.h
+++ linux-x86.q/include/linux/mm.h
@@ -1177,6 +1177,7 @@ static inline unsigned long vma_pages(st
}
pgprot_t vm_get_page_prot(unsigned long vm_flags);
+pgprot_t vm_get_page_prot_preserve(unsigned long vm_flags, pgprot_t oldprot);
struct vm_area_struct *find_extend_vma(struct mm_struct *, unsigned long addr);
int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t);
Index: linux-x86.q/mm/mmap.c
===================================================================
--- linux-x86.q.orig/mm/mmap.c
+++ linux-x86.q/mm/mmap.c
@@ -77,6 +77,19 @@ pgprot_t vm_get_page_prot(unsigned long
}
EXPORT_SYMBOL(vm_get_page_prot);
+#ifndef _PAGE_PROT_PRESERVE_BITS
+#define _PAGE_PROT_PRESERVE_BITS 0
+#endif
+
+pgprot_t vm_get_page_prot_preserve(unsigned long vm_flags, pgprot_t oldprot)
+{
+ pteval_t newprotval = pgprot_val(oldprot);
+
+ newprotval &= _PAGE_PROT_PRESERVE_BITS;
+ newprotval |= pgprot_val(vm_get_page_prot(vm_flags));
+ return __pgprot(newprotval);
+}
+
int sysctl_overcommit_memory = OVERCOMMIT_GUESS; /* heuristic overcommit */
int sysctl_overcommit_ratio = 50; /* default is 50% */
int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT;
Index: linux-x86.q/mm/mprotect.c
===================================================================
--- linux-x86.q.orig/mm/mprotect.c
+++ linux-x86.q/mm/mprotect.c
@@ -192,7 +192,9 @@ success:
* held in write mode.
*/
vma->vm_flags = newflags;
- vma->vm_page_prot = vm_get_page_prot(newflags);
+ vma->vm_page_prot = vm_get_page_prot_preserve(newflags,
+ vma->vm_page_prot);
+
if (vma_wants_writenotify(vma)) {
vma->vm_page_prot = vm_get_page_prot(newflags & ~VM_SHARED);
dirty_accountable = 1;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists