Message-Id: <20201119175902.17394-2-aarcange@redhat.com>
Date: Thu, 19 Nov 2020 12:59:02 -0500
From: Andrea Arcangeli <aarcange@...hat.com>
To: Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>
Cc: Andi Kleen <ak@...ux.intel.com>, Rafael Aquini <aquini@...hat.com>,
Waiman Long <longman@...hat.com>, linux-kernel@...r.kernel.org
Subject: [PATCH 1/1] x86: restore the write back cache of reserved RAM in iounmap()

If reserved memory is mapped with ioremap_nocache() or ioremap_wc(),
the kernel correctly splits the direct mapping and marks the PAGE_SIZE
granular region uncached, so both the virtual direct mapping and the
second virtual mapping in vmap space will be marked uncached or write
through (i.e. _PAGE_PCD/_PAGE_PWT set in the pagetables).
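
For illustration, a hypothetical debugging sketch (not part of the
patch) of how the direct mapping state described above could be
inspected from kernel code; MY_PHYS_ADDR is a placeholder and
lookup_address() is the x86 pagetable walker:

	/* Check whether the direct mapping of the physical page at
	 * MY_PHYS_ADDR (hypothetical) has PCD/PWT set, i.e. is no
	 * longer write back. */
	unsigned int level;
	pte_t *pte = lookup_address((unsigned long)__va(MY_PHYS_ADDR),
				    &level);

	if (pte && (pte_val(*pte) & (_PAGE_PCD | _PAGE_PWT)))
		pr_info("direct mapping not write back\n");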

However, when iounmap() is called later, nothing restores the write
back memtype of the direct mapping.

So if the kernel executes this sequence:

  SetPageReserved
  ioremap_nocache
  iounmap
  ClearPageReserved

the page remains "uncached" indefinitely if it is ever freed later.
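
For illustration, a minimal driver-style sketch of that sequence;
MY_PHYS_ADDR and MY_SIZE are hypothetical placeholders for a
page-aligned reserved RAM range:

	static void example(void)
	{
		struct page *page = pfn_to_page(MY_PHYS_ADDR >> PAGE_SHIFT);
		void __iomem *v;

		SetPageReserved(page);
		/* splits the direct mapping, sets _PAGE_PCD/PWT */
		v = ioremap_nocache(MY_PHYS_ADDR, MY_SIZE);
		if (!v)
			return;
		/* ... use the uncached mapping ... */
		iounmap(v); /* without this fix the direct mapping stays uncached */
		ClearPageReserved(page);
		/* if the page is freed now it can be reallocated for any
		 * purpose while still uncached in the direct mapping */
	}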

These uncached regions can be tiny compared to the total size of the
RAM, so it may take a long time, long after the iounmap() that left it
uncached, until a performance critical piece of kernel memory happens
to be allocated in a page that is still uncached in the direct
mapping. When that eventually happens, it causes severe, unexpected
and non-reproducible kernel slowdowns.

The fix consists of restoring the original write back memtype on
reserved RAM at iounmap() time. This is preferable to supporting
multiple overlapping ioremaps on the same physical range because:

- the malfunction will happen while an ioremap is still outstanding,
  and ideally it will happen synchronously at iounmap() time, so it
  should be easier to track down than having to scan all kernel
  pagetables for leftover PCD/PWT bits

- two simultaneous ioremaps of non write back memtype on the same
  range are already forbidden by the bugcheck in
  reserve_ram_pages_type() that verifies the current page type is
  still _PAGE_CACHE_MODE_WB before proceeding (sketched below); and if
  all overlapping ioremaps are of write back memtype, the patch makes
  no difference

- even if two simultaneous ioremaps on the same RAM range were
  allowed, the caller would still need to enforce that they all use
  the same memtype, so it is more likely to be able to ensure it does
  no overlapping ioremaps at once than to be able to undo the changes
  to the direct mapping pagetables
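
For reference, a paraphrased sketch of that bugcheck (simplified from
reserve_ram_pages_type() in arch/x86/mm/pat/memtype.c; the exact code
differs by kernel version):

	/* every pfn in the range must still be tracked as write back,
	 * otherwise the new non-WB reservation is refused */
	for (pfn = start >> PAGE_SHIFT; pfn < end >> PAGE_SHIFT; ++pfn) {
		if (get_page_memtype(pfn_to_page(pfn)) !=
		    _PAGE_CACHE_MODE_WB)
			return -EBUSY;
	}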

Fixes: f56d005d3034 ("x86: no CPA on iounmap")
Signed-off-by: Andrea Arcangeli <aarcange@...hat.com>
---
 arch/x86/mm/ioremap.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 9e5ccc56f8e0..65dbc88edf43 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -437,6 +437,7 @@ EXPORT_SYMBOL(ioremap_prot);
 void iounmap(volatile void __iomem *addr)
 {
 	struct vm_struct *p, *o;
+	u64 p_start, p_end;
 
 	if ((void __force *)addr <= high_memory)
 		return;
@@ -472,12 +473,17 @@ void iounmap(volatile void __iomem *addr)
 		return;
 	}
 
-	memtype_free(p->phys_addr, p->phys_addr + get_vm_area_size(p));
+	p_start = p->phys_addr;
+	p_end = p_start + get_vm_area_size(p);
+	memtype_free(p_start, p_end);
 
 	/* Finally remove it */
 	o = remove_vm_area((void __force *)addr);
 	BUG_ON(p != o || o == NULL);
 	kfree(p);
+	if (o)
+		memtype_kernel_map_sync(p_start, p_end - p_start,
+					_PAGE_CACHE_MODE_WB);
 }
 EXPORT_SYMBOL(iounmap);
 