Date:   Thu, 19 Nov 2020 12:59:02 -0500
From:   Andrea Arcangeli <aarcange@...hat.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>
Cc:     Andi Kleen <ak@...ux.intel.com>, Rafael Aquini <aquini@...hat.com>,
        Waiman Long <longman@...hat.com>, linux-kernel@...r.kernel.org
Subject: [PATCH 1/1] x86: restore the write back cache of reserved RAM in iounmap()

If reserved memory is mapped with ioremap_nocache() or ioremap_wc(),
the kernel correctly splits the direct mapping and marks the PAGE_SIZE
granular region uncached, so both the virtual direct mapping and the
second virtual mapping in vmap space are marked uncached or
write-through (i.e. _PAGE_PCD/_PAGE_PWT set in the pagetables).

However, when iounmap() is called later, nothing restores the
write-back memtype of the direct mapping.

So if the kernel executes this sequence:

   SetPageReserved
   ioremap_nocache
   iounmap
   ClearPageReserved

and the page is ever freed later, it remains "uncached" indefinitely.
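
For illustration, here is a minimal driver-style sketch of that
sequence (hypothetical code, not part of this patch; only the kernel
APIs are real, the surrounding logic and names are made up):

#include <linux/gfp.h>
#include <linux/io.h>
#include <linux/mm.h>

/*
 * Hypothetical example: a RAM page is reserved, handed to ioremap()
 * and later released.  Without this fix the direct mapping entry of
 * the page keeps _PAGE_PCD/_PAGE_PWT set after iounmap(), so whoever
 * allocates the page next runs on uncached memory.
 */
static void uncached_leak_example(void)
{
	struct page *page = alloc_page(GFP_KERNEL);
	phys_addr_t phys;
	void __iomem *va;

	if (!page)
		return;
	phys = page_to_phys(page);

	SetPageReserved(page);
	/* ioremap_nocache() on older kernels: requests an uncached mapping,
	   splitting and uncaching the direct mapping of the page */
	va = ioremap(phys, PAGE_SIZE);
	/* ... the uncached mapping is used ... */
	iounmap(va);		/* without the fix: direct mapping left uncached */
	ClearPageReserved(page);
	__free_page(page);	/* the freed page is still uncached in the direct map */
}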

Those uncached regions can be tiny compared to the total size of the
RAM, so it may take a long time until a performance-critical piece of
kernel memory gets allocated in a page that is uncached in the direct
mapping, long after the iounmap() that left it that way.

However, when it eventually happens, it generates unexpected, severe
and non-reproducible kernel slowdowns.

The fix consists of restoring the original write-back cache mode on
reserved RAM at iounmap() time. This is preferable to supporting
multiple overlapping ioremaps on the same physical range because:

- the malfunction would happen while an ioremap is still outstanding,
  and ideally it would happen synchronously at iounmap() time, so it
  should be easier to track down than by scanning all kernel pagetables
  searching for leftover PCD/PWT bits

- two simultaneous ioremaps of non-write-back memtype on the same RAM
  are already forbidden by the bugcheck in reserve_ram_pages_type()
  that verifies the current page type is still _PAGE_CACHE_MODE_WB
  before proceeding (see the sketch after this list). And if all
  ioremaps are of write-back memtype, the patch will not make a
  difference

- even if two ioremaps at the same time on RAM were allowed, the caller
  would still need to enforce that they all have the same memtype, so
  the caller is more likely to be able to ensure it doesn't do
  overlapping ioremaps at once than to be able to undo the changes to
  the direct mapping pagetables
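
For reference, the bugcheck mentioned in the second point works roughly
as follows (a paraphrased sketch of the check in
reserve_ram_pages_type(), not the exact upstream code; the helper name
ram_range_is_wb() is made up for the illustration):

/*
 * Paraphrased sketch: the per-page memtype tracking refuses to hand
 * out a non-WB mapping for RAM pages that are not currently
 * _PAGE_CACHE_MODE_WB, so two simultaneous non-WB ioremaps of the same
 * RAM range already fail with -EBUSY at this point.
 * get_page_memtype() is the memtype.c-internal per-page lookup helper.
 */
static int ram_range_is_wb(u64 start, u64 end)
{
	u64 pfn;

	for (pfn = start >> PAGE_SHIFT; pfn < (end >> PAGE_SHIFT); pfn++) {
		struct page *page = pfn_to_page(pfn);

		if (get_page_memtype(page) != _PAGE_CACHE_MODE_WB)
			return -EBUSY;	/* range already claimed with a non-WB memtype */
	}
	return 0;
}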

Fixes: f56d005d3034 ("x86: no CPA on iounmap")
Signed-off-by: Andrea Arcangeli <aarcange@...hat.com>
---
 arch/x86/mm/ioremap.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 9e5ccc56f8e0..65dbc88edf43 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -437,6 +437,7 @@ EXPORT_SYMBOL(ioremap_prot);
 void iounmap(volatile void __iomem *addr)
 {
 	struct vm_struct *p, *o;
+	u64 p_start, p_end;
 
 	if ((void __force *)addr <= high_memory)
 		return;
@@ -472,12 +473,17 @@ void iounmap(volatile void __iomem *addr)
 		return;
 	}
 
-	memtype_free(p->phys_addr, p->phys_addr + get_vm_area_size(p));
+	p_start = p->phys_addr;
+	p_end = p_start + get_vm_area_size(p);
+	memtype_free(p_start, p_end);
 
 	/* Finally remove it */
 	o = remove_vm_area((void __force *)addr);
 	BUG_ON(p != o || o == NULL);
 	kfree(p);
+	if (o)
+		memtype_kernel_map_sync(p_start, p_end,
+					_PAGE_CACHE_MODE_WB);
 }
 EXPORT_SYMBOL(iounmap);
 
