Message-ID: <20160819152707.GC26577@char.us.oracle.com>
Date:   Fri, 19 Aug 2016 11:27:07 -0400
From:   Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
To:     One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>
Cc:     Jan Beulich <JBeulich@...e.com>,
        Andrew Cooper <andrew.cooper3@...rix.com>,
        stefan.bader@...onical.com, david.vrabel@...rix.com,
        xen-devel <xen-devel@...ts.xenproject.org>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        chuck.anderson@...cle.com, Juergen Gross <JGross@...e.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [Xen-devel] XSA 154 and ISA region (640K -> 1MB) WB cache
 instead of UC

On Thu, Aug 18, 2016 at 04:35:44PM +0100, One Thousand Gnomes wrote:
> On Thu, 18 Aug 2016 05:12:54 -0600
> "Jan Beulich" <JBeulich@...e.com> wrote:
> 
> > >>> On 18.08.16 at 12:16, <andrew.cooper3@...rix.com> wrote:  
> > > On 18/08/16 11:06, Jan Beulich wrote:  
> > >>>>> On 17.08.16 at 22:32, <konrad.wilk@...cle.com> wrote:  
> > >>>    Looking at the kernel it assumes that WB is ok for 640KB->1MB.
> > >>>    The comment says:
> > >>>    " /* Low ISA region is always mapped WB in page table. No need to track *"
> > >> As per above it's not clear to me what this comment is backed by.  
> > > 
> > > This states what is in the pagetables.  Not the combined result with MTRRs.
> > > 
> > > WB in the pagetables and WC/UC in the MTRRs is a legal combination which
> > > functions correctly.  
> > 
> > True, but then again - haven't I been told multiple times that Linux
> > nowadays prefers to run without using MTRRs?
> 
> The BIOS sets up the fixed MTRR registers for the 640K-1MB window. Those
> are separate to the variable range MTRR registers used for main memory
> with specific mappings for segments A000 to BFFF then C000-C7FF /
> C800-CFFF / etc up to FFFF.

OK, so BIOS-inherited.

Looking at the Intel SDM (figure 11-7), if the MTRR for that range is UC,
then pagetable entries of either UC or WB are fine: the effective type is
UC either way. Except Linux's use of the quirk (is_untracked_pat_range)
ends up always requesting WB.
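
As a rough sketch of that combination rule (not kernel code; the enum and
function names are made up for illustration, and only the UC and WB types
discussed here are modelled, not the full SDM table):

```c
#include <assert.h>

/* Hypothetical, simplified memory types: only UC and WB are modelled. */
enum mem_type { MT_UC, MT_WB };

/*
 * Effective type for a page, given the MTRR type of the range and the
 * PAT type from the pagetable entry. For these two types, UC on either
 * side wins; WB results only when both say WB.
 */
static enum mem_type effective_type(enum mem_type mtrr, enum mem_type pat)
{
	assert(mtrr == MT_UC || mtrr == MT_WB);
	assert(pat == MT_UC || pat == MT_WB);

	return (mtrr == MT_UC || pat == MT_UC) ? MT_UC : MT_WB;
}
```

With the fixed MTRRs at UC, effective_type(MT_UC, MT_WB) and
effective_type(MT_UC, MT_UC) both come out as MT_UC, which is why the WB
request is harmless on bare metal.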

And to combat the splat, the patch:


From 5209635f23786fb88cf0ce77719da8acda63bf65 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
Date: Fri, 19 Aug 2016 11:06:44 -0400
Subject: [PATCH] x86/xen: Add x86_platform.is_untracked_pat_range quirk to
 ignore ISA regions.

On x86 whenever VMAs are set up, the 'is_ISA_range' quirk (which this
patch re-implements) is used to figure out whether to ignore the
requested PAT type and always use WB (see 'reserve_memtype').
Specifically it forces the WB type for any region in the ISA space.

From the Intel SDM, the combination of MTRR (UC, which is set up by
the BIOS) and PAT (UC or WB) for the ISA region ends up with the same
value - UC.

However on Xen, due to XSA 154 we enforce that _ANY_ pagetable entry
mapping an MMIO range MUST have the same cachability - and in this
case we enforce UC.

Which means that with XSA 154 (and without this patch) any application
that maps /dev/mem to get SMBIOS information (like mcelog), and pokes
in the ISA region, will not have a PTE set. That is due to
reserve_pfn_range returning -EINVAL, which results in the PTE not being set.

[These are debug entries added in 'reserve_pfn_range']
mcelog:2471 0xf0000->0xf1000, req_type=write-back new_type=write-back
mcelog:2471 0xeb000->0xed000, req_type=write-back new_type=write-back

.. above are successful ones, but:
mcelog:2471 0xeb000->0xed000, req_type=uncached new_type=uncached
[again, a debug one:]
mcelog:2471 want=uncached got=write-back strict 0x000eb000-0x000ecfff
mcelog:2471 map pfn expected mapping type uncached for [mem 0x000eb000-0x000ecfff], got write-back
 ------------[ cut here ]------------

 [<ffffffff816c66f0>] dump_stack+0x63/0x83
 [<ffffffff81084745>] warn_slowpath_common+0x95/0xe0
 [<ffffffff810847aa>] warn_slowpath_null+0x1a/0x20
 [<ffffffff810725f3>] untrack_pfn+0x93/0xc0
 [<ffffffff811b90f9>] unmap_single_vma+0xa9/0x100
 [<ffffffff811b9644>] unmap_vmas+0x54/0xa0
 [<ffffffff811bf0da>] exit_mmap+0x9a/0x150
 [<ffffffff810825d3>] mmput+0x73/0x110
 [<ffffffff81082775>] dup_mm+0x105/0x110
 [<ffffffff81083b1d>] copy_process+0x11ed/0x1240
 [<ffffffff81084009>] do_fork+0x79/0x280
 [<ffffffff810259d3>] ? syscall_trace_enter_phase1+0x153/0x180
 [<ffffffff81084226>] SyS_clone+0x16/0x20
 [<ffffffff816cb3ee>] system_call_fastpath+0x12/0x71

results in that splat.

The effective result of the function below is for 'reserve_memtype'
to ignore the result from the 'x86_platform.is_untracked_pat_range' quirk.
Which means that the splat above does not happen.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
---
 arch/x86/xen/enlighten.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 8ffb089..3238d04 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -283,6 +283,27 @@ static void __init xen_banner(void)
 	       version >> 16, version & 0xffff, extra.extraversion,
 	       xen_feature(XENFEAT_mmu_pt_update_preserve_ad) ? " (preserve-AD)" : "");
 }
+
+/*
+ * On x86 whenever VMAs are set up, the 'is_ISA_range' quirk (which we
+ * re-implement below) is used to figure out whether to ignore the
+ * requested PAT type and always use WB (see 'reserve_memtype').
+ *
+ * The combination of MTRR (UC) and PAT (UC or WB) for the ISA region ends
+ * up with the same value - UC.
+ *
+ * However on Xen, due to XSA 154 we enforce that mappings to _ANY_ MMIO
+ * range MUST have the same cachability mapping - and in this case
+ * we enforce UC for everything.
+ *
+ * The effective result of the function below is for 'reserve_memtype'
+ * to ignore the result from 'x86_platform.is_untracked_pat_range' quirk.
+ */
+static bool xen_ignore(u64 s, u64 e)
+{
+	return false;
+}
+
 /* Check if running on Xen version (major, minor) or later */
 bool
 xen_running_on_version_or_later(unsigned int major, unsigned int minor)
@@ -1730,6 +1751,8 @@ asmlinkage __visible void __init xen_start_kernel(void)
 		x86_init.mpparse.get_smp_config = x86_init_uint_noop;
 
 		xen_boot_params_init_edd();
+
+		x86_platform.is_untracked_pat_range = xen_ignore;
 	}
 #ifdef CONFIG_PCI
 	/* PCI BIOS service won't work from a PV guest. */
-- 
2.5.5
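
To illustrate what swapping the hook changes, here is a small userspace
model (an assumption-laden sketch, not the kernel implementation: the
names pick_mode, isa_range, untracked_fn and the MODE_* values are
invented for this example, and the real 'reserve_memtype' goes on to
consult the tracked memtype ranges, which is omitted here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model only; these names mimic, but are not, the kernel's. */
enum cache_mode { MODE_WB, MODE_UC };

#define ISA_START 0x0a0000ULL	/* 640K */
#define ISA_END   0x100000ULL	/* 1MB */

/* Stand-in for the x86_platform.is_untracked_pat_range hook. */
typedef bool (*untracked_fn)(uint64_t start, uint64_t end);

/* Default-style quirk: true if [start, end) lies inside the ISA window. */
static bool isa_range(uint64_t start, uint64_t end)
{
	return start >= ISA_START && end <= ISA_END;
}

/* What the patch installs on Xen: no range is ever treated as untracked. */
static bool xen_ignore(uint64_t start, uint64_t end)
{
	(void)start;
	(void)end;
	return false;
}

/* Untracked ranges are forced to WB; anything else keeps the request. */
static enum cache_mode pick_mode(untracked_fn untracked, uint64_t start,
				 uint64_t end, enum cache_mode req)
{
	assert(start < end);
	return untracked(start, end) ? MODE_WB : req;
}
```

In this model, the failing mcelog request (0xeb000->0xed000, uncached)
comes back as MODE_WB with isa_range as the hook, and stays MODE_UC with
xen_ignore, so the requested and resulting types no longer disagree.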

> 
> Alan
