Message-ID: <20160819152707.GC26577@char.us.oracle.com>
Date: Fri, 19 Aug 2016 11:27:07 -0400
From: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
To: One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>
Cc: Jan Beulich <JBeulich@...e.com>,
Andrew Cooper <andrew.cooper3@...rix.com>,
stefan.bader@...onical.com, david.vrabel@...rix.com,
xen-devel <xen-devel@...ts.xenproject.org>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
chuck.anderson@...cle.com, Juergen Gross <JGross@...e.com>,
linux-kernel@...r.kernel.org
Subject: Re: [Xen-devel] XSA 154 and ISA region (640K -> 1MB) WB cache
instead of UC
On Thu, Aug 18, 2016 at 04:35:44PM +0100, One Thousand Gnomes wrote:
> On Thu, 18 Aug 2016 05:12:54 -0600
> "Jan Beulich" <JBeulich@...e.com> wrote:
>
> > >>> On 18.08.16 at 12:16, <andrew.cooper3@...rix.com> wrote:
> > > On 18/08/16 11:06, Jan Beulich wrote:
> > >>>>> On 17.08.16 at 22:32, <konrad.wilk@...cle.com> wrote:
> > >>> Looking at the kernel it assumes that WB is ok for 640KB->1MB.
> > >>> The comment says:
> > >>> " /* Low ISA region is always mapped WB in page table. No need to track
> > > *"
> > >> As per above it's not clear to me what this comment is backed by.
> > >
> > > This states what is in the pagetables. Not the combined result with MTRRs.
> > >
> > > WB in the pagetables and WC/UB in the MTRRs is a legal combination which
> > > functions correctly.
> >
> > True, but then again - haven't I been told multiple times that Linux
> > nowadays prefers to run without using MTRRs?
>
> The BIOS sets up the fixed MTRR registers for the 640K-1MB window. Those
> are separate to the variable range MTRR registers used for main memory
> with specific mappings for segments A000 to BFFF then C000-C7FF /
> C800-CFFF / etc up to FFFF.
OK, so BIOS-inherited.
Looking at the Intel SDM (table 11-7), if the MTRR for that region is UC,
then pagetable entries of either UC or WB are fine. Except Linux's use
of the quirk (is_untracked_pat_range) ends up always requesting WB.
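As a rough illustration of the SDM rule quoted above, the sketch below
encodes only the two combinations this mail relies on (MTRR UC with a
pagetable type of UC or WB). The enum and function names are made up
for the example and are not kernel or Xen identifiers; every other
combination is deliberately reported as unknown rather than guessed.

```c
/* Hypothetical names for this sketch; not kernel or Xen identifiers. */
enum memtype { MT_UC, MT_WB, MT_UNKNOWN };

/*
 * Effective memory type for the cases discussed in this thread:
 * with the MTRR set to UC, a pagetable (PAT) type of UC or WB both
 * end up as UC. Anything else is not modelled here.
 */
static enum memtype effective_memtype(enum memtype mtrr, enum memtype pat)
{
	if (mtrr == MT_UC && (pat == MT_UC || pat == MT_WB))
		return MT_UC;

	return MT_UNKNOWN;
}
```

So effective_memtype(MT_UC, MT_WB) and effective_memtype(MT_UC, MT_UC)
both give MT_UC, which is why WB in the pagetables is harmless on bare
metal for this region.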
And to combat the splat, the patch:
From 5209635f23786fb88cf0ce77719da8acda63bf65 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
Date: Fri, 19 Aug 2016 11:06:44 -0400
Subject: [PATCH] x86/xen: Add x86_platform.is_untracked_pat_range quirk to
ignore ISA regions.
On x86 whenever VMAs are set up, the 'is_ISA_range' quirk (which this
patch re-implements) is used to figure out whether to ignore the
requested PAT type and always use WB (see 'reserve_memtype').
Specifically it forces the WB type for any region in the ISA space.
From the Intel SDM, the combination of MTRR (UC, which is set up by
the BIOS) and PAT (UC or WB) for the ISA region ends up with the same
value - UC.
However on Xen, due to XSA 154 we enforce that _ANY_ pagetable entry
mapping an MMIO range MUST have the same cacheability - and in this
case we enforce UC.
Which means that with XSA 154 (and without this patch) any application
that maps /dev/mem to get SMBIOS information (like mcelog), and pokes
in the ISA region will not have a PTE set. That is due to
reserve_pfn_range returning -EINVAL which results in the PTE not being set.
[These are debug entries added in 'reserve_pfn_range']
mcelog:2471 0xf0000->0xf1000, req_type=write-back new_type=write-back
mcelog:2471 0xeb000->0xed000, req_type=write-back new_type=write-back
.. above are successful ones, but:
mcelog:2471 0xeb000->0xed000, req_type=uncached new_type=uncached
[again, a debug one:]
mcelog:2471 want=uncached got=write-back strict 0x000eb000-0x000ecfff
mcelog:2471 map pfn expected mapping type uncached for [mem 0x000eb000-0x000ecfff], got write-back
------------[ cut here ]------------
[<ffffffff816c66f0>] dump_stack+0x63/0x83
[<ffffffff81084745>] warn_slowpath_common+0x95/0xe0
[<ffffffff810847aa>] warn_slowpath_null+0x1a/0x20
[<ffffffff810725f3>] untrack_pfn+0x93/0xc0
[<ffffffff811b90f9>] unmap_single_vma+0xa9/0x100
[<ffffffff811b9644>] unmap_vmas+0x54/0xa0
[<ffffffff811bf0da>] exit_mmap+0x9a/0x150
[<ffffffff810825d3>] mmput+0x73/0x110
[<ffffffff81082775>] dup_mm+0x105/0x110
[<ffffffff81083b1d>] copy_process+0x11ed/0x1240
[<ffffffff81084009>] do_fork+0x79/0x280
[<ffffffff810259d3>] ? syscall_trace_enter_phase1+0x153/0x180
[<ffffffff81084226>] SyS_clone+0x16/0x20
[<ffffffff816cb3ee>] system_call_fastpath+0x12/0x71
results in that splat.
The effective result of the function below is for 'reserve_memtype'
to ignore the 'x86_platform.is_untracked_pat_range' quirk, so the ISA
region is no longer forced to WB.
Which means that the splat above does not happen.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>
---
arch/x86/xen/enlighten.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 8ffb089..3238d04 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -283,6 +283,27 @@ static void __init xen_banner(void)
version >> 16, version & 0xffff, extra.extraversion,
xen_feature(XENFEAT_mmu_pt_update_preserve_ad) ? " (preserve-AD)" : "");
}
+
+/*
+ * On x86 whenever VMAs are set up, the 'is_ISA_range' quirk (which we
+ * re-implement below) is used to figure out whether to ignore the
+ * requested PAT type and always use WB (see 'reserve_memtype').
+ *
+ * The combination of MTRR (UC) and PAT (UC or WB) for the ISA region ends
+ * up with the same value - UC.
+ *
+ * However on Xen, due to XSA 154 we enforce that mappings to _ANY_ MMIO
+ * range MUST have the same cacheability - and in this case
+ * we enforce UC for everything.
+ *
+ * The effective result of the function below is for 'reserve_memtype'
+ * to ignore the result from 'x86_platform.is_untracked_pat_range' quirk.
+ */
+static bool xen_ignore(u64 s, u64 e)
+{
+ return false;
+}
+
/* Check if running on Xen version (major, minor) or later */
bool
xen_running_on_version_or_later(unsigned int major, unsigned int minor)
@@ -1730,6 +1751,8 @@ asmlinkage __visible void __init xen_start_kernel(void)
x86_init.mpparse.get_smp_config = x86_init_uint_noop;
xen_boot_params_init_edd();
+
+ x86_platform.is_untracked_pat_range = xen_ignore;
}
#ifdef CONFIG_PCI
/* PCI BIOS service won't work from a PV guest. */
--
2.5.5
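To make the effect of the hook concrete, here is a small user-space
model of the decision described in the commit message. It is only a
sketch under assumptions, not the kernel's reserve_memtype:
model_reserve_memtype, isa_hook, never_untracked and the enum are
hypothetical names, the 640K-1MB bounds are taken from the subject of
this thread, and treating the end address as exclusive is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for this sketch; not kernel identifiers. */
enum cache_mode { MODE_UC, MODE_WB };

typedef bool (*untracked_hook_t)(uint64_t start, uint64_t end);

/* Model of the default quirk: true for ranges inside 640K..1MB.
 * The end address is assumed to be exclusive. */
static bool isa_hook(uint64_t s, uint64_t e)
{
	return s >= 0xa0000 && e <= 0x100000;
}

/* Model of xen_ignore() from the patch: no range is ever untracked. */
static bool never_untracked(uint64_t s, uint64_t e)
{
	(void)s;
	(void)e;
	return false;
}

/*
 * If the hook claims the range, the requested type is dropped and WB
 * is handed back; otherwise the request is passed through unchanged.
 */
static enum cache_mode model_reserve_memtype(untracked_hook_t hook,
					     uint64_t start, uint64_t end,
					     enum cache_mode req)
{
	if (hook(start, end))
		return MODE_WB;

	return req;
}
```

With isa_hook, a UC request for 0xeb000-0xed000 comes back as WB, which
is the "want=uncached got=write-back" mismatch in the debug output
above. With never_untracked the same request comes back as UC.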
>
> Alan