linux-kernel - Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <e818a621-dfeb-cde2-332c-ba1e4a632192@netscape.net>
Date:   Mon, 11 Jul 2022 07:38:35 -0400
From:   Chuck Zmudzinski <brchuckz@...scape.net>
To:     Borislav Petkov <bp@...en8.de>, Jan Beulich <jbeulich@...e.com>
Cc:     Andrew Lutomirski <luto@...nel.org>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>,
        lkml <linux-kernel@...r.kernel.org>,
        "xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when
 running on e.g. Xen

On 7/5/22 12:14 PM, Borislav Petkov wrote:
> On Tue, Jul 05, 2022 at 05:56:36PM +0200, Jan Beulich wrote:
> > Re-using pat_disabled like you do in your suggestion below won't
> > work, because mtrr_bp_init() calls pat_disable() when MTRRs
> > appear to be disabled (from the kernel's view). The goal is to
> > honor "nopat" without honoring any other calls to pat_disable().

I think Jan is speaking of the narrow goal of the patch
that is causing the regression at hand:

Commit bdd8b6c98239cad3a976d6f197afc2c794d3cef8
("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")

The author of that commit expressed the desire to
more readily handle the nopat option in Linux in an
architecture-independent way. in the commit message.
The patch was not intended to cause a functional
change, but it did - it caused a regression in Xen
Dom0s running certain Intel graphics devices.

>
> Actually, the current goal is to adjust Xen dom0 because:
>
> 1. it uses the PAT code
>
> 2. but then it does something special and hides the MTRRs
>
> which is not something real hardware does.
>
> So this one-off thing should be prominent, visible and not get in the
> way.

Actually, this is probably the most insightful comment in all
the words that have been written about this regression. This
is a one-off thing, peculiar to Xen PV, but it is not visible
enough and gets overlooked when changes are made. I guess
I agree it should not get in the way either, and it doesn't,
except when the lack of visibility of this one-off thing causes
the author of a patch to overlook it and cause unforeseen
problems for users of Xen on Linux. That is precisely what
happened five years ago with commit 99c13b8c8896d7bcb92753bf
("x86/mm/pat: Don't report PAT on CPUs that don't support it")

Have you looked at that patch? That is the one that introduced
the regression that causes pat_enabled() to return a false negative
on Xen Dom0 today, and it hid in the code for four and a half years
and was only exposed as a regression with commit
bdd8b6c98239cad3a9 last December, which again was written in
a way that caused the regression on Xen because this one-off
thing that Xen does was not visible enough to the author of the
recent patch from last December that exposed this problem as a
regression with Xen PV domains and certain Intel graphics adpapters.

That patch did not take into account this not-visible-enough Xen
case when MTRR is disabled but PAT is still supported by the
CPU in Xen PV domains. So the proper way to fix this regression
is by fixing that commit from five years ago. It is very simple.
After some code re-factoring and other changes, today that
commit can be fixed by something like:

--- a/arch/x86/mm/pat/memtype.c    2022-06-16 07:32:05.000000000 -0400
+++ b/arch/x86/mm/pat/memtype.c    2022-07-09 17:51:56.783050765 -0400
@@ -315,6 +315,20 @@
               PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
     }

+    else if (!pat_bp_enabled) {
+        /*
+         * In some environments, specifically Xen PV, PAT
+         * initialization is skipped because MTRRs are
+         * disabled even though PAT is available. In such
+         * environments, set PAT to enabled to correctly
+         * indicate to callers of pat_enabled() that CPU
+         * support for PAT is available.
+         */
+        pat_bp_enabled = true;
+        pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
+    }
+
     __init_cache_modes(pat);
 }

Like Jan's approach, it implements the fix in init_cache_modes(),
but unlike Jan's approach, it only sets pat_bp_enabled and
pat_bp_initialized to true if boot_cpu_has(X86_FEATURE_PAT)
is true and rdmsrl(MSR_IA32_CR_PAT, pat) returns a valid
value. No need to check for a hypervisor, just check if
the CPU supports PAT here and that PAT MSR returns something
valid here. If both are true, then set pat_bp_enabled to true.
Regression solved.

And that leaves the bigger goal of dealing with this one-off
thing that Xen does in a more sane way for another day. I
am working on a patch series that attempts to start the process
by first re-factoring the currently confusing pat_disable functions
and variables that will hopefully make this one-off Xen thing
more visible and easier to understand so when someone wants
to play around with the current way of deciding whether or
not PAT is enabled on X86, no regression will occur on Xen
or in any other environment.

Best Regards,

Chuck