Message-ID: <9c97d8b4-ae1d-efd0-8414-91a50d22cf12@netscape.net>
Date: Wed, 13 Jul 2022 15:38:16 -0400
From: Chuck Zmudzinski <brchuckz@...scape.net>
To: Juergen Gross <jgross@...e.com>, Jan Beulich <jbeulich@...e.com>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>,
Andy Lutomirski <luto@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
Dan Williams <dan.j.williams@...el.com>,
Tom Lendacky <thomas.lendacky@....com>,
Jane Chu <jane.chu@...cle.com>,
Tianyu Lan <Tianyu.Lan@...rosoft.com>,
Randy Dunlap <rdunlap@...radead.org>,
Sean Christopherson <seanjc@...gle.com>,
xen-devel@...ts.xenproject.org, stable@...r.kernel.org,
linux-kernel@...r.kernel.org,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>
Subject: Re: [PATCH v2] Subject: x86/PAT: Report PAT on CPUs that support PAT
without MTRR
On 7/13/2022 3:22 PM, Chuck Zmudzinski wrote:
> On 7/13/2022 3:07 PM, Chuck Zmudzinski wrote:
> > On 7/13/2022 9:45 AM, Juergen Gross wrote:
> > > >> On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
> > > >> And in addition, if we are going to backport this patch to
> > > >> all current stable branches, we better have a really, really,
> > > >> good reason for changing the behavior of "nopat" on Xen.
> > > >>
> > > >> Does such a reason exist?
> > > >
> > > > Well, the simple reason is: It doesn't work the same way under Xen
> > > > and non-Xen (in turn because, before my patch or whatever equivalent
> > > > work, things don't work properly anyway, PAT-wise). Yet it definitely
> > > > ought to behave the same everywhere, imo.
> > >
> > > There is Documentation/x86/pat.rst, which rather clearly states how
> > > "nopat" is meant to work: it should not change the contents of the
> > > PAT MSR, but keep it just as it was set at boot time (the doc talks
> > > about the "BIOS" setting of the MSR, and I guess in the Xen case
> > > the hypervisor is kind of acting as the BIOS).
> > >
> > > The question is, whether "nopat" needs to be translated to
> > > pat_enabled() returning "false".
> >
> > When I started working on a re-factoring effort of the logic
> > surrounding pat_enabled(), I noticed there are five different
> > reasons in the current code for setting pat_disabled to true.
> > IMO pat_disabled should be a redundant variable that is always
> > equal to !pat_enabled() and !pat_bp_enabled, but unfortunately
> > that is not the case. The five reasons for setting pat_disabled
> > to true are given as message strings:
> >
> > 1. "MTRRs disabled, skipping PAT initialization too."
> > 2. "PAT support disabled because CONFIG_MTRR is disabled in the kernel."
> > 3. "PAT support disabled via boot option."
> > 4. "PAT not supported by the CPU."
> > 5. "PAT support disabled by the firmware."
> >
> > The only effect of setting pat_disabled to true is to skip
> > pat_init(); it does not skip init_cache_modes(), which exists
> > to handle exactly the cases in which pat_init() was skipped.
> > Xen is one of those cases: pat_disabled will be true, yet the
> > only way to fix the current regression and the five-year-old
> > commit is to set pat_bp_enabled to true so that pat_enabled()
> > returns true. So to fix the five-year-old commit, we must have
> >
> > pat_enabled() != pat_disabled
> >
> > Something is wrong with this logic. That is why I wanted to
> > precede my fix with some re-factoring that changes some variable
> > and function names and modifies some comments before trying
> > to fix the five-year-old commit, so that we will never have a
> > situation where pat_enabled() != pat_disabled.
> >
> > Chuck
> Sorry, I meant to say,
>
> To fix the five-year-old commit, we must have
>
> pat_enabled() != !pat_disabled or pat_enabled() == pat_disabled,
>
> and there is something wrong with that logic.
>
> Chuck
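
The relationship between pat_disabled, pat_bp_enabled and pat_enabled() described in the quoted messages can be illustrated with a small user-space C model. This is a hypothetical sketch, not the kernel implementation: struct model_pat_state and the model_* functions are invented for illustration and only mirror the behavior described above (disabling PAT for any reason skips pat_init(), while init_cache_modes() always runs). On these ordinary paths pat_bp_enabled is only ever set by the pat_init() stand-in, so pat_enabled() == !pat_disabled holds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the PAT state discussed in this thread;
 * NOT the code in arch/x86/mm/pat/memtype.c. */
struct model_pat_state {
	bool pat_disabled;   /* set by any of the five "disable" reasons */
	bool pat_bp_enabled; /* the value the pat_enabled() stand-in reports */
};

/* Stand-in for disabling PAT for one of the five listed reasons. */
static void model_pat_disable(struct model_pat_state *s, const char *reason)
{
	assert(reason != NULL);
	if (s->pat_disabled)
		return; /* only the first reason is reported */
	s->pat_disabled = true;
	printf("x86/PAT: %s\n", reason);
}

/* Stand-in for pat_enabled(): it reports pat_bp_enabled, not !pat_disabled. */
static bool model_pat_enabled(const struct model_pat_state *s)
{
	return s->pat_bp_enabled;
}

/* Stand-in for pat_init(): skipped entirely once pat_disabled is true. */
static void model_pat_init(struct model_pat_state *s, bool cpu_has_pat)
{
	if (s->pat_disabled)
		return;
	if (!cpu_has_pat) {
		model_pat_disable(s, "PAT not supported by the CPU.");
		return;
	}
	s->pat_bp_enabled = true; /* the PAT MSR would be programmed here */
}

/* Stand-in for init_cache_modes(): always runs, even when pat_init() was
 * skipped, and in this model never changes pat_bp_enabled. */
static void model_init_cache_modes(struct model_pat_state *s)
{
	(void)s; /* cache-mode table setup is omitted from the model */
}

/* The invariant argued for above: pat_disabled is redundant with !pat_enabled(). */
static bool model_invariant_holds(const struct model_pat_state *s)
{
	return model_pat_enabled(s) == !s->pat_disabled;
}
```

For example, booting the model with "nopat" (model_pat_disable() followed by model_pat_init() and model_init_cache_modes()) leaves pat_disabled true and model_pat_enabled() false, so model_invariant_holds() returns true.
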
So to summarize: both my patch and Jan's patch fix the
five-year-old commit and the current regression by artificially
setting pat_bp_enabled to true, so that pat_enabled() returns
true. To be comfortable with that, I think we need to come to a
new understanding of what the static boolean variable
pat_disabled in arch/x86/mm/pat/memtype.c really means.
The fact is, we have a regression, and the only fix we
can find makes pat_enabled() == pat_disabled.

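The Xen case can be expressed in the same kind of hypothetical model. This sketch assumes, for illustration only, that the fix reports PAT as enabled during cache-mode initialization when the CPU supports PAT and the MSR already holds a non-zero value programmed by the hypervisor; the model_* names are invented, and the real patches may differ in detail. Because pat_init() is skipped while pat_disabled is true, the only way pat_enabled() becomes true is through the fix itself, which yields exactly pat_enabled() == pat_disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, NOT kernel code. pat_msr stands for the value the
 * firmware or, under Xen, the hypervisor left in the PAT MSR. */
struct model_pat_state {
	bool pat_disabled;
	bool pat_bp_enabled;
	uint64_t pat_msr;
};

/* Stand-in for pat_enabled(). */
static bool model_pat_enabled(const struct model_pat_state *s)
{
	return s->pat_bp_enabled;
}

/* ASSUMED shape of the fix: if pat_init() was skipped but the CPU
 * reports PAT and the MSR holds a non-zero value, keep that value for
 * the cache modes and report PAT as enabled. */
static void model_init_cache_modes_with_fix(struct model_pat_state *s,
					    bool cpu_has_pat)
{
	if (cpu_has_pat && s->pat_msr != 0)
		s->pat_bp_enabled = true;
}

/* Models the Xen boot path described above: MTRRs are unavailable,
 * so pat_disabled is set and pat_init() never runs. */
static struct model_pat_state model_xen_boot(uint64_t hypervisor_pat)
{
	struct model_pat_state s = { .pat_msr = hypervisor_pat };

	/* Model assumption: the hypervisor has already programmed the MSR. */
	assert(hypervisor_pat != 0);
	s.pat_disabled = true; /* "MTRRs disabled, skipping PAT initialization too." */
	/* pat_init() would return early here because pat_disabled is true. */
	model_init_cache_modes_with_fix(&s, true);
	return s;
}
```

In this model, model_xen_boot() returns a state in which pat_disabled and model_pat_enabled() are both true, which is the inconsistency described above.
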
I need to stop thinking about this for a while. It is time
for those who have authority to fix this regression to
make some comments about how they think this should
be fixed.
Chuck