[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <45fd7ade-61fe-18fa-aacc-ed80fdf3fb8d@suse.com>
Date: Tue, 12 Jul 2022 15:32:37 +0200
From: Juergen Gross <jgross@...e.com>
To: Chuck Zmudzinski <brchuckz@...scape.net>,
Jan Beulich <jbeulich@...e.com>
Cc: Andrew Lutomirski <luto@...nel.org>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
lkml <linux-kernel@...r.kernel.org>,
"xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when
running on e.g. Xen
On 12.07.22 15:22, Chuck Zmudzinski wrote:
> On 7/12/2022 2:04 AM, Jan Beulich wrote:
>> On 11.07.2022 19:41, Chuck Zmudzinski wrote:
>>> Moreover... (please move to the bottom of the code snippet
>>> for more information about my tests in the Xen PV environment...)
>>>
>>> void init_cache_modes(void)
>>> {
>>> u64 pat = 0;
>>>
>>> if (pat_cm_initialized)
>>> return;
>>>
>>> if (boot_cpu_has(X86_FEATURE_PAT)) {
>>> /*
>>> * CPU supports PAT. Set PAT table to be consistent with
>>> * PAT MSR. This case supports "nopat" boot option, and
>>> * virtual machine environments which support PAT without
>>> * MTRRs. In specific, Xen has unique setup to PAT MSR.
>>> *
>>> * If PAT MSR returns 0, it is considered invalid and emulates
>>> * as No PAT.
>>> */
>>> rdmsrl(MSR_IA32_CR_PAT, pat);
>>> }
>>>
>>> if (!pat) {
>>> /*
>>> * No PAT. Emulate the PAT table that corresponds to the two
>>> * cache bits, PWT (Write Through) and PCD (Cache Disable).
>>> * This setup is also the same as the BIOS default setup.
>>> *
>>> * PTE encoding:
>>> *
>>> * PCD
>>> * |PWT PAT
>>> * || slot
>>> * 00 0 WB : _PAGE_CACHE_MODE_WB
>>> * 01 1 WT : _PAGE_CACHE_MODE_WT
>>> * 10 2 UC-: _PAGE_CACHE_MODE_UC_MINUS
>>> * 11 3 UC : _PAGE_CACHE_MODE_UC
>>> *
>>> * NOTE: When WC or WP is used, it is redirected to UC- per
>>> * the default setup in __cachemode2pte_tbl[].
>>> */
>>> pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
>>> PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
>>> }
>>>
>>> else if (!pat_bp_enabled) {
>>> /*
>>> * In some environments, specifically Xen PV, PAT
>>> * initialization is skipped because MTRRs are
>>> * disabled even though PAT is available. In such
>>> * environments, set PAT to initialized and enabled to
>>> * correctly indicate to callers of pat_enabled() that
>>> * PAT is available and prevent PAT from being disabled.
>>> */
>>> pat_bp_enabled = true;
>>> pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
>>> }
>>>
>>> __init_cache_modes(pat);
>>> }
>>>
>>> This function, patched with the extra 'else if' block, fixes the
>>> regression on my Xen worksatation, and the pr_info message
>>> "x86/PAT: PAT enabled by init_cache_modes" appears in the logs
>>> when running this patched kernel in my Xen Dom0. This means
>>> that in the Xen PV environment on my Xen Dom0 workstation,
>>> rdmsrl(MSR_IA32_CR_PAT, pat) successfully tested for the presence
>>> of PAT on the virtual CPU that Xen exposed to the Linux kernel on my
>>> Xen Dom0 workstation. At least that is what I think my tests prove.
>>>
>>> So why is this not a valid way to test for the existence of
>>> PAT in the Xen PV environment? Are the existing comments
>>> in init_cache_modes() about supporting both the case when
>>> the "nopat" boot option is set and the specific case of Xen and
>>> MTRR disabled wrong? My testing confirms those comments are
>>> correct.
>>
>> At the very least this ignores the possible "nopat" an admin may
>> have passed to the kernel.
>
> I realize that. The patch I proposed here only fixes the regression. It
> would be easy to also modify the patch to also observe the 'nopat"
> setting. I think your patch had a force_pat_disable local variable that
> is set if pat is disabled by the administrator with "nopat." With that
> variable available, modifying the patch so in init_cache_modes we have:
>
> if (!pat || force_pat_disable) {
> /*
> * No PAT. Emulate the PAT table that corresponds to the two
>
> Instead of:
>
> if (!pat) {
> /*
> * No PAT. Emulate the PAT table that corresponds to the two
>
> would cause the kernel to respect the "nopat" setting by the administrator
> in the Xen PV Dom0 environment.
Chuck, could you please send out a proper patch with your initial fix
(setting pat_bp_enabled) and the fix above?
I've chatted with Boris Petkov on IRC and he is fine with that.
> I agree this needs to be fixed up, because currently the code is very
> confusing and the current variable names and function names do not
> always accurately describe what they actually do in the code. That is
> why I am working on a patch to do some re-factoring, which only consists
> of function and variable name changes and comment changes to fix
> the places where the comments in the code are misleading or incomplete.
Boris and I agreed to pursue my approach further by removing the
dependency between PAT and MTRR and to make this whole mess more
clear.
I will start to work on this as soon as possible, which will
probably be some time in September.
> I think perhaps the most misnamed variable here is the local
> variable pat_disabled in memtypes.c and the most misnamed function is the
> pat_disable function in memtypes.c. They should be named pat_init_disabled
> and pat_init_disable, respectively, because they do not really disable
> PAT in
> the code but only prevent execution of the pat_init function. That
> leaves open
> the possibility for PAT to be enabled by init_cache_modes, which actually
> occurs in the current code in the Xen PV Dom0 environment, but the current
> code neglects to set pat_bp_enabled to true in that case. So we need a patch
> to fix that in order to fix the regression.
In principle I agree, but you should be aware of my refactoring plans.
Juergen
Download attachment "OpenPGP_0xB0DE9DD628BF132F.asc" of type "application/pgp-keys" (3099 bytes)
Download attachment "OpenPGP_signature" of type "application/pgp-signature" (496 bytes)
Powered by blists - more mailing lists