[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7ff738f2-716f-f42a-2f92-3a7d94722d33@suse.com>
Date: Tue, 12 Jul 2022 17:30:02 +0200
From: Juergen Gross <jgross@...e.com>
To: Chuck Zmudzinski <brchuckz@...scape.net>,
Jan Beulich <jbeulich@...e.com>
Cc: Andrew Lutomirski <luto@...nel.org>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
lkml <linux-kernel@...r.kernel.org>,
"xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when
running on e.g. Xen
On 12.07.22 17:09, Chuck Zmudzinski wrote:
> On 7/12/2022 9:32 AM, Juergen Gross wrote:
>> On 12.07.22 15:22, Chuck Zmudzinski wrote:
>>> On 7/12/2022 2:04 AM, Jan Beulich wrote:
>>>> On 11.07.2022 19:41, Chuck Zmudzinski wrote:
>>>>> Moreover... (please move to the bottom of the code snippet
>>>>> for more information about my tests in the Xen PV environment...)
>>>>>
>>>>> void init_cache_modes(void)
>>>>> {
>>>>> u64 pat = 0;
>>>>>
>>>>> if (pat_cm_initialized)
>>>>> return;
>>>>>
>>>>> if (boot_cpu_has(X86_FEATURE_PAT)) {
>>>>> /*
>>>>> * CPU supports PAT. Set PAT table to be consistent with
>>>>> * PAT MSR. This case supports "nopat" boot option, and
>>>>> * virtual machine environments which support PAT without
>>>>> * MTRRs. In specific, Xen has unique setup to PAT MSR.
>>>>> *
>>>>> * If PAT MSR returns 0, it is considered invalid and emulates
>>>>> * as No PAT.
>>>>> */
>>>>> rdmsrl(MSR_IA32_CR_PAT, pat);
>>>>> }
>>>>>
>>>>> if (!pat) {
>>>>> /*
>>>>> * No PAT. Emulate the PAT table that corresponds to the two
>>>>> * cache bits, PWT (Write Through) and PCD (Cache Disable).
>>>>> * This setup is also the same as the BIOS default setup.
>>>>> *
>>>>> * PTE encoding:
>>>>> *
>>>>> * PCD
>>>>> * |PWT PAT
>>>>> * || slot
>>>>> * 00 0 WB : _PAGE_CACHE_MODE_WB
>>>>> * 01 1 WT : _PAGE_CACHE_MODE_WT
>>>>> * 10 2 UC-: _PAGE_CACHE_MODE_UC_MINUS
>>>>> * 11 3 UC : _PAGE_CACHE_MODE_UC
>>>>> *
>>>>> * NOTE: When WC or WP is used, it is redirected to UC- per
>>>>> * the default setup in __cachemode2pte_tbl[].
>>>>> */
>>>>> pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
>>>>> PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
>>>>> }
>>>>>
>>>>> else if (!pat_bp_enabled) {
>>>>> /*
>>>>> * In some environments, specifically Xen PV, PAT
>>>>> * initialization is skipped because MTRRs are
>>>>> * disabled even though PAT is available. In such
>>>>> * environments, set PAT to initialized and enabled to
>>>>> * correctly indicate to callers of pat_enabled() that
>>>>> * PAT is available and prevent PAT from being disabled.
>>>>> */
>>>>> pat_bp_enabled = true;
>>>>> pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
>>>>> }
>>>>>
>>>>> __init_cache_modes(pat);
>>>>> }
>>>>>
>>>>> This function, patched with the extra 'else if' block, fixes the
>>>>> regression on my Xen worksatation, and the pr_info message
>>>>> "x86/PAT: PAT enabled by init_cache_modes" appears in the logs
>>>>> when running this patched kernel in my Xen Dom0. This means
>>>>> that in the Xen PV environment on my Xen Dom0 workstation,
>>>>> rdmsrl(MSR_IA32_CR_PAT, pat) successfully tested for the presence
>>>>> of PAT on the virtual CPU that Xen exposed to the Linux kernel on my
>>>>> Xen Dom0 workstation. At least that is what I think my tests prove.
>>>>>
>>>>> So why is this not a valid way to test for the existence of
>>>>> PAT in the Xen PV environment? Are the existing comments
>>>>> in init_cache_modes() about supporting both the case when
>>>>> the "nopat" boot option is set and the specific case of Xen and
>>>>> MTRR disabled wrong? My testing confirms those comments are
>>>>> correct.
>>>>
>>>> At the very least this ignores the possible "nopat" an admin may
>>>> have passed to the kernel.
>>>
>>> I realize that. The patch I proposed here only fixes the regression. It
>>> would be easy to also modify the patch to also observe the 'nopat"
>>> setting. I think your patch had a force_pat_disable local variable that
>>> is set if pat is disabled by the administrator with "nopat." With that
>>> variable available, modifying the patch so in init_cache_modes we have:
>>>
>>> if (!pat || force_pat_disable) {
>>> /*
>>> * No PAT. Emulate the PAT table that corresponds to the two
>>>
>>> Instead of:
>>>
>>> if (!pat) {
>>> /*
>>> * No PAT. Emulate the PAT table that corresponds to the two
>>>
>>> would cause the kernel to respect the "nopat" setting by the administrator
>>> in the Xen PV Dom0 environment.
>>
>> Chuck, could you please send out a proper patch with your initial fix
>> (setting pat_bp_enabled) and the fix above?
>>
>> I've chatted with Boris Petkov on IRC and he is fine with that.
>
> That's great, I will submit a formal patch later today.
>
>>
>>> I agree this needs to be fixed up, because currently the code is very
>>> confusing and the current variable names and function names do not
>>> always accurately describe what they actually do in the code. That is
>>> why I am working on a patch to do some re-factoring, which only consists
>>> of function and variable name changes and comment changes to fix
>>> the places where the comments in the code are misleading or incomplete.
>>
>> Boris and I agreed to pursue my approach further by removing the
>> dependency between PAT and MTRR and to make this whole mess more
>> clear.
>>
>> I will start to work on this as soon as possible, which will
>> probably be some time in September.
>
> Good, I will look for your patches and try them out.
>
>>
>>> I think perhaps the most misnamed variable here is the local
>>> variable pat_disabled in memtypes.c and the most misnamed function is the
>>> pat_disable function in memtypes.c. They should be named pat_init_disabled
>>> and pat_init_disable, respectively, because they do not really disable
>>> PAT in
>>> the code but only prevent execution of the pat_init function. That
>>> leaves open
>>> the possibility for PAT to be enabled by init_cache_modes, which actually
>>> occurs in the current code in the Xen PV Dom0 environment, but the current
>>> code neglects to set pat_bp_enabled to true in that case. So we need a patch
>>> to fix that in order to fix the regression.
>>
>> In principle I agree, but you should be aware of my refactoring plans.
>
> I will defer to you and stop working on this re-factoring effort, but I
> will prepare a formal patch to fix the regression later today.
>
> I do think Jan's point about respecting the administrator's "nopat" setting
> should also be considered. AFAICT, the "nopat" option in current code
Yes, please add that, too. This was what I meant with "the fix above".
> is also not being respected on the bare metal, and the patch to
> init_cache_modes with a force_no_pat variable is also needed to
> ensure "nopat" is respected on the bare metal, AFAICT.
Hmm, I don't see how the PAT MSR will be written on bare metal when "nopat"
has been specified.
I just tried it with a 5.19 kernel and it booted with the correct PAT
settings:
[ 0.000000] x86/PAT: PAT support disabled via boot option.
[ 0.000986] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC
Juergen
Download attachment "OpenPGP_0xB0DE9DD628BF132F.asc" of type "application/pgp-keys" (3099 bytes)
Download attachment "OpenPGP_signature" of type "application/pgp-signature" (496 bytes)
Powered by blists - more mailing lists