[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3037CF06-6C1B-4FF1-8BE5-057B60DDCF01@nutanix.com>
Date: Tue, 23 Dec 2025 04:15:54 +0000
From: Jon Kohler <jon@...anix.com>
To: Jon Kohler <jon@...anix.com>
CC: Sean Christopherson <seanjc@...gle.com>,
"pbonzini@...hat.com"
<pbonzini@...hat.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"x86@...nel.org"
<x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
"kvm@...r.kernel.org"
<kvm@...r.kernel.org>,
"linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>,
Mickaël Salaün
<mic@...ikod.net>,
"amit.shah@....com" <amit.shah@....com>
Subject: Re: [RFC PATCH 14/18] KVM: x86/mmu: Extend is_executable_pte to
understand MBEC
> On May 12, 2025, at 10:09 PM, Jon Kohler <jon@...anix.com> wrote:
>
>> On May 12, 2025, at 5:16 PM, Sean Christopherson <seanjc@...gle.com> wrote:
>>
>> On Thu, Mar 13, 2025, Jon Kohler wrote:
>>> @@ -359,15 +360,17 @@ TRACE_EVENT(
>>> __entry->sptep = virt_to_phys(sptep);
>>> __entry->level = level;
>>> __entry->r = shadow_present_mask || (__entry->spte & PT_PRESENT_MASK);
>>> - __entry->x = is_executable_pte(__entry->spte);
>>> + __entry->kx = is_executable_pte(__entry->spte, true, vcpu);
>>> + __entry->ux = is_executable_pte(__entry->spte, false, vcpu);
>>> __entry->u = shadow_user_mask ? !!(__entry->spte & shadow_user_mask) : -1;
>>> ),
>>>
>>> - TP_printk("gfn %llx spte %llx (%s%s%s%s) level %d at %llx",
>>> + TP_printk("gfn %llx spte %llx (%s%s%s%s%s) level %d at %llx",
>>> __entry->gfn, __entry->spte,
>>> __entry->r ? "r" : "-",
>>> __entry->spte & PT_WRITABLE_MASK ? "w" : "-",
>>> - __entry->x ? "x" : "-",
>>> + __entry->kx ? "X" : "-",
>>> + __entry->ux ? "x" : "-",
>>
>> I don't have a better idea, but I do worry that X vs. x will lead to confusion.
>> But as I said, I don't have a better idea...
>
> Rampant confusion on this in our internal review, but it was the best we could
> come up with on the first go-around here (outside of additional rigor on code
> comments, etc) … which certainly don’t help at run/trace time.
I still couldn’t come up with something cleaner than BigX, LittleX here for v1,
But I’m open to feedback if anyones got comments.
>>> __entry->u == -1 ? "" : (__entry->u ? "u" : "-"),
>>> __entry->level, __entry->sptep
>>> )
>>> diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
>>> index 1f7b388a56aa..fd7e29a0a567 100644
>>> --- a/arch/x86/kvm/mmu/spte.h
>>> +++ b/arch/x86/kvm/mmu/spte.h
>>> @@ -346,9 +346,20 @@ static inline bool is_last_spte(u64 pte, int level)
>>> return (level == PG_LEVEL_4K) || is_large_pte(pte);
>>> }
>>>
>>> -static inline bool is_executable_pte(u64 spte)
>>> +static inline bool is_executable_pte(u64 spte, bool for_kernel_mode,
>>
>> s/for_kernel_mode/is_user_access and invert. A handful of KVM comments describe
>> supervisor as "kernel mode", but those are quite old and IMO unnecessarily imprecise.
Ack/done - thanks for the feedback, I’ve integrated this. See what I did in v1,
but I’ve also added a TDP aware version is_executable_pte_fault for fault->is_tdp
as I *think* we actually need to check this to figure out this user access vs non user
access. I might be misinterpreting TDP, so I’d appreciate some sanity checking there.
>>> + struct kvm_vcpu *vcpu)
>>
>> This needs to be an mmu (or maybe a root role?). Hmm, thinking about the page
>> role, I don't think one new bit will suffice. Simply adding ACC_USER_EXEC_MASK
>> won't let KVM differentiate between shadow pages created with ACC_EXEC_MASK for
>> an MMU without MBEC, and a page created explicitly without ACC_USER_EXEC_MASK
>> for an MMU *with* MBEC.
>>
>> What I'm not sure about is if MBEC/GMET support needs to be captured in the base
>> page role, or if it shoving it in kvm_mmu_extended_role will suffice. I'll think
>> more on this and report back, need to refresh all the shadowing paging stuff, again...
Sold! I made this part of kvm_mmu_page_role for access bit and also a new “has_mbec”
which both helped simplified the overall work (thanks!) and made it cleaner to enable
>>> {
>>> - return (spte & (shadow_x_mask | shadow_nx_mask)) == shadow_x_mask;
>>> + u64 x_mask = shadow_x_mask;
>>> +
>>> + if (vcpu->arch.pt_guest_exec_control) {
>>> + x_mask |= shadow_ux_mask;
>>> + if (for_kernel_mode)
>>> + x_mask &= ~VMX_EPT_USER_EXECUTABLE_MASK;
>>> + else
>>> + x_mask &= ~VMX_EPT_EXECUTABLE_MASK;
>>> + }
>>
>> This is going to get messy when GMET support comes along, because the U/S bit
>> would need to be inverted to do the right thing for supervisor fetches. Rather
>> than trying to shoehorn support into the existing code, I think we should prep
>> for GMET and make the code a wee bit easier to follow in the process. We can
>> even implement the actual GMET semanctics, but guarded with a WARN (emulating
>> GMET isn't a terrible fallback in the event of a KVM bug).
>
> +Amit
>
> We’re on the same page there. In fact, Amit and I have been talking off list about
> GMET with (notionally) this same goal in mind, of trying to make sure we do this in
> such a way where we don’t need to rework the whole thing for GMET.
>
>>
>> if (spte & shadow_nx_mask)
>> return false;
>>
>> if (!role.has_mode_based_exec)
>> return (spte & shadow_x_mask) == shadow_x_mask;
>>
>> if (WARN_ON_ONCE(!shadow_x_mask))
>> return is_user_access || !(spte & shadow_user_mask);
>>
>> return spte & (is_user_access ? shadow_ux_mask : shadow_x_mask);
Thanks for the suggestion on the code side, I’ve integrated that. I haven’t fully
put my teeth into GMET, but Amit I’d appreciate a review of what I’ve done on the
MMU design side for V1 series and let me know if there are GMET prep tweaks that
make sense (or we could just punt it to a future GMET-specific series, either way).
Powered by blists - more mailing lists