Message-ID: <aCJDzU1p_SFNRIJd@google.com>
Date: Mon, 12 May 2025 11:54:05 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Jon Kohler <jon@...anix.com>
Cc: pbonzini@...hat.com, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, 
	dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com, 
	kvm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	"Mickaël Salaün" <mic@...ikod.net>
Subject: Re: [RFC PATCH 11/18] KVM: VMX: Enhance EPT violation handler for PROT_USER_EXEC

On Thu, Mar 13, 2025, Jon Kohler wrote:
> From: Mickaël Salaün <mic@...ikod.net>
> 
> Add EPT_VIOLATION_PROT_USER_EXEC (6) to reflect the user executable
> permissions of a given address when Intel MBEC is enabled.
> 
> Refactor usage of EPT_VIOLATION_RWX_TO_PROT to understand all of the
> specific bits that are now possible with MBEC.
> 
> Intel SDM 'Exit Qualification for EPT Violations' states the following
> for Bit 6.
>   If the “mode-based execute control” VM-execution control is 0, the
>   value of this bit is undefined. If that control is 1, this bit is
>   the logical-AND of bit 10 in the EPT paging-structure entries used
>   to translate the guest-physical address of the access causing the
>   EPT violation. In this case, it indicates whether the guest-physical
>   address was executable for user-mode linear addresses.
> 
>   Bit 6 is cleared to 0 if (1) the “mode-based execute control”
>   VM-execution control is 1; and (2) either (a) any of EPT
>   paging-structure entries used to translate the guest-physical address
>   of the access causing the EPT violation is not present; or
>   (b) 4-level EPT is in use and the guest-physical address sets any
>   bits in the range 51:48.
> 
> Signed-off-by: Mickaël Salaün <mic@...ikod.net>
> Co-developed-by: Jon Kohler <jon@...anix.com>
> Signed-off-by: Jon Kohler <jon@...anix.com>
> 
> ---
>  arch/x86/include/asm/vmx.h     |  7 ++++---
>  arch/x86/kvm/mmu/paging_tmpl.h | 15 ++++++++++++---
>  arch/x86/kvm/vmx/vmx.c         |  7 +++++--
>  3 files changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index ffc90d672b5d..84c5be416f5c 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -587,6 +587,7 @@ enum vm_entry_failure_code {
>  #define EPT_VIOLATION_PROT_READ		BIT(3)
>  #define EPT_VIOLATION_PROT_WRITE	BIT(4)
>  #define EPT_VIOLATION_PROT_EXEC		BIT(5)
> +#define EPT_VIOLATION_PROT_USER_EXEC	BIT(6)

Ugh, TDX added this as EPT_VIOLATION_EXEC_FOR_RING3_LIN (apparently the TDX module
enables MBEC?).  I like your name a lot better.

>  #define EPT_VIOLATION_PROT_MASK		(EPT_VIOLATION_PROT_READ  | \
>  					 EPT_VIOLATION_PROT_WRITE | \
>  					 EPT_VIOLATION_PROT_EXEC)

Hmm, so I think EPT_VIOLATION_PROT_MASK should include EPT_VIOLATION_PROT_USER_EXEC.
The existing TDX change does not, because unfortunately the bit is undefined if
MBEC is unsupported, but that's easy to solve by unconditionally clearing the bit
in handle_ept_violation().  And then when nested-EPT MBEC support comes along,
handle_ept_violation() can be modified to conditionally clear the bit based on
whether or not the current MMU supports MBEC.

I'll post a patch to include the bit in EPT_VIOLATION_PROT_MASK, and opportunistically
change the name.
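
To illustrate the idea (a standalone sketch, not the actual patch -- the
helper name and the bool parameter are made up for illustration; only the
bit positions come from the SDM and the patch context):

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1ULL << (n))

#define EPT_VIOLATION_PROT_READ		BIT(3)
#define EPT_VIOLATION_PROT_WRITE	BIT(4)
#define EPT_VIOLATION_PROT_EXEC		BIT(5)
#define EPT_VIOLATION_PROT_USER_EXEC	BIT(6)

/* USER_EXEC folded into the common mask, per the suggestion above */
#define EPT_VIOLATION_PROT_MASK		(EPT_VIOLATION_PROT_READ  | \
					 EPT_VIOLATION_PROT_WRITE | \
					 EPT_VIOLATION_PROT_EXEC  | \
					 EPT_VIOLATION_PROT_USER_EXEC)

/*
 * Hypothetical helper: the SDM leaves bit 6 undefined when MBEC is
 * disabled, so unconditionally clear it before anything consumes the
 * exit qualification.
 */
static inline uint64_t ept_sanitize_exit_qual(uint64_t exit_qual,
					      bool mbec_enabled)
{
	if (!mbec_enabled)
		exit_qual &= ~EPT_VIOLATION_PROT_USER_EXEC;
	return exit_qual;
}
```

With that in place, every downstream consumer can treat the mask as
always including bit 6, and nested-EPT MBEC support later only has to
change what gets passed for mbec_enabled.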

> @@ -596,7 +597,7 @@ enum vm_entry_failure_code {
>  #define EPT_VIOLATION_READ_TO_PROT(__epte) (((__epte) & VMX_EPT_READABLE_MASK) << 3)
>  #define EPT_VIOLATION_WRITE_TO_PROT(__epte) (((__epte) & VMX_EPT_WRITABLE_MASK) << 3)
>  #define EPT_VIOLATION_EXEC_TO_PROT(__epte) (((__epte) & VMX_EPT_EXECUTABLE_MASK) << 3)
> -#define EPT_VIOLATION_RWX_TO_PROT(__epte) (((__epte) & VMX_EPT_RWX_MASK) << 3)

Why?  There's no escaping the fact that EXEC, a.k.a. X, is doing double duty as
"exec for all" and "kernel exec".  And KVM has nearly two decades of history
using EXEC/X to refer to "exec for all".  I see no reason to throw all of that
away and discard the intuitive and pervasive RWX logic.
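
For reference, the "<< 3" in those macros is simply mapping EPTE
permission bits 2:0 onto exit-qualification bits 5:3, which is why one
RWX macro covers all three (standalone sketch; macro names mirror the
quoted context, the BIT() scaffolding is local):

```c
#include <stdint.h>

#define BIT(n) (1ULL << (n))

/* EPTE permission bits, per the Intel SDM */
#define VMX_EPT_READABLE_MASK	BIT(0)
#define VMX_EPT_WRITABLE_MASK	BIT(1)
#define VMX_EPT_EXECUTABLE_MASK	BIT(2)
#define VMX_EPT_RWX_MASK	(VMX_EPT_READABLE_MASK | \
				 VMX_EPT_WRITABLE_MASK | \
				 VMX_EPT_EXECUTABLE_MASK)

/* One shift handles R, W and X: EPTE bits 2:0 -> exit-qual bits 5:3 */
#define EPT_VIOLATION_RWX_TO_PROT(epte) (((epte) & VMX_EPT_RWX_MASK) << 3)
```

USER_EXEC is the one bit that doesn't fit this pattern (EPTE bit 10 vs.
exit-qualification bit 6), which is exactly why keeping the combined
macro and special-casing only USER_EXEC reads better.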

> @@ -510,7 +511,15 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>  		 * Note, pte_access holds the raw RWX bits from the EPTE, not
>  		 * ACC_*_MASK flags!
>  		 */
> -		walker->fault.exit_qualification |= EPT_VIOLATION_RWX_TO_PROT(pte_access);
> +		walker->fault.exit_qualification |=
> +			EPT_VIOLATION_READ_TO_PROT(pte_access);
> +		walker->fault.exit_qualification |=
> +			EPT_VIOLATION_WRITE_TO_PROT(pte_access);
> +		walker->fault.exit_qualification |=
> +			EPT_VIOLATION_EXEC_TO_PROT(pte_access);

IMO, this is a big net negative.  I much prefer the existing code, as it highlights
that USER_EXEC is the oddball.

> +		if (vcpu->arch.pt_guest_exec_control)

This is wrong on multiple fronts.  As mentioned earlier in the series, this is a
property of the MMU (more specifically, the root role), not of the vCPU.

And consulting MBEC support *only* when synthesizing the exit qualification is
wrong, because it means pte_access contains bogus data when consumed by
FNAME(gpte_access).  At a glance, FNAME(gpte_access) probably needs to be modified
to take in the page role, e.g. like FNAME(sync_spte) and FNAME(prefetch_gpte)
already adjust the access based on the owning shadow page's access mask.

> +			walker->fault.exit_qualification |=
> +				EPT_VIOLATION_USER_EXEC_TO_PROT(pte_access);
>  	}
>  #endif
>  	walker->fault.address = addr;
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 116910159a3f..0aadfa924045 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -5809,7 +5809,7 @@ static int handle_task_switch(struct kvm_vcpu *vcpu)
>  
>  static int handle_ept_violation(struct kvm_vcpu *vcpu)
>  {
> -	unsigned long exit_qualification;
> +	unsigned long exit_qualification, rwx_mask;
>  	gpa_t gpa;
>  	u64 error_code;
>  
> @@ -5839,7 +5839,10 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
>  	error_code |= (exit_qualification & EPT_VIOLATION_ACC_INSTR)
>  		      ? PFERR_FETCH_MASK : 0;
>  	/* ept page table entry is present? */
> -	error_code |= (exit_qualification & EPT_VIOLATION_PROT_MASK)
> +	rwx_mask = EPT_VIOLATION_PROT_MASK;
> +	if (vcpu->arch.pt_guest_exec_control)
> +		rwx_mask |= EPT_VIOLATION_PROT_USER_EXEC;
> +	error_code |= (exit_qualification & rwx_mask)
>  		      ? PFERR_PRESENT_MASK : 0;

As mentioned above, if KVM clears EPT_VIOLATION_PROT_USER_EXEC when it's
undefined, then this can simply use EPT_VIOLATION_PROT_MASK unchanged.
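
I.e. the handler collapses back to something like this (sketch only; the
helper is hypothetical, and PFERR_PRESENT_MASK is abbreviated to its bit
value):

```c
#include <stdint.h>

#define BIT(n) (1ULL << (n))

/* Mask includes USER_EXEC; bit 6 is pre-cleared when undefined */
#define EPT_VIOLATION_PROT_MASK	(BIT(3) | BIT(4) | BIT(5) | BIT(6))
#define PFERR_PRESENT_MASK	BIT(0)

/* No rwx_mask local, no MBEC check at the call site */
static inline uint64_t ept_present_error_code(uint64_t exit_qualification)
{
	return (exit_qualification & EPT_VIOLATION_PROT_MASK) ?
	       PFERR_PRESENT_MASK : 0;
}
```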

>  
>  	if (error_code & EPT_VIOLATION_GVA_IS_VALID)
> -- 
> 2.43.0
> 
