Message-ID: <4c1c999c29809c683cc79bc8c77cbe5d7eca37b7.camel@redhat.com>
Date: Thu, 19 Dec 2024 12:49:46 -0500
From: Maxim Levitsky <mlevitsk@...hat.com>
To: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org
Cc: x86@...nel.org, Dave Hansen <dave.hansen@...ux.intel.com>, Thomas
 Gleixner <tglx@...utronix.de>, Borislav Petkov <bp@...en8.de>, Ingo Molnar
 <mingo@...hat.com>, Sean Christopherson <seanjc@...gle.com>, "H. Peter
 Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 3/3] KVM: x86: add new nested vmexit tracepoints

On Thu, 2024-12-19 at 18:33 +0100, Paolo Bonzini wrote:
> On 9/10/24 22:03, Maxim Levitsky wrote:
> > Add 3 new tracepoints for nested VM exits which are intended
> > to capture extra information to gain insights about the nested guest
> > behavior.
> > 
> > The new tracepoints are:
> > 
> > - kvm_nested_msr
> > - kvm_nested_hypercall
> > 
> > These tracepoints capture extra register state to show
> > which MSR was accessed or which hypercall was made.
> > 
> > - kvm_nested_page_fault
> > 
> > This tracepoint captures extra info about which host page fault
> > error code caused the nested page fault.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@...hat.com>
> > ---
> >   arch/x86/kvm/svm/nested.c | 22 +++++++++++
> >   arch/x86/kvm/trace.h      | 82 +++++++++++++++++++++++++++++++++++++--
> >   arch/x86/kvm/vmx/nested.c | 27 +++++++++++++
> >   arch/x86/kvm/x86.c        |  3 ++
> >   4 files changed, 131 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> > index 6f704c1037e51..2020307481553 100644
> > --- a/arch/x86/kvm/svm/nested.c
> > +++ b/arch/x86/kvm/svm/nested.c
> > @@ -38,6 +38,8 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
> >   {
> >   	struct vcpu_svm *svm = to_svm(vcpu);
> >   	struct vmcb *vmcb = svm->vmcb;
> > +	u64 host_error_code = vmcb->control.exit_info_1;
> > +
> >   
> >   	if (vmcb->control.exit_code != SVM_EXIT_NPF) {
> >   		/*
> > @@ -48,11 +50,15 @@ static void nested_svm_inject_npf_exit(struct kvm_vcpu *vcpu,
> >   		vmcb->control.exit_code_hi = 0;
> >   		vmcb->control.exit_info_1 = (1ULL << 32);
> >   		vmcb->control.exit_info_2 = fault->address;
> > +		host_error_code = 0;
> >   	}
> >   
> >   	vmcb->control.exit_info_1 &= ~0xffffffffULL;
> >   	vmcb->control.exit_info_1 |= fault->error_code;
> >   
> > +	trace_kvm_nested_page_fault(fault->address, host_error_code,
> > +				    fault->error_code);
> > +
> 
> I disagree with Sean about trace_kvm_nested_page_fault.  It's a useful 
> addition and it is easier to understand what's happening with a 
> dedicated tracepoint (especially on VMX).
> 
> Tracepoints are not an exact science and they aren't entirely kernel API.
> At least they can just go away at any time (changing them is a lot
> more tricky, but their presence is not guaranteed).  The one below has
> the slight ugliness of having to do some computation in
> nested_svm_vmexit(), but this one should go in.
> 
> >   	nested_svm_vmexit(svm);
> >   }
> >   
> > @@ -1126,6 +1132,22 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
> >   				       vmcb12->control.exit_int_info_err,
> >   				       KVM_ISA_SVM);
> >   
> > +	/* Collect some info about nested VM exits */
> > +	switch (vmcb12->control.exit_code) {
> > +	case SVM_EXIT_MSR:
> > +		trace_kvm_nested_msr(vmcb12->control.exit_info_1 == 1,
> > +				     kvm_rcx_read(vcpu),
> > +				     (vmcb12->save.rax & 0xFFFFFFFFull) |
> > +				     (((u64)kvm_rdx_read(vcpu) << 32)));
> > +		break;
> > +	case SVM_EXIT_VMMCALL:
> > +		trace_kvm_nested_hypercall(vmcb12->save.rax,
> > +					   kvm_rbx_read(vcpu),
> > +					   kvm_rcx_read(vcpu),
> > +					   kvm_rdx_read(vcpu));
> > +		break;
> 
> Here I probably would have preferred an unconditional tracepoint giving 
> RAX/RBX/RCX/RDX after a nested vmexit.  This is not exactly what Sean 
> wanted but perhaps it strikes a middle ground?  I know you wrote this 
> for a debugging tool, do you really need to have everything in a single 
> tracepoint, or can you correlate the existing exit tracepoint with this 
> hypothetical trace_kvm_nested_exit_regs, to pick RDMSR vs. WRMSR?


Hi!

If the new trace_kvm_nested_exit_regs tracepoint has a VM exit number argument, then
I can enable it twice, each time with a different filter (vm_exit_num == msr and vm_exit_num == vmcall),
and each instance will count the events that I need.

So this can work.
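
To make the idea concrete, here is a rough userspace sketch of what the two
filtered instances would compute. It assumes a record layout for the
hypothetical trace_kvm_nested_exit_regs event; the struct, its field names
and both helper functions are made up for illustration and are not part of
the patch. Only the two SVM exit codes and the EDX:EAX composition of the
MSR value are taken from the existing code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SVM exit codes, as defined in arch/x86/include/uapi/asm/svm.h. */
#define SVM_EXIT_MSR     0x07c
#define SVM_EXIT_VMMCALL 0x081

/*
 * Hypothetical record for one trace_kvm_nested_exit_regs event.
 * The layout and field names are assumptions for this sketch.
 */
struct nested_exit_regs_event {
	uint64_t vm_exit_num;	/* nested VM exit code */
	uint64_t rax, rbx, rcx, rdx;
};

/*
 * Models one filtered instance of the tracepoint: count only the
 * events whose exit number matches the given filter value.
 */
static size_t count_exits(const struct nested_exit_regs_event *ev,
			  size_t n, uint64_t vm_exit_num)
{
	size_t count = 0;

	assert(ev != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (ev[i].vm_exit_num == vm_exit_num)
			count++;
	}
	return count;
}

/*
 * For an MSR exit, the MSR index is in RCX and the 64-bit value is
 * EDX:EAX, the same composition the patch does in nested_svm_vmexit().
 */
static uint64_t msr_value(const struct nested_exit_regs_event *ev)
{
	return (ev->rax & 0xFFFFFFFFull) | ((ev->rdx & 0xFFFFFFFFull) << 32);
}
```

Calling count_exits() once with SVM_EXIT_MSR and once with SVM_EXIT_VMMCALL
over the same event stream corresponds to the two filters above, so the
consumer does not need a dedicated tracepoint per exit reason.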

Thanks!
Best regards,
	Maxim Levitsky

> 
> Paolo
>