linux-kernel - Re: [PATCH v5 3/3] KVM: x86: add new nested vmexit tracepointsg

Message-ID: <Z2M7GajddaBqDFnC@google.com>
Date: Wed, 18 Dec 2024 13:14:01 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Maxim Levitsky <mlevitsk@...hat.com>
Cc: kvm@...r.kernel.org, x86@...nel.org, 
	Dave Hansen <dave.hansen@...ux.intel.com>, Thomas Gleixner <tglx@...utronix.de>, 
	Borislav Petkov <bp@...en8.de>, Paolo Bonzini <pbonzini@...hat.com>, Ingo Molnar <mingo@...hat.com>, 
	"H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 3/3] KVM: x86: add new nested vmexit tracepointsg

On Tue, Sep 10, 2024, Maxim Levitsky wrote:
> Add 3 new tracepoints for nested VM exits which are intended
> to capture extra information to gain insights about the nested guest
> behavior.
> 
> The new tracepoints are:
> 
> - kvm_nested_msr
> - kvm_nested_hypercall

I 100% agree that not having register state in the exit tracepoints is obnoxious,
but I don't think we should add one-off tracepoints for the most annoying cases.
I would much prefer to figure out a way to capture register state in kvm_entry
and kvm_exit.  E.g. I've lost track of the number of times I've observed an MSR
exit without having trace_kvm_msr enabled.

One idea would be to capture E{A,B,C,D}X, which would cover MSRs, CPUID, and
most hypercalls.  And then we might even be able to drop the dedicated MSR and
CPUID tracepoints (not sure if that's a good idea).
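
A minimal userspace sketch of why those four registers would be enough, assuming
a hypothetical snapshot struct and exit classification (struct exit_regs,
enum exit_kind and describe_exit() are made up for illustration and are not KVM
code).  It relies only on the architectural conventions: RDMSR/WRMSR take the
MSR index in ECX with WRMSR data in EDX:EAX, CPUID takes leaf/subleaf in
EAX/ECX, and KVM's own hypercall ABI passes the number in RAX and the first
arguments in RBX, RCX and RDX.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot of the four registers; not a KVM structure. */
struct exit_regs {
	uint64_t rax, rbx, rcx, rdx;
};

/* Hypothetical exit classification, for illustration only. */
enum exit_kind { EXIT_RDMSR, EXIT_WRMSR, EXIT_CPUID, EXIT_KVM_HYPERCALL };

/* RDMSR and WRMSR take the MSR index in ECX. */
static uint32_t msr_index(const struct exit_regs *r)
{
	return (uint32_t)r->rcx;
}

/* WRMSR takes its 64-bit value in EDX:EAX. */
static uint64_t msr_write_data(const struct exit_regs *r)
{
	return ((uint64_t)(uint32_t)r->rdx << 32) | (uint32_t)r->rax;
}

/* Render a one-line description of the exit from the registers alone. */
static int describe_exit(enum exit_kind kind, const struct exit_regs *r,
			 char *buf, size_t len)
{
	assert(r && buf);

	switch (kind) {
	case EXIT_RDMSR:
		return snprintf(buf, len, "rdmsr 0x%" PRIx32, msr_index(r));
	case EXIT_WRMSR:
		return snprintf(buf, len, "wrmsr 0x%" PRIx32 " = 0x%" PRIx64,
				msr_index(r), msr_write_data(r));
	case EXIT_CPUID:
		/* CPUID takes the leaf in EAX and the subleaf in ECX. */
		return snprintf(buf, len,
				"cpuid leaf 0x%" PRIx32 " subleaf 0x%" PRIx32,
				(uint32_t)r->rax, (uint32_t)r->rcx);
	case EXIT_KVM_HYPERCALL:
		/* KVM ABI: number in RAX, first three args in RBX, RCX, RDX. */
		return snprintf(buf, len,
				"hypercall %" PRIu64 "(0x%" PRIx64 ", 0x%" PRIx64
				", 0x%" PRIx64 ")",
				r->rax, r->rbx, r->rcx, r->rdx);
	}
	return -1;
}
```

Hypercalls that follow a different calling convention, or that take more than
three arguments, would not be fully described by these four registers, which is
the "most hypercalls" caveat.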

Side topic, arch/s390/kvm/trace.h has the concept of COMMON information that is
captured for multiple tracepoints.  I haven't looked closely, but I gotta imagine
we can/should use a similar approach for x86.
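
A rough userspace sketch of that kind of sharing, assuming plain structs in
place of the real TRACE_EVENT machinery.  Every name here (struct demo_vcpu,
DEMO_FIELD_COMMON, DEMO_ASSIGN_COMMON, the two event structs) is hypothetical
and not copied from arch/s390/kvm/trace.h; the point is only that the common
fields and their assignment are written once and reused by each event.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical vCPU state; stands in for struct kvm_vcpu. */
struct demo_vcpu {
	int id;
	uint64_t rax, rbx, rcx, rdx;
};

/* Fields that every event records, declared in one place. */
#define DEMO_FIELD_COMMON		\
	int vcpu_id;			\
	uint64_t rax, rbx, rcx, rdx;

/* Assignment of the common fields, also written once. */
#define DEMO_ASSIGN_COMMON(entry, vcpu) do {	\
	(entry)->vcpu_id = (vcpu)->id;		\
	(entry)->rax = (vcpu)->rax;		\
	(entry)->rbx = (vcpu)->rbx;		\
	(entry)->rcx = (vcpu)->rcx;		\
	(entry)->rdx = (vcpu)->rdx;		\
} while (0)

/* Two events that share the common fields and add their own. */
struct demo_entry_event {
	DEMO_FIELD_COMMON
	uint64_t rip;
};

struct demo_exit_event {
	DEMO_FIELD_COMMON
	uint32_t exit_reason;
};

static void record_entry(struct demo_entry_event *e,
			 const struct demo_vcpu *v, uint64_t rip)
{
	assert(e && v);
	DEMO_ASSIGN_COMMON(e, v);
	e->rip = rip;
}

static void record_exit(struct demo_exit_event *e,
			const struct demo_vcpu *v, uint32_t exit_reason)
{
	assert(e && v);
	DEMO_ASSIGN_COMMON(e, v);
	e->exit_reason = exit_reason;
}
```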

> These tracepoints capture extra register state to be able to know
> which MSR or which hypercall was done.
> 
> - kvm_nested_page_fault
> 
> This tracepoint allows to capture extra info about which host pagefault
> error code caused the nested page fault.

The host error code, a.k.a. qualification info, is readily available in the
kvm_exit (or nested variant) tracepoint.  I don't think letting userspace skip a
tracepoint that's probably already enabled is worth the extra code to support
this tracepoint.  The nested_svm_inject_npf_exit() code in particular is wonky,
and I think it's a good example of why userspace "needs" trace_kvm_exit, e.g. to
observe that a nested stage-2 page fault didn't originate from a hardware stage-2
fault.