lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 5 May 2022 00:26:58 +0000
From:   Jon Kohler <jon@...anix.com>
To:     Sean Christopherson <seanjc@...gle.com>
CC:     Jon Kohler <jon@...anix.com>, Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] KVM: X86: correct trace_kvm_pv_tlb_flush stats



> On May 4, 2022, at 5:47 PM, Sean Christopherson <seanjc@...gle.com> wrote:
> 
> On Wed, May 04, 2022, Jon Kohler wrote:
>> The trace point in record_steal_time() is above the conditional
>> that fires kvm_vcpu_flush_tlb_guest(), so even when we might
>> not be flushing tlb, we still record that we are.
> 
> No, it records whether not a TLB flush is needed.

Sure, the trace does, but the stat makes it seem like the host is going
nuts with doing pv tlb flushes when in reality it may not really be
doing all that much work.

> 
>> Fix by nestling trace_kvm_pv_tlb_flush() under appropriate
>> conditional. This results in the stats for kvm:kvm_pv_tlb_flush,
>> as trivially observable by perf stat -e "kvm:*" -a sleep Xs, in
>> reporting the amount of times we actually do a pv tlb flush,
>> instead of just the amount of times we happen to call
>> record_steal_time().
>> 
>> Signed-off-by: Jon Kohler <jon@...anix.com>
>> ---
>> arch/x86/kvm/x86.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 4790f0d7d40b..8d4e0e58ec34 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3410,9 +3410,9 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
>> 
>> 		vcpu->arch.st.preempted = 0;
>> 
>> -		trace_kvm_pv_tlb_flush(vcpu->vcpu_id,
>> -				       st_preempted & KVM_VCPU_FLUSH_TLB);
>> 		if (st_preempted & KVM_VCPU_FLUSH_TLB)
>> +			trace_kvm_pv_tlb_flush(vcpu->vcpu_id,
>> +				st_preempted & KVM_VCPU_FLUSH_TLB);
> 
> If you're going to trace only when a flush is needed, this should simply be:
> 
> 			trace_kvm_pv_tlb_flush(vcpu->vcpu_id);
> 
> I haven't used this tracepoint often (at all?) so I don't have a strong preference,
> but I can see the "no TLB flush needed" information being extremely valuable when
> debugging a supsected TLB flushing bug.

Yea thats fair; however, this is only calling into some other function that is
actually doing the work. Those other flush TLB areas do not have traces
AFAIK, so even that is a bit in complete.

The net problem here is really that the stat is likely incorrect; however,
one other oddity I didn’t quite understand after looking into this is that
the call site for all of this is in record_steal_time(), which is only called
from vcpu_enter_guest(), and that is called *after*
kvm_service_local_tlb_flush_requests(), which also calls
kvm_vcpu_flush_tlb_guest() if request == KVM_REQ_TLB_FLUSH_GUEST

That request may be there set from a few different places. 

I don’t have any proof of this, but it seems to me like we might have a
situation where we double flush?

Put another way, I wonder if there is any sense behind maybe hoisting
if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu)) up before
Other tlb flushes, and have it clear the FLUSH_GUEST if it was set?

Just thinking aloud, seemed fishy.

Regardless, this pv tlb flush stat needs some love, open to suggestions
on the most appropriate way to tackle it?

> 
>> 			kvm_vcpu_flush_tlb_guest(vcpu);
>> 
>> 		if (!user_access_begin(st, sizeof(*st)))
>> -- 
>> 2.30.1 (Apple Git-130)

Powered by blists - more mailing lists