Message-ID: <b7177278-2a39-4fe6-9690-573b70e2ed0e@oracle.com>
Date: Thu, 15 Feb 2024 13:12:22 -0500
From: Alejandro Jimenez <alejandro.j.jimenez@...cle.com>
To: Dongli Zhang <dongli.zhang@...cle.com>, kvm@...r.kernel.org
Cc: seanjc@...gle.com, pbonzini@...hat.com, linux-kernel@...r.kernel.org,
        joao.m.martins@...cle.com, boris.ostrovsky@...cle.com,
        mark.kanda@...cle.com, suravee.suthikulpanit@....com,
        mlevitsk@...hat.com
Subject: Re: [RFC 2/3] x86: KVM: stats: Add stat counter for IRQs injected via
 APICv

Hi Dongli,

On 2/15/24 11:16, Dongli Zhang wrote:
> Hi Alejandro,
> 
> Is there any use case of this counter in the bug?

I don't have a specific bug in mind that this is trying to address. This patch is just an example to show how existing data points (i.e. the trace_kvm_apicv_accept_irq tracepoint) can also be exposed via the stats framework with minimal overhead, and to support the point in the cover letter that querying the binary stats could be the best choice for a "single source" that tells us the full status of APICv/AVIC (i.e. are both SVM and IOMMU AVIC working, are there any inhibits set, etc.).
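
To make the "single source" idea concrete, below is a minimal userspace sketch of walking the binary stats interface (the KVM_GET_STATS_FD ioctl documented in Documentation/virt/kvm/api.rst) to look up one counter by name. The vcpu_fd is assumed to have already been obtained via KVM_CREATE_VCPU, and read_vcpu_stat() is an illustrative helper name, not something from this series:

#include <linux/kvm.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Return the value of the named vCPU stat, or -1 if not found. */
static long long read_vcpu_stat(int vcpu_fd, const char *name)
{
	struct kvm_stats_header hdr;
	struct kvm_stats_desc *desc;
	char *descs = NULL;
	long long ret = -1;
	size_t desc_sz;
	uint32_t i;
	int stats_fd;

	stats_fd = ioctl(vcpu_fd, KVM_GET_STATS_FD, NULL);
	if (stats_fd < 0)
		return -1;

	if (pread(stats_fd, &hdr, sizeof(hdr), 0) != sizeof(hdr))
		goto out;

	/* Each descriptor is followed by a name_size-byte name field. */
	desc_sz = sizeof(*desc) + hdr.name_size;
	descs = malloc(desc_sz * hdr.num_desc);
	if (!descs)
		goto out;

	if (pread(stats_fd, descs, desc_sz * hdr.num_desc, hdr.desc_offset) !=
	    (ssize_t)(desc_sz * hdr.num_desc))
		goto out;

	for (i = 0; i < hdr.num_desc; i++) {
		uint64_t val;

		desc = (struct kvm_stats_desc *)(descs + i * desc_sz);
		if (strcmp(desc->name, name))
			continue;
		/* Stat data lives at data_offset plus the per-desc offset. */
		if (pread(stats_fd, &val, sizeof(val),
			  hdr.data_offset + desc->offset) == sizeof(val))
			ret = (long long)val;
		break;
	}
out:
	free(descs);
	close(stats_fd);
	return ret;
}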

> 
> E.g., there is already trace_kvm_apicv_accept_irq(). Wouldn't ftrace or eBPF
> be able to tell if hardware-accelerated interrupt delivery is active?

Yes, the tracepoint already provides that information, if you know it exists AND have sufficient privileges to use tracefs or eBPF. The purpose of the RFC is to agree on a mechanism for exposing all the APICv-relevant data (and any new additions) via a single interface, so that the sources of information are not scattered across tracepoints, debugfs entries, and data structures that need to be read via BPF.
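
For comparison, this is roughly what the tracefs route looks like in a minimal C sketch; note that just enabling the event requires root (or tracefs access), which is the privilege constraint I mean. The path used is the existing kvm:kvm_apicv_accept_irq tracepoint:

#include <stdio.h>

int main(void)
{
	const char *ev =
		"/sys/kernel/tracing/events/kvm/kvm_apicv_accept_irq/enable";
	char line[512];
	FILE *f;

	/* Enable the tracepoint; this fails without tracefs access. */
	f = fopen(ev, "w");
	if (!f)
		return 1;
	fputs("1\n", f);
	fclose(f);

	/* Stream events as they fire (blocks waiting for new data). */
	f = fopen("/sys/kernel/tracing/trace_pipe", "r");
	if (!f)
		return 1;
	while (fgets(line, sizeof(line), f))
		fputs(line, stdout);
	fclose(f);
	return 0;
}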

My understanding is that the stats subsystem can work even when using ftrace or bpftrace is not possible, which is why I am suggesting it be used as the "standard" method to expose this info.
There will of course be some duplication with existing tracepoints, but there is already precedent in KVM where both stats and tracepoints are updated simultaneously (e.g. mmu_{un}sync_page(), {svm|vmx}_inject_irq()).
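
For instance, paraphrasing the IRQ injection path from memory (the exact lines may differ slightly), the tracepoint and the stat are bumped back to back:

	/* From vmx_inject_irq(): tracepoint and stat updated together. */
	trace_kvm_inj_virq(irq, vcpu->arch.interrupt.soft, reinjected);
	++vcpu->stat.irq_injections;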

> 
> Any extra benefits? E.g., can this counter be matched against any other
> counter in KVM/the guest so that a bug can be detected? That would be very helpful.

Again, I didn't have a specific scenario for using this counter, other than that the associated tracepoint is the one I typically use to determine if APICv is active. But let's think of an example on the spot: in a hypothetical scenario where I want to determine what fraction of time a vCPU spends blocking versus running in guest mode, I could add another stat, e.g.:

+
+       ++vcpu->stat.apicv_accept_irq;
+
         if (in_guest_mode) {
                 /*
                  * Signal the doorbell to tell hardware to inject the IRQ.  If
                  * the vCPU exits the guest before the doorbell chimes, hardware
                  * will automatically process AVIC interrupts at the next VMRUN.
                  */
                 avic_ring_doorbell(vcpu);
+		++vcpu->stat.avic_doorbell_rung;
         } else {
                 /*
                  * Wake the vCPU if it was blocking.  KVM will then detect the
                  * pending IRQ when checking if the vCPU has a wake event.
                  */
                 kvm_vcpu_wake_up(vcpu);
         }

and then the ratio (avic_doorbell_rung / apicv_accept_irq) lets me estimate what percentage of the time the target vCPU is running versus idle. There are likely better ways of determining this, but you get the idea. The goal is to reach a general consensus on whether I should add a new tracepoint (trace_kvm_avic_ring_doorbell) or a new stat as the "preferred" solution. Obviously there are still cases where a tracepoint is the best approach (e.g. when it needs to convey more information).
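
Using the read_vcpu_stat() helper sketched earlier, the estimate would be something like the fragment below, where avic_doorbell_rung is the hypothetical counter from the snippet above:

	long long accepted = read_vcpu_stat(vcpu_fd, "apicv_accept_irq");
	long long rung = read_vcpu_stat(vcpu_fd, "avic_doorbell_rung");

	/* Share of accelerated IRQs that found the vCPU in guest mode. */
	if (accepted > 0)
		printf("in-guest delivery: %.1f%%\n", 100.0 * rung / accepted);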

Hopefully I didn't stray too far from your question/point.

Alejandro

> 
> Thank you very much!
> 
> Dongli Zhang
> 
> On 2/15/24 08:01, Alejandro Jimenez wrote:
>> Export a binary stat counting how many interrupts have been delivered via
>> APICv/AVIC acceleration from the host. This is one of the most reliable
>> methods to detect when hardware-accelerated interrupt delivery is active,
>> since APIC timer interrupts are regularly injected and exercise these
>> code paths.
>>
>> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@...cle.com>
>> ---
>>   arch/x86/include/asm/kvm_host.h | 1 +
>>   arch/x86/kvm/svm/svm.c          | 3 +++
>>   arch/x86/kvm/vmx/vmx.c          | 2 ++
>>   arch/x86/kvm/x86.c              | 1 +
>>   4 files changed, 7 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 9b960a523715..b6f18084d504 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1564,6 +1564,7 @@ struct kvm_vcpu_stat {
>>   	u64 preemption_other;
>>   	u64 guest_mode;
>>   	u64 notify_window_exits;
>> +	u64 apicv_accept_irq;
>>   };
>>   
>>   struct x86_instruction_info;
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index e90b429c84f1..2243af08ed39 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -3648,6 +3648,9 @@ void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
>>   	}
>>   
>>   	trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode, trig_mode, vector);
>> +
>> +	++vcpu->stat.apicv_accept_irq;
>> +
>>   	if (in_guest_mode) {
>>   		/*
>>   		 * Signal the doorbell to tell hardware to inject the IRQ.  If
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index d4e6625e0a9a..f7db75ae2c55 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -4275,6 +4275,8 @@ static void vmx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
>>   	} else {
>>   		trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode,
>>   					   trig_mode, vector);
>> +
>> +		++vcpu->stat.apicv_accept_irq;
>>   	}
>>   }
>>   
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index f7f598f066e7..2ad70cf6e52c 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -304,6 +304,7 @@ const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
>>   	STATS_DESC_COUNTER(VCPU, preemption_other),
>>   	STATS_DESC_IBOOLEAN(VCPU, guest_mode),
>>   	STATS_DESC_COUNTER(VCPU, notify_window_exits),
>> +	STATS_DESC_COUNTER(VCPU, apicv_accept_irq),
>>   };
>>   
>>   const struct kvm_stats_header kvm_vcpu_stats_header = {
