Message-ID: <0c774dad-9f16-e9c1-56ea-3865cdfaeee0@redhat.com>
Date: Sat, 13 Oct 2018 10:09:11 +0200
From: Paolo Bonzini <pbonzini@...hat.com>
To: Andi Kleen <ak@...ux.intel.com>, Wei Wang <wei.w.wang@...el.com>
Cc: linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
peterz@...radead.org, mingo@...hat.com, rkrcmar@...hat.com,
like.xu@...el.com
Subject: Re: [PATCH v1] KVM/x86/vPMU: Guest PMI Optimization
On 12/10/2018 18:30, Andi Kleen wrote:
>> 4. Results
>> - Without this optimization, the guest pmi handling time is
>> ~4500000 ns, and the max sampling rate is reduced to 250.
>> - With this optimization, the guest pmi handling time is ~9000 ns
>> (i.e. 1 / 500 of the non-optimization case), and the max sampling
>> rate remains at the original 100000.
>
> Impressive performance improvement!
Agreed!
> It's not clear to me why you're special casing PMIs here. The optimization
> should work generically, right?
Yeah, you can even just check if the counter is in the struct
cpu_hw_events guest mask, and if so always write the counter MSR directly.
>> @@ -237,9 +267,23 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> default:
>> if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
>> (pmc = get_fixed_pmc(pmu, msr))) {
>> - if (!msr_info->host_initiated)
>> - data = (s64)(s32)data;
>> - pmc->counter += data - pmc_read_counter(pmc);
>> + if (pmu->in_pmi) {
>> + /*
>> + * Since we are not re-allocating a perf event
>> + * to reconfigure the sampling time when the
>> + * guest pmu is in PMI, just set the value to
>> + * the hardware perf counter. Counting will
>> + * continue after the guest enables the
>> + * counter bit in MSR_CORE_PERF_GLOBAL_CTRL.
>> + */
>> + struct hw_perf_event *hwc =
>> + &pmc->perf_event->hw;
>> + wrmsrl(hwc->event_base, data);
>
> Is that guaranteed to be always called on the right CPU that will run the vcpu?
>
> AFAIK there's an ioctl to set MSRs in the guest from qemu, I'm pretty sure
> it won't handle that.
How much of the performance improvement comes from here? In theory
pmc_read_counter() should always hit a relatively fast path, because the
smp_call_function_single in perf_event_read doesn't need an IPI.
In any case, this should be a separate patch.
Paolo
> May need to be delayed to entry time.
>
> -Andi
>