Message-ID: <b57b1852-9245-d539-c254-28d1834a64dc@amperemail.onmicrosoft.com>
Date: Wed, 9 Aug 2023 17:37:51 +0800
From: Shijie Huang <shijie@...eremail.onmicrosoft.com>
To: Mark Rutland <mark.rutland@....com>,
Oliver Upton <oliver.upton@...ux.dev>
Cc: Huang Shijie <shijie@...amperecomputing.com>, maz@...nel.org,
james.morse@....com, suzuki.poulose@....com, yuzenghui@...wei.com,
catalin.marinas@....com, will@...nel.org, pbonzini@...hat.com,
peterz@...radead.org, ingo@...hat.com, acme@...nel.org,
alexander.shishkin@...ux.intel.com, jolsa@...nel.org,
namhyung@...nel.org, irogers@...gle.com,
linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.linux.dev,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
linux-perf-users@...r.kernel.org, patches@...erecomputing.com,
zwang@...erecomputing.com
Subject: Re: [PATCH] perf/core: fix the bug in the event multiplexing

Hi Mark,
On 2023/8/9 17:22, Mark Rutland wrote:
> On Wed, Aug 09, 2023 at 08:25:07AM +0000, Oliver Upton wrote:
>> Hi Huang,
>>
>> On Wed, Aug 09, 2023 at 09:39:53AM +0800, Huang Shijie wrote:
>>> 2.) Root cause.
>>> There are only 7 counters on my arm64 platform:
>>> (one cycle counter) + (6 normal counters)
>>>
>>> In 1.3 above, we will use 10 event counters.
>>> Since we only have 7 counters, the perf core will trigger
>>> event multiplexing from the hrtimer:
>>> merge_sched_in() --> perf_mux_hrtimer_restart() -->
>>> perf_rotate_context().
>>>
>>> perf_rotate_context() does not restore some PMU registers
>>> as context_switch() does. In context_switch():
>>> kvm_sched_in() --> kvm_vcpu_pmu_restore_guest()
>>> kvm_sched_out() --> kvm_vcpu_pmu_restore_host()
>>>
>>> So we get a wrong result.
>> This is a rather vague description of the problem. AFAICT, the
>> issue here is that on VHE systems we wind up getting the EL0 count
>> enable/disable bits backwards when entering the guest, which is
>> corroborated by the data you have below.
> Yep; IIUC the issue here is that when we take an IRQ from a guest and reprogram
> the PMU in the IRQ handler, the IRQ handler will program the PMU with
> appropriate host/guest/user/etc filters for a *host* context, and then we'll
> return to the guest without reconfiguring the event filtering for a
> *guest* context.
Yes.
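
Just to confirm that I read it the same way, the failing sequence is
roughly:

   guest running, EL0 enable/disable bits set up for the guest
     -> IRQ taken on the host (the mux hrtimer, or the IPI that
        installs a new event)
        -> perf_rotate_context() reprograms the counters with the
           filtering that is right for a *host* context
     -> we return to the guest without going through
        kvm_vcpu_pmu_restore_guest(), so the guest runs with the
        host's EL0 enable/disable bits
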
>
> That can happen for perf_rotate_context(), or when we install an event into a
> running context, as that'll happen via an IPI.
>
>>> +void arch_perf_rotate_pmu_set(void)
>>> +{
>>> +	if (is_guest())
>>> +		kvm_vcpu_pmu_restore_guest(NULL);
>>> +	else
>>> +		kvm_vcpu_pmu_restore_host(NULL);
>>> +}
>>> +
>> This sort of hook is rather nasty, and I'd strongly prefer a solution
>> that's confined to KVM. I don't think the !is_guest() branch is
>> necessary at all. Regardless of how the pmu context is changed, we need
>> to go through vcpu_put() before getting back out to userspace.
>>
>> We can check for a running vCPU (ick) from kvm_set_pmu_events() and either
>> do the EL0 bit flip there or make a request on the vCPU to call
>> kvm_vcpu_pmu_restore_guest() immediately before reentering the guest.
>> I'm slightly leaning towards the latter, unless anyone has a better idea
>> here.
> The latter sounds reasonable to me.
Okay, I prefer the latter one now. :)
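
Something like the rough sketch below is what I have in mind. It is
completely untested, and both kvm_pmu_resync_el0() and
KVM_REQ_RESYNC_PMU_EL0 are just names I made up for illustration (the
request would also need a definition in asm/kvm_host.h):

/* arch/arm64/kvm/pmu.c */
static void kvm_pmu_resync_el0(void)
{
	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

	/*
	 * If we are reprogramming events from an interrupt that hit
	 * while a vCPU was loaded, ask that vCPU to re-apply the
	 * guest EL0 filtering before it reenters the guest.
	 */
	if (vcpu)
		kvm_make_request(KVM_REQ_RESYNC_PMU_EL0, vcpu);
}

kvm_set_pmu_events() (and probably kvm_clr_pmu_events() as well) would
call this after updating the event sets, and check_vcpu_requests() in
arch/arm64/kvm/arm.c would handle it:

	if (kvm_check_request(KVM_REQ_RESYNC_PMU_EL0, vcpu))
		kvm_vcpu_pmu_restore_guest(vcpu);

That keeps everything inside KVM and defers the EL0 fixup to the vCPU
thread right before guest reentry, as you suggested.
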
Thanks
Huang Shijie
>
> I suspect we need to take special care here to make sure we leave *all* events
> in a good state when re-entering the guest or if we get to kvm_sched_out()
> after *removing* an event via an IPI -- it'd be easy to mess either case up and
> leave some events in a bad state.
>
> Thanks,
> Mark.