Message-ID: <c9b4df65-ca2d-083a-883d-415e6be52ac2@linux.intel.com>
Date: Fri, 8 May 2020 16:48:30 +0800
From: Like Xu <like.xu@...ux.intel.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
Sean Christopherson <sean.j.christopherson@...el.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, wei.w.wang@...el.com,
ak@...ux.intel.com
Subject: Re: [PATCH v10 08/11] KVM: x86/pmu: Add LBR feature emulation via
guest LBR event
Hi Peter,
On 2020/4/27 11:16, Like Xu wrote:
> Hi Peter,
>
> On 2020/4/24 20:16, Peter Zijlstra wrote:
>> On Thu, Apr 23, 2020 at 04:14:09PM +0800, Like Xu wrote:
>>> +static int intel_pmu_create_lbr_event(struct kvm_vcpu *vcpu)
>>> +{
>>> + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>>> + struct perf_event *event;
>>> +
>>> + /*
>>> + * The perf_event_attr is constructed in a minimal efficient way:
>>> + * - set 'pinned = true' to make it task pinned so that if another
>>> + *   cpu pinned event reclaims LBR, the event->oncpu will be set to -1;
>>> + *
>>> + * - set 'sample_type = PERF_SAMPLE_BRANCH_STACK' and
>>> + *   'exclude_host = true' to mark it as a guest LBR event, which
>>> + *   tells host perf to schedule it with nothing but a fake counter;
>>> + *   check is_guest_lbr_event() and intel_guest_event_constraints();
>>> + *
>>> + * - set 'branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
>>> + *   PERF_SAMPLE_BRANCH_USER' to configure it to use callstack mode,
>>> + *   which allocates 'ctx->task_ctx_data' and requests the host perf
>>> + *   subsystem to save/restore guest LBR records during host context
>>> + *   switches; check branch_user_callstack() and
>>> + *   intel_pmu_lbr_sched_task();
>>> + */
>>> + struct perf_event_attr attr = {
>>> + .type = PERF_TYPE_RAW,
>>
>> This is not right; this needs a .config
>
> Now we know the default value .config = 0 for attr is not acceptable.
>
>>
>> And I suppose that is why you need that horrible:
>> needs_guest_lbr_without_counter() thing to begin with.
>
> Do you suggest using an event->attr.config check to replace the
> "needs_branch_stack(event) && is_kernel_event(event) &&
> event->attr.exclude_host" check for guest LBR events?
>
>>
>> Please allocate yourself an event from the pseudo event range:
>> event==0x00. Currently we only have umask==3 for Fixed2 and umask==4
>> for Fixed3, given you claim 58, which is effectively Fixed25,
>> umask==0x1a might be appropriate.
>
> OK, I assume that adding one more field, ".config = 0x1a00", is
> sufficient for perf_event_attr to allocate guest LBR events.
Do you have any comments on this?
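
For reference, here is a minimal sketch of how the attr would look with
the pseudo-event encoding suggested above. The X86_PSEUDO_EVENT_LBR name
is hypothetical; only the 0x1a00 value (event == 0x00, umask == 0x1a in
bits 15:8) comes from this thread, and the other fields are copied from
the patch below:

	/* hypothetical name for the proposed pseudo event */
	#define X86_PSEUDO_EVENT_LBR	0x1a00ULL

	struct perf_event_attr attr = {
		.type		= PERF_TYPE_RAW,
		.size		= sizeof(attr),
		.config		= X86_PSEUDO_EVENT_LBR,
		.pinned		= true,
		.exclude_host	= true,
		.sample_type	= PERF_SAMPLE_BRANCH_STACK,
		.branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
				      PERF_SAMPLE_BRANCH_USER,
	};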
>
>>
>> Also, I suppose we need to claim 0x0000 as an error, so that other
>> people won't try this again.
>
> Does the following fix address your concern on this?
Does the following fix address your concern on this?
>
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 2405926e2dba..32d2a3f8c51f 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -498,6 +498,9 @@ int x86_pmu_max_precise(void)
>
> int x86_pmu_hw_config(struct perf_event *event)
> {
> + if (unlikely(!(event->attr.config & X86_ARCH_EVENT_MASK)))
> + return -EINVAL;
> +
> if (event->attr.precise_ip) {
> int precise = x86_pmu_max_precise();
>
> diff --git a/arch/x86/include/asm/perf_event.h
> b/arch/x86/include/asm/perf_event.h
> index 2e6c59308344..bdba87a6f0af 100644
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -47,6 +47,8 @@
> (ARCH_PERFMON_EVENTSEL_EVENT | (0x0FULL << 32))
> #define INTEL_ARCH_EVENT_MASK \
> (ARCH_PERFMON_EVENTSEL_UMASK | ARCH_PERFMON_EVENTSEL_EVENT)
> +#define X86_ARCH_EVENT_MASK \
> + (ARCH_PERFMON_EVENTSEL_UMASK | ARCH_PERFMON_EVENTSEL_EVENT)
>
> #define AMD64_L3_SLICE_SHIFT 48
> #define AMD64_L3_SLICE_MASK (0xFULL << AMD64_L3_SLICE_SHIFT)
>
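
For illustration, with the hunk above applied, a zeroed config is
rejected at hw_config time instead of silently aliasing pseudo event
0x0000 (sketch only, assuming the X86_ARCH_EVENT_MASK definition above):

	struct perf_event_attr attr = {
		.type	= PERF_TYPE_RAW,
		.config	= 0,	/* no event/umask bits set */
	};

	/*
	 * perf_event_create_kernel_counter(&attr, -1, current, NULL, NULL)
	 * now returns ERR_PTR(-EINVAL) out of x86_pmu_hw_config().
	 */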
>>
>>> + .size = sizeof(attr),
>>> + .pinned = true,
>>> + .exclude_host = true,
>>> + .sample_type = PERF_SAMPLE_BRANCH_STACK,
>>> + .branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
>>> + PERF_SAMPLE_BRANCH_USER,
>>> + };
>>> +
>>> + if (unlikely(pmu->lbr_event))
>>> + return 0;
>>> +
>>> + event = perf_event_create_kernel_counter(&attr, -1,
>>> + current, NULL, NULL);
>>> + if (IS_ERR(event)) {
>>> + pr_debug_ratelimited("%s: failed %ld\n",
>>> + __func__, PTR_ERR(event));
>>> + return -ENOENT;
>>> + }
>>> + pmu->lbr_event = event;
>>> + pmu->event_count++;
>>> + return 0;
>>> +}
>>
>> Also, what happens if you fail programming due to a conflicting cpu
>> event? That pinned doesn't guarantee you'll get the event, it just means
>> you'll error instead of getting RR.
>>
>> I didn't find any code checking the event state.
>>
>
> Error instead of RR is expected.
>
> If KVM fails the programming due to a conflicting cpu pinned event,
> the LBR registers will not be passed through to the guest, and KVM
> will return zero for any guest LBR record accesses until the next
> attempt to program the guest LBR event.
>
> Every time before the vcpu enters non-root mode (where irqs are
> disabled), the "event->oncpu != -1" check is applied.
> (more details in the comment around intel_pmu_availability_check())
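
A minimal sketch of that check (the helper name below is made up, and
the actual code is the intel_pmu_availability_check() mentioned above;
only event->oncpu and pmu->lbr_event come from this series):

	/*
	 * Called with irqs disabled right before VM-entry. If a
	 * conflicting cpu pinned event has reclaimed LBR, the pinned
	 * guest LBR event is in error state and event->oncpu is -1,
	 * so the LBR MSRs must not be passed through to the guest.
	 */
	static bool guest_lbr_event_is_on_hw(struct kvm_pmu *pmu)
	{
		struct perf_event *event = pmu->lbr_event;

		return event && READ_ONCE(event->oncpu) != -1;
	}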
>
> The guest's administrator is supposed to know that the guest LBR
> records are inaccurate if someone on the host side is using LBR to
> record the guest or the hypervisor.
>
> Is this acceptable to you?
>
> If there is anything that needs to be improved, please let me know.
Is this acceptable to you?
If there is anything that needs to be improved, please let me know.
>
> Thanks,
> Like Xu
>