Message-ID: <713471d3-ab05-7884-66fd-1efff9f6aeea@gmail.com>
Date: Wed, 21 Jul 2021 20:10:11 +0800
From: Like Xu <like.xu.linux@...il.com>
To: Jim Mattson <jmattson@...gle.com>
Cc: Zhu Lingshan <lingshan.zhu@...el.com>, peterz@...radead.org,
pbonzini@...hat.com, bp@...en8.de, seanjc@...gle.com,
vkuznets@...hat.com, wanpengli@...cent.com, joro@...tes.org,
ak@...ux.intel.com, wei.w.wang@...el.com, eranian@...gle.com,
liuxiangdong5@...wei.com, linux-kernel@...r.kernel.org,
x86@...nel.org, kvm@...r.kernel.org, boris.ostrvsky@...cle.com,
"Liang, Kan" <kan.liang@...ux.intel.com>
Subject: Re: [PATCH V8 00/18] KVM: x86/pmu: Add *basic* support to enable
guest PEBS via DS
On 19/7/2021 8:41 am, Liang, Kan wrote:
>
>
> On 7/16/2021 5:07 PM, Jim Mattson wrote:
>> On Fri, Jul 16, 2021 at 12:00 PM Liang, Kan
>> <kan.liang@...ux.intel.com> wrote:
>>>
>>>
>>>
>>> On 7/16/2021 1:02 PM, Jim Mattson wrote:
>>>> On Fri, Jul 16, 2021 at 1:54 AM Zhu Lingshan
>>>> <lingshan.zhu@...el.com> wrote:
>>>>>
>>>>> The guest Precise Event Based Sampling (PEBS) feature can provide an
>>>>> architectural state of the instruction executed after the guest
>>>>> instruction that exactly caused the event. It needs a new hardware
>>>>> facility only available on Intel Ice Lake Server platforms. This
>>>>> patch set enables the basic PEBS feature for KVM guests on ICX.
>>>>>
>>>>> We can use the PEBS feature in a Linux guest just as on native:
>>>>>
>>>>> # echo 0 > /proc/sys/kernel/watchdog (on the host)
>>>>> # perf record -e instructions:ppp ./br_instr a
>>>>> # perf record -c 100000 -e instructions:pp ./br_instr a
>>>>>
>>>>> To emulate guest PEBS facility for the above perf usages,
>>>>> we need to implement 2 code paths:
>>>>>
>>>>> 1) Fast path
>>>>>
>>>>> This is when the host assigned physical PMC has an identical index
>>>>> as the virtual PMC (e.g. using physical PMC0 to emulate virtual
>>>>> PMC0). This path is used in most common use cases.
>>>>>
>>>>> 2) Slow path
>>>>>
>>>>> This is when the host assigned physical PMC has a different index
>>>>> from the virtual PMC (e.g. using physical PMC1 to emulate virtual
>>>>> PMC0). In this case, KVM needs to rewrite the PEBS records to change
>>>>> the applicable counter indexes to the virtual PMC indexes, which
>>>>> would otherwise contain the physical counter index written by the
>>>>> PEBS facility, and switch the counter reset values to the offset
>>>>> corresponding to the physical counter indexes in the DS data
>>>>> structure.
>>>>>
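
To make the fast/slow path distinction above concrete, here is a minimal
sketch. It is not taken from the patch set. It assumes the counter indexes
in a record can be treated as a 64-bit bitmask, and the names
NUM_GP_COUNTERS, is_fast_path(), remap_counter_mask() and phys_to_virt[]
are all hypothetical, chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_GP_COUNTERS 8	/* assumed counter count, illustration only */

/* Fast path: the physical counter has the same index as the virtual one. */
static bool is_fast_path(int virt_idx, int phys_idx)
{
	return virt_idx == phys_idx;
}

/*
 * Slow path: translate a bitmask of physical counter indexes into the
 * virtual indexes the guest programmed. phys_to_virt[p] is the virtual
 * index backed by physical counter p, or -1 if p backs no guest counter
 * (hypothetical table; bits for such counters are dropped).
 */
static uint64_t remap_counter_mask(uint64_t phys_mask,
				   const int phys_to_virt[NUM_GP_COUNTERS])
{
	uint64_t virt_mask = 0;

	for (int p = 0; p < NUM_GP_COUNTERS; p++) {
		if (!(phys_mask & (1ULL << p)))
			continue;
		if (phys_to_virt[p] >= 0)
			virt_mask |= 1ULL << phys_to_virt[p];
	}
	return virt_mask;
}
```

With physical PMC1 emulating virtual PMC0, a record that names counter 1
(mask 0x2) would be rewritten to name counter 0 (mask 0x1) before the
guest perf driver reads it.
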
>>>>> The previous version [0] enables both fast path and slow path,
>>>>> which seems a bit more complex as the first step. In this patchset,
>>>>> we want to start with the fast path to get the basic guest PEBS
>>>>> enabled while keeping the slow path disabled. More focused
>>>>> discussion on the slow path [1] is planned to be put to another
>>>>> patchset in the next step.
>>>>>
>>>>> Compared to later versions in subsequent steps, the functionality
>>>>> to support host-guest PEBS both enabled and the functionality to
>>>>> emulate guest PEBS when the counter is cross-mapped are missing in
>>>>> this patch set (neither of these is a typical scenario).
>>>>
>>>> I'm not sure exactly what scenarios you're ruling out here. In our
>>>> environment, we always have to be able to support host-level
>>>> profiling, whether or not the guest is using the PMU (for PEBS or
>>>> anything else). Hence, for our *basic* vPMU offering, we only expose
>>>> two general purpose counters to the guest, so that we can keep two
>>>> general purpose counters for the host. In this scenario, I would
>>>> expect cross-mapped counters to be common. Are we going to be able to
>>>> use this implementation?
>>>>
>>>
>>> Let's say we have 4 GP counters in HW.
>>> Do you mean that the host owns 2 GP counters (counter 0 & 1) and the
>>> guest owns the other 2 GP counters (counter 2 & 3) in your environment?
>>> We did a similar implementation in V1, but the proposal was rejected.
>>> https://lore.kernel.org/kvm/20200306135317.GD12561@hirez.programming.kicks-ass.net/
>>>
>>
>> It's the other way around. AFAIK, there is no architectural way to
>> specify that only counters 2 and 3 are available, so we have to give
>> the guest counters 0 and 1.
>
> How about the host? Can the host see all 4 counters?
>
>>
>>> For the current proposal, both guest and host can see all 4 GP counters.
>>> The counters are shared.
>>
>> I don't understand how that can work. If the host programs two
>> counters, how can you give the guest four counters?
>>
>>> The guest cannot know the availability of the counters. It may require
>>> a counter (e.g., counter 0) which may already be in use by the host.
>>> The host may provide another counter (e.g., counter 1) to the guest.
>>> This is the case described in the slow path. For this case, we have to
>>> modify the guest PEBS record, because the counter index in the PEBS
>>> record is 1, while the guest perf driver expects 0.
>>
>> If we reserve counters 0 and 1 for the guest, this is not a problem
>> (assuming we tell the guest it only has two counters). If we don't
>> statically partition the counters, I don't see how you can ensure that
>> the guest behaves as architected. For example, what do you do when the
>> guest programs four counters and the host programs two?
>
> Ideally, we should do multiplexing if the guest requires four and the
> host requires two. But I doubt this patch set implements the
> multiplexing, because the multiplexing should be part of the slow path,
> which will be supported in the next step.
>
> Could you please share more details regarding your environment?
Jim, would you mind sharing more details about the statically
partitioned hardware counters in your virtualization scenario?
It may be useful for subsequent designs for advanced PEBS features.
Otherwise we will follow the sharing rules defined by the perf subsystem.
> How do you handle the case where the guest programs two counters and
> the host programs four counters?
>
>>
>>> If counter 0 is available, guests can use counter 0. That's the fast
>>> path. I think the fast path should be more common even when both host
>>> and guest are profiling, because except for some specific events, we
>>> may move the host event to the counters which are not required by the
>>> guest if we have enough resources.
>>
>> And if you don't have enough resources?
>
> As I understand it, multiplexing should be the only choice if we don't
> have enough resources.
>
> Thanks,
> Kan
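
Kan's argument above (move host events to counters the guest does not
need, and fall back to multiplexing when there are not enough) can be
sketched roughly as follows. This is only an illustration under simple
assumptions: every event can run on any GP counter, there are at most 64
counters, and place_host_events() is a hypothetical helper, not a perf or
KVM interface.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Try to place host_events host events on counters that the guest has
 * not requested. guest_mask has bit i set if the guest wants counter i.
 * On success, *host_mask holds the counters chosen for the host and the
 * function returns true. It returns false when the free counters do not
 * suffice, i.e. when multiplexing would be needed.
 * Assumes num_counters <= 64 (illustration only).
 */
static bool place_host_events(unsigned int num_counters, uint64_t guest_mask,
			      unsigned int host_events, uint64_t *host_mask)
{
	uint64_t mask = 0;

	for (unsigned int i = 0; i < num_counters && host_events > 0; i++) {
		if (guest_mask & (1ULL << i))
			continue;	/* keep this index for the guest */
		mask |= 1ULL << i;
		host_events--;
	}
	if (host_events > 0)
		return false;		/* not enough free counters */
	*host_mask = mask;
	return true;
}
```

With 4 GP counters, a guest using counters 0 and 1 and a host needing two
counters fits (the host gets counters 2 and 3, so the guest stays on the
fast path). A guest programming all four while the host needs two does
not fit, which is the multiplexing case Jim asks about.
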