Message-ID: <13b0b125-ec05-41b9-8ea9-a36597634b54@linux.intel.com>
Date: Tue, 23 Apr 2024 11:59:33 +0800
From: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>
To: maobibo <maobibo@...ngson.cn>, Sean Christopherson <seanjc@...gle.com>
Cc: Mingwei Zhang <mizhang@...gle.com>,
Xiong Zhang <xiong.y.zhang@...ux.intel.com>, pbonzini@...hat.com,
peterz@...radead.org, kan.liang@...el.com, zhenyuw@...ux.intel.com,
jmattson@...gle.com, kvm@...r.kernel.org, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org, zhiyuan.lv@...el.com, eranian@...gle.com,
irogers@...gle.com, samantha.alt@...el.com, like.xu.linux@...il.com,
chao.gao@...el.com
Subject: Re: [RFC PATCH 23/41] KVM: x86/pmu: Implement the save/restore of PMU
state for Intel CPU
On 4/23/2024 11:26 AM, maobibo wrote:
>
>
> On 2024/4/23 11:13 AM, Mi, Dapeng wrote:
>>
>> On 4/23/2024 10:53 AM, maobibo wrote:
>>>
>>>
>>> On 2024/4/23 10:44 AM, Mi, Dapeng wrote:
>>>>
>>>> On 4/23/2024 9:01 AM, maobibo wrote:
>>>>>
>>>>>
>>>>> On 2024/4/23 1:01 AM, Sean Christopherson wrote:
>>>>>> On Mon, Apr 22, 2024, maobibo wrote:
>>>>>>> On 2024/4/16 6:45 AM, Sean Christopherson wrote:
>>>>>>>> On Mon, Apr 15, 2024, Mingwei Zhang wrote:
>>>>>>>>> On Mon, Apr 15, 2024 at 10:38 AM Sean Christopherson
>>>>>>>>> <seanjc@...gle.com> wrote:
>>>>>>>>>> One of my biggest complaints with the current vPMU code is
>>>>>>>>>> that the roles and responsibilities between KVM and perf are
>>>>>>>>>> poorly defined, which leads to suboptimal and
>>>>>>>>>> hard-to-maintain code.
>>>>>>>>>>
>>>>>>>>>> Case in point, I'm pretty sure leaving guest values in PMCs
>>>>>>>>>> _would_ leak guest
>>>>>>>>>> state to userspace processes that have RDPMC permissions, as
>>>>>>>>>> the PMCs might not
>>>>>>>>>> be dirty from perf's perspective (see
>>>>>>>>>> perf_clear_dirty_counters()).
>>>>>>>>>>
>>>>>>>>>> Blindly clearing PMCs in KVM "solves" that problem, but in
>>>>>>>>>> doing so makes the
>>>>>>>>>> overall code brittle because it's not clear whether KVM
>>>>>>>>>> _needs_ to clear PMCs,
>>>>>>>>>> or if KVM is just being paranoid.
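[For readers following along: the lazy clearing Sean refers to keeps a per-CPU bitmap of counters whose hardware state may be stale, and perf_clear_dirty_counters() zeroes them only when a new RDPMC user shows up. The userspace C sketch below models just that bookkeeping under stated assumptions; pmc, dirty_mask, release_pmc and clear_dirty_pmcs are illustrative names, not kernel API.]

```c
#include <stdint.h>

#define NUM_PMCS 8

/* Simplified model of lazy dirty-counter clearing, loosely mirroring
 * perf's cpu_hw_events.dirty scheme: counters are not zeroed when
 * released, only marked, and are cleared in bulk later. */
static uint64_t pmc[NUM_PMCS];  /* simulated counter registers  */
static uint64_t dirty_mask;     /* bit n set => pmc[n] is stale */

/* A counter released by one user (e.g. a departing guest) is not
 * zeroed immediately; it is merely marked dirty. */
void release_pmc(int n)
{
    dirty_mask |= 1ULL << n;
}

/* Before granting RDPMC-style access to a new user, zero every
 * counter still marked dirty so no stale state can leak. */
void clear_dirty_pmcs(void)
{
    for (int n = 0; n < NUM_PMCS; n++) {
        if (dirty_mask & (1ULL << n))
            pmc[n] = 0;
    }
    dirty_mask = 0;
}
```

The sketch makes Sean's concern visible: between release_pmc() and clear_dirty_pmcs(), the stale value is still readable, so whoever performs the clearing must run before the next reader.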
>>>>>>>>>
>>>>>>>>> So once this rolls out, perf and vPMU are direct clients of
>>>>>>>>> the PMU HW.
>>>>>>>>
>>>>>>>> I don't think this is a statement we want to make, as it opens
>>>>>>>> a discussion
>>>>>>>> that we won't win. Nor do I think it's one we *need* to make.
>>>>>>>> KVM doesn't need
>>>>>>>> to be on equal footing with perf in terms of owning/managing
>>>>>>>> PMU hardware, KVM
>>>>>>>> just needs a few APIs to allow faithfully and accurately
>>>>>>>> virtualizing a guest PMU.
>>>>>>>>
>>>>>>>>> Faithful cleaning (blind cleaning) has to be the baseline
>>>>>>>>> implementation, until both clients agree to a "deal" between
>>>>>>>>> them.
>>>>>>>>> Currently, there is no such deal, but I believe we could have
>>>>>>>>> one via
>>>>>>>>> future discussion.
>>>>>>>>
>>>>>>>> What I am saying is that there needs to be a "deal" in place
>>>>>>>> before this code
>>>>>>>> is merged. It doesn't need to be anything fancy, e.g. perf can
>>>>>>>> still pave over
>>>>>>>> PMCs it doesn't immediately load, as opposed to using
>>>>>>>> cpu_hw_events.dirty to lazily
>>>>>>>> do the clearing. But perf and KVM need to work together from
>>>>>>>> the get-go, i.e. I don't want KVM doing something without
>>>>>>>> regard to what perf does, and vice versa.
>>>>>>>>
>>>>>>> There is a similar issue with the LoongArch vPMU, where the VM
>>>>>>> can access PMU hardware directly and the PMU hw is shared
>>>>>>> between guest and host. Besides the context switch, there are
>>>>>>> other places where the perf core will access PMU hw, such as
>>>>>>> the tick timer/hrtimer/IPI function call, and KVM can only
>>>>>>> intercept the context switch.
>>>>>>
>>>>>> Two questions:
>>>>>>
>>>>>> 1) Can KVM prevent the guest from accessing the PMU?
>>>>>>
>>>>>> 2) If so, KVM can grant partial access to the PMU, or is it all
>>>>>> or nothing?
>>>>>>
>>>>>> If the answer to both questions is "yes", then it sounds like
>>>>>> LoongArch *requires*
>>>>>> mediated/passthrough support in order to virtualize its PMU.
>>>>>
>>>>> Hi Sean,
>>>>>
>>>>> Thank for your quick response.
>>>>>
>>>>> Yes, KVM can prevent the guest from accessing the PMU, and can
>>>>> grant partial or full access to the PMU. The only caveat is that
>>>>> once a PMU event is granted to the VM, the host can not access
>>>>> this PMU event again; there must be a PMU event switch if the
>>>>> host wants to.
>>>>
>>>> A PMU event is a software entity which won't be shared. You mean
>>>> that if a PMU HW counter is granted to the VM, then the host
>>>> can't access that PMU HW counter, right?
>>> Yes, if the PMU HW counter/control is granted to the VM. The value
>>> comes from the guest and is not meaningful to the host. The host
>>> PMU core does not know that it has been granted to the VM; the
>>> host still thinks that it owns the PMU.
>>
>> That's one issue this patchset tries to solve. The new mediated x86
>> vPMU framework doesn't allow the host and guest to own the PMU HW
>> resource simultaneously. Only when there is no !exclude_guest event
>> on the host is the guest allowed to own the PMU HW resource
>> exclusively.
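[The ownership rule described above can be modelled in a few lines. This is a hypothetical sketch, not code from the patchset; struct host_event and guest_can_own_pmu are illustrative names, and exclude_guest merely mirrors the perf_event_attr bit of the same name.]

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative model of the mediated-vPMU rule: the guest may take
 * exclusive ownership of the PMU hardware only while the host has no
 * active events that need to count in guest mode, i.e. no events
 * with exclude_guest == false. */
struct host_event {
    bool exclude_guest;  /* event does not count while the guest runs */
};

bool guest_can_own_pmu(const struct host_event *evts, int n)
{
    for (int i = 0; i < n; i++) {
        /* One host event that counts in guest mode is enough to
         * deny the guest exclusive PMU ownership. */
        if (!evts[i].exclude_guest)
            return false;
    }
    return true;
}
```

Under this rule, ordinary host-only profiling (all events exclude_guest) coexists with a mediated guest PMU, while system-wide profiling that includes guest mode blocks the pass-through.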
>>
>>
>>>
>>> Just like FPU registers, the PMU is shared by the VM and host at
>>> different times and is lazily switched. But if an IPI or timer
>>> interrupt uses FPU registers on the host, there will be the same
>>> issue.
>>
>> I didn't fully get your point. When an IPI or timer interrupt
>> arrives, a VM-exit is triggered so that the CPU traps into the host
>> first, and then the host interrupt handler is called. Or are you
>> complaining about the execution sequence of switching guest PMU
>> MSRs relative to these interrupt handlers?
> It is not necessary to save/restore the PMU HW at every VM exit; it
> had better be lazily saved/restored, such as only when the vCPU
> thread is sched-out/sched-in, else the cost will be a little
> expensive.
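[The lazy policy Bibo describes amounts to keeping guest PMU state live in hardware across VM exits and only paying the save/restore cost at scheduling boundaries. A minimal hypothetical sketch of that bookkeeping — vcpu_pmu, vcpu_sched_in, etc. are illustrative names, not KVM code:]

```c
#include <stdbool.h>

/* Sketch of lazy guest PMU state switching: expensive MSR
 * save/restore happens only at vCPU thread sched-in/sched-out,
 * while a plain VM exit leaves guest state loaded in hardware. */
struct vcpu_pmu {
    bool guest_state_loaded;    /* guest PMU MSRs currently in HW */
    unsigned long hw_switches;  /* counts expensive save/restore ops */
};

void vcpu_sched_in(struct vcpu_pmu *p)
{
    if (!p->guest_state_loaded) {
        p->guest_state_loaded = true;  /* restore guest MSRs here */
        p->hw_switches++;
    }
}

void vcpu_sched_out(struct vcpu_pmu *p)
{
    if (p->guest_state_loaded) {
        p->guest_state_loaded = false; /* save guest MSRs here */
        p->hw_switches++;
    }
}

/* On a plain VM exit nothing is saved; guest state stays loaded.
 * This is exactly what makes host-side profiling of the exit path
 * impossible, which is the objection raised below. */
void vcpu_vm_exit(struct vcpu_pmu *p)
{
    (void)p;
}
```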
I doubt this optimization of deferring guest PMU state save/restore to
the vCPU task switching boundary would ever land in KVM, since it
would make the host lose the capability to profile KVM, and Sean seems
to object to this.
>
> I know little about the perf core. However, there is PMU HW access
> in interrupt context. That means PMU HW access should be done with
> IRQs disabled in normal context, else there may be nested PMU HW
> accesses. Is that true?
I had no idea that the timer irq handler would access PMU MSRs. Could
you please show me the code? I would like to look at it first. Thanks.
>
>>
>>
>>>
>>> Regards
>>> Bibo Mao
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>> Can we add a callback handler in structure kvm_guest_cbs?
>>>>>>> Just like this:
>>>>>>> @@ -6403,6 +6403,7 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
>>>>>>>          .state                  = kvm_guest_state,
>>>>>>>          .get_ip                 = kvm_guest_get_ip,
>>>>>>>          .handle_intel_pt_intr   = NULL,
>>>>>>> +        .lose_pmu               = kvm_guest_lose_pmu,
>>>>>>> };
>>>>>>>
>>>>>>> By the way, I do not know whether the callback handler should
>>>>>>> be triggered in the perf core or in the specific PMU hw
>>>>>>> driver. In the ARM PMU hw driver it is triggered in the driver
>>>>>>> itself, e.g. in the function kvm_vcpu_pmu_resync_el0, but I
>>>>>>> think it would be better if it were done in the perf core.
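[To make the proposal concrete, here is a hypothetical sketch of how such a lose_pmu callback could be wired: the perf core would invoke it before any direct PMU hardware access, giving KVM a chance to save guest state first. None of this is existing kernel API; the struct mirrors the quoted snippet, and perf_touch_pmu_hw / register_guest_cbs are invented for illustration.]

```c
#include <stddef.h>

/* Illustrative subset of the proposed guest-info callbacks; only the
 * hypothetical lose_pmu hook is modelled here. */
struct perf_guest_info_callbacks {
    void (*lose_pmu)(void);
};

static struct perf_guest_info_callbacks *guest_cbs;
static int guest_lose_pmu_calls;   /* test hook: counts invocations */

/* KVM's side of the deal: save guest counters before the host
 * reprograms the hardware.  Here it only records the call. */
static void kvm_guest_lose_pmu(void)
{
    guest_lose_pmu_calls++;
}

void register_guest_cbs(struct perf_guest_info_callbacks *cbs)
{
    guest_cbs = cbs;
}

/* What the perf core might do before any direct PMU hw access:
 * notify the guest owner so state can be saved first. */
void perf_touch_pmu_hw(void)
{
    if (guest_cbs && guest_cbs->lose_pmu)
        guest_cbs->lose_pmu();
    /* ... proceed with host PMU programming ... */
}
```

This is the "fighting over the PMU" model Sean pushes back on below: every hot perf-core path would have to remember to call the hook, which is part of why he prefers a clearer ownership contract instead.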
>>>>>>
>>>>>> I don't think we want to take the approach of perf and KVM guests
>>>>>> "fighting" over
>>>>>> the PMU. That's effectively what we have today, and it's a mess
>>>>>> for KVM because
>>>>>> it's impossible to provide consistent, deterministic behavior for
>>>>>> the guest. And
>>>>>> it's just as messy for perf, which ends up having weird,
>>>>>> cumbersome flows that exist purely to try to play nice with KVM.
>>>>> With the existing PMU core code, the PMU hw may be accessed by
>>>>> the host in the tick timer interrupt or IPI function call
>>>>> interrupt while the VM is running and the PMU has already been
>>>>> granted to the guest. KVM can not intercept the host IPI/timer
>>>>> interrupt, there is no PMU context switch, so there will be a
>>>>> problem.
>>>>>
>>>>> Regards
>>>>> Bibo Mao
>>>>>
>>>
>