Message-ID: <13b0b125-ec05-41b9-8ea9-a36597634b54@linux.intel.com>
Date: Tue, 23 Apr 2024 11:59:33 +0800
From: "Mi, Dapeng" <dapeng1.mi@...ux.intel.com>
To: maobibo <maobibo@...ngson.cn>, Sean Christopherson <seanjc@...gle.com>
Cc: Mingwei Zhang <mizhang@...gle.com>,
Xiong Zhang <xiong.y.zhang@...ux.intel.com>, pbonzini@...hat.com,
peterz@...radead.org, kan.liang@...el.com, zhenyuw@...ux.intel.com,
jmattson@...gle.com, kvm@...r.kernel.org, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org, zhiyuan.lv@...el.com, eranian@...gle.com,
irogers@...gle.com, samantha.alt@...el.com, like.xu.linux@...il.com,
chao.gao@...el.com
Subject: Re: [RFC PATCH 23/41] KVM: x86/pmu: Implement the save/restore of PMU
state for Intel CPU
On 4/23/2024 11:26 AM, maobibo wrote:
>
>
> On 2024/4/23 11:13 AM, Mi, Dapeng wrote:
>>
>> On 4/23/2024 10:53 AM, maobibo wrote:
>>>
>>>
>>> On 2024/4/23 10:44 AM, Mi, Dapeng wrote:
>>>>
>>>> On 4/23/2024 9:01 AM, maobibo wrote:
>>>>>
>>>>>
>>>>> On 2024/4/23 1:01 AM, Sean Christopherson wrote:
>>>>>> On Mon, Apr 22, 2024, maobibo wrote:
>>>>>>> On 2024/4/16 6:45 AM, Sean Christopherson wrote:
>>>>>>>> On Mon, Apr 15, 2024, Mingwei Zhang wrote:
>>>>>>>>> On Mon, Apr 15, 2024 at 10:38 AM Sean Christopherson
>>>>>>>>> <seanjc@...gle.com> wrote:
>>>>>>>>>> One of my biggest complaints with the current vPMU code is
>>>>>>>>>> that the roles and responsibilities between KVM and perf are
>>>>>>>>>> poorly defined, which leads to suboptimal and
>>>>>>>>>> hard-to-maintain code.
>>>>>>>>>>
>>>>>>>>>> Case in point, I'm pretty sure leaving guest values in PMCs
>>>>>>>>>> _would_ leak guest
>>>>>>>>>> state to userspace processes that have RDPMC permissions, as
>>>>>>>>>> the PMCs might not
>>>>>>>>>> be dirty from perf's perspective (see
>>>>>>>>>> perf_clear_dirty_counters()).
>>>>>>>>>>
>>>>>>>>>> Blindly clearing PMCs in KVM "solves" that problem, but in
>>>>>>>>>> doing so makes the
>>>>>>>>>> overall code brittle because it's not clear whether KVM
>>>>>>>>>> _needs_ to clear PMCs,
>>>>>>>>>> or if KVM is just being paranoid.
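[For readers following along: the lazy clearing Sean refers to keeps a per-CPU bitmap of counters whose hardware state may be stale, and perf_clear_dirty_counters() zeroes them only when a new RDPMC user shows up. The userspace C sketch below models just that bookkeeping under stated assumptions; pmc, dirty_mask, release_pmc and clear_dirty_pmcs are illustrative names, not kernel API.]

```c
#include <stdint.h>

#define NUM_PMCS 8

/* Simplified model of lazy dirty-counter clearing, loosely mirroring
 * perf's cpu_hw_events.dirty scheme: counters are not zeroed when
 * released, only marked, and are cleared in bulk later. */
static uint64_t pmc[NUM_PMCS];  /* simulated counter registers  */
static uint64_t dirty_mask;     /* bit n set => pmc[n] is stale */

/* A counter released by one user (e.g. a departing guest) is not
 * zeroed immediately; it is merely marked dirty. */
void release_pmc(int n)
{
    dirty_mask |= 1ULL << n;
}

/* Before granting RDPMC-style access to a new user, zero every
 * counter still marked dirty so no stale state can leak. */
void clear_dirty_pmcs(void)
{
    for (int n = 0; n < NUM_PMCS; n++) {
        if (dirty_mask & (1ULL << n))
            pmc[n] = 0;
    }
    dirty_mask = 0;
}
```

The sketch makes Sean's concern visible: between release_pmc() and clear_dirty_pmcs(), the stale value is still readable, so whoever performs the clearing must run before the next reader.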
>>>>>>>>>
>>>>>>>>> So once this rolls out, perf and vPMU are direct clients of
>>>>>>>>> the PMU HW.
>>>>>>>>
>>>>>>>> I don't think this is a statement we want to make, as it opens
>>>>>>>> a discussion
>>>>>>>> that we won't win. Nor do I think it's one we *need* to make.
>>>>>>>> KVM doesn't need
>>>>>>>> to be on equal footing with perf in terms of owning/managing
>>>>>>>> PMU hardware, KVM
>>>>>>>> just needs a few APIs to allow faithfully and accurately
>>>>>>>> virtualizing a guest PMU.
>>>>>>>>
>>>>>>>>> Faithful cleaning (blind cleaning) has to be the baseline
>>>>>>>>> implementation, until both clients agree to a "deal" between
>>>>>>>>> them.
>>>>>>>>> Currently, there is no such deal, but I believe we could have
>>>>>>>>> one via
>>>>>>>>> future discussion.
>>>>>>>>
>>>>>>>> What I am saying is that there needs to be a "deal" in place
>>>>>>>> before this code
>>>>>>>> is merged. It doesn't need to be anything fancy, e.g. perf can
>>>>>>>> still pave over
>>>>>>>> PMCs it doesn't immediately load, as opposed to using
>>>>>>>> cpu_hw_events.dirty to lazily
>>>>>>>> do the clearing. But perf and KVM need to work together from
>>>>>>>> the get-go, i.e. I don't want KVM doing something without
>>>>>>>> regard to what perf does, and vice versa.
>>>>>>>>
>>>>>>> There is a similar issue with the LoongArch vPMU, where the VM
>>>>>>> can access PMU hardware directly and the PMU hw is shared
>>>>>>> between guest and host. Besides the context switch, there are
>>>>>>> other places where the perf core will access PMU hw, such as
>>>>>>> the tick timer/hrtimer/IPI function call, and KVM can only
>>>>>>> intercept the context switch.
>>>>>>
>>>>>> Two questions:
>>>>>>
>>>>>> 1) Can KVM prevent the guest from accessing the PMU?
>>>>>>
>>>>>> 2) If so, KVM can grant partial access to the PMU, or is it all
>>>>>> or nothing?
>>>>>>
>>>>>> If the answer to both questions is "yes", then it sounds like
>>>>>> LoongArch *requires*
>>>>>> mediated/passthrough support in order to virtualize its PMU.
>>>>>
>>>>> Hi Sean,
>>>>>
>>>>> Thank for your quick response.
>>>>>
>>>>> Yes, KVM can prevent the guest from accessing the PMU, and can
>>>>> grant partial or full access to the PMU. The only caveat is that
>>>>> once a PMU event is granted to the VM, the host can not access
>>>>> this PMU event again; there must be a PMU event switch if the
>>>>> host wants to.
>>>>
>>>> A PMU event is a software entity which won't be shared. You mean
>>>> that if a PMU HW counter is granted to the VM, then the host
>>>> can't access that PMU HW counter, right?
>>> Yes, if the PMU HW counter/control is granted to the VM. The value
>>> comes from the guest and is not meaningful to the host. The host
>>> PMU core does not know that it has been granted to the VM; the
>>> host still thinks that it owns the PMU.
>>
>> That's one issue this patchset tries to solve. The new mediated x86
>> vPMU framework doesn't allow the host and guest to own the PMU HW
>> resource simultaneously. Only when there is no !exclude_guest event
>> on the host is the guest allowed to own the PMU HW resource
>> exclusively.
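[The ownership rule described above can be modelled in a few lines. This is a hypothetical sketch, not code from the patchset; struct host_event and guest_can_own_pmu are illustrative names, and exclude_guest merely mirrors the perf_event_attr bit of the same name.]

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative model of the mediated-vPMU rule: the guest may take
 * exclusive ownership of the PMU hardware only while the host has no
 * active events that need to count in guest mode, i.e. no events
 * with exclude_guest == false. */
struct host_event {
    bool exclude_guest;  /* event does not count while the guest runs */
};

bool guest_can_own_pmu(const struct host_event *evts, int n)
{
    for (int i = 0; i < n; i++) {
        /* One host event that counts in guest mode is enough to
         * deny the guest exclusive PMU ownership. */
        if (!evts[i].exclude_guest)
            return false;
    }
    return true;
}
```

Under this rule, ordinary host-only profiling (all events exclude_guest) coexists with a mediated guest PMU, while system-wide profiling that includes guest mode blocks the pass-through.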
>>
>>
>>>
>>> Just like FPU registers, the PMU is shared by the VM and host at
>>> different times and is lazily switched. But if an IPI or timer
>>> interrupt uses FPU registers on the host, there will be the same
>>> issue.
>>
>> I didn't fully get your point. When an IPI or timer interrupt
>> arrives, a VM-exit is triggered so that the CPU traps into the host
>> first, and then the host interrupt handler is called. Or are you
>> complaining about the execution sequence of switching guest PMU
>> MSRs relative to these interrupt handlers?
> It is not necessary to save/restore the PMU HW at every VM exit; it
> had better be lazily saved/restored, such as only when the vCPU
> thread is sched-out/sched-in, else the cost will be a little
> expensive.
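[The lazy policy Bibo describes amounts to keeping guest PMU state live in hardware across VM exits and only paying the save/restore cost at scheduling boundaries. A minimal hypothetical sketch of that bookkeeping — vcpu_pmu, vcpu_sched_in, etc. are illustrative names, not KVM code:]

```c
#include <stdbool.h>

/* Sketch of lazy guest PMU state switching: expensive MSR
 * save/restore happens only at vCPU thread sched-in/sched-out,
 * while a plain VM exit leaves guest state loaded in hardware. */
struct vcpu_pmu {
    bool guest_state_loaded;    /* guest PMU MSRs currently in HW */
    unsigned long hw_switches;  /* counts expensive save/restore ops */
};

void vcpu_sched_in(struct vcpu_pmu *p)
{
    if (!p->guest_state_loaded) {
        p->guest_state_loaded = true;  /* restore guest MSRs here */
        p->hw_switches++;
    }
}

void vcpu_sched_out(struct vcpu_pmu *p)
{
    if (p->guest_state_loaded) {
        p->guest_state_loaded = false; /* save guest MSRs here */
        p->hw_switches++;
    }
}

/* On a plain VM exit nothing is saved; guest state stays loaded.
 * This is exactly what makes host-side profiling of the exit path
 * impossible, which is the objection raised below. */
void vcpu_vm_exit(struct vcpu_pmu *p)
{
    (void)p;
}
```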
I doubt this optimization of deferring guest PMU state save/restore to
the vCPU task switching boundary would ever land in KVM, since it
would make the host lose the capability to profile KVM, and Sean seems
to object to this.
>
> I know little about the perf core. However, there is PMU HW access
> in interrupt context. That means PMU HW access should be done with
> IRQs disabled in normal context, else there may be nested PMU HW
> accesses. Is that true?
I had no idea that the timer irq handler would access PMU MSRs. Could
you please show me the code? I would like to look at it first. Thanks.
>
>>
>>
>>>
>>> Regards
>>> Bibo Mao
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>> Can we add a callback handler in structure kvm_guest_cbs?
>>>>>>> Just like this:
>>>>>>> @@ -6403,6 +6403,7 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
>>>>>>>          .state                  = kvm_guest_state,
>>>>>>>          .get_ip                 = kvm_guest_get_ip,
>>>>>>>          .handle_intel_pt_intr   = NULL,
>>>>>>> +        .lose_pmu               = kvm_guest_lose_pmu,
>>>>>>> };
>>>>>>>
>>>>>>> By the way, I do not know whether the callback handler should
>>>>>>> be triggered in the perf core or in the specific PMU hw
>>>>>>> driver. In the ARM PMU hw driver it is triggered in the driver
>>>>>>> itself, e.g. in the function kvm_vcpu_pmu_resync_el0, but I
>>>>>>> think it would be better if it were done in the perf core.
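[To make the proposal concrete, here is a hypothetical sketch of how such a lose_pmu callback could be wired: the perf core would invoke it before any direct PMU hardware access, giving KVM a chance to save guest state first. None of this is existing kernel API; the struct mirrors the quoted snippet, and perf_touch_pmu_hw / register_guest_cbs are invented for illustration.]

```c
#include <stddef.h>

/* Illustrative subset of the proposed guest-info callbacks; only the
 * hypothetical lose_pmu hook is modelled here. */
struct perf_guest_info_callbacks {
    void (*lose_pmu)(void);
};

static struct perf_guest_info_callbacks *guest_cbs;
static int guest_lose_pmu_calls;   /* test hook: counts invocations */

/* KVM's side of the deal: save guest counters before the host
 * reprograms the hardware.  Here it only records the call. */
static void kvm_guest_lose_pmu(void)
{
    guest_lose_pmu_calls++;
}

void register_guest_cbs(struct perf_guest_info_callbacks *cbs)
{
    guest_cbs = cbs;
}

/* What the perf core might do before any direct PMU hw access:
 * notify the guest owner so state can be saved first. */
void perf_touch_pmu_hw(void)
{
    if (guest_cbs && guest_cbs->lose_pmu)
        guest_cbs->lose_pmu();
    /* ... proceed with host PMU programming ... */
}
```

This is the "fighting over the PMU" model Sean pushes back on below: every hot perf-core path would have to remember to call the hook, which is part of why he prefers a clearer ownership contract instead.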
>>>>>>
>>>>>> I don't think we want to take the approach of perf and KVM guests
>>>>>> "fighting" over
>>>>>> the PMU. That's effectively what we have today, and it's a mess
>>>>>> for KVM because
>>>>>> it's impossible to provide consistent, deterministic behavior for
>>>>>> the guest. And
>>>>>> it's just as messy for perf, which ends up having weird,
>>>>>> cumbersome flows that exist purely to try to play nice with KVM.
>>>>> With the existing PMU core code, the PMU hw may be accessed by
>>>>> the host in the tick timer interrupt or IPI function call
>>>>> interrupt while the VM is running and the PMU has already been
>>>>> granted to the guest. KVM can not intercept the host IPI/timer
>>>>> interrupt, there is no PMU context switch, so there will be a
>>>>> problem.
>>>>>
>>>>> Regards
>>>>> Bibo Mao
>>>>>
>>>
>