Date:   Wed, 22 Dec 2021 14:56:55 +0800
From:   Like Xu <like.xu.linux@...il.com>
To:     Jim Mattson <jmattson@...gle.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        kvm list <kvm@...r.kernel.org>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Joerg Roedel <joro@...tes.org>, wei.huang2@....com,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        Li RongQing <lirongqing@...du.com>,
        Like Xu <likexu@...cent.com>
Subject: Re: [PATCH] KVM: X86: Emulate APERF/MPERF to report actual VCPU
 frequency

On 24/6/2020 4:34 am, Jim Mattson wrote:
> On Tue, Jun 23, 2020 at 12:05 PM Sean Christopherson
> <sean.j.christopherson@...el.com> wrote:
>>
>> On Tue, Jun 23, 2020 at 11:39:16AM -0700, Jim Mattson wrote:
>>> On Tue, Jun 23, 2020 at 11:29 AM Sean Christopherson
>>> <sean.j.christopherson@...el.com> wrote:
>>>>
>>>> On Tue, Jun 23, 2020 at 02:35:30PM +0800, Like Xu wrote:
>>>>> The aperf/mperf are used to report current CPU frequency after 7d5905dc14a
>>>>> "x86 / CPU: Always show current CPU frequency in /proc/cpuinfo". But guest
>>>>> kernel always reports a fixed VCPU frequency in the /proc/cpuinfo, which
>>>>> may confuse users especially when turbo is enabled on the host.
>>>>>
>>>>> Emulate the guest APERF/MPERF capability based on their values on the host.
>>>>>
>>>>> Co-developed-by: Li RongQing <lirongqing@...du.com>
>>>>> Signed-off-by: Li RongQing <lirongqing@...du.com>
>>>>> Reviewed-by: Chai Wen <chaiwen@...du.com>
>>>>> Reviewed-by: Jia Lina <jialina01@...du.com>
>>>>> Signed-off-by: Like Xu <like.xu@...ux.intel.com>
>>>>> ---
>>>>
>>>> ...
>>>>
>>>>> @@ -8312,7 +8376,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>>>>                dm_request_for_irq_injection(vcpu) &&
>>>>>                kvm_cpu_accept_dm_intr(vcpu);
>>>>>        fastpath_t exit_fastpath;
>>>>> -
>>>>> +     u64 enter_mperf = 0, enter_aperf = 0, exit_mperf = 0, exit_aperf = 0;
>>>>>        bool req_immediate_exit = false;
>>>>>
>>>>>        if (kvm_request_pending(vcpu)) {
>>>>> @@ -8516,8 +8580,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>>>>                vcpu->arch.switch_db_regs &= ~KVM_DEBUGREG_RELOAD;
>>>>>        }
>>>>>
>>>>> +     if (unlikely(vcpu->arch.hwp.hw_coord_fb_cap))
>>>>> +             get_host_amperf(&enter_mperf, &enter_aperf);
>>>>> +
>>>>>        exit_fastpath = kvm_x86_ops.run(vcpu);
>>>>>
>>>>> +     if (unlikely(vcpu->arch.hwp.hw_coord_fb_cap)) {
>>>>> +             get_host_amperf(&exit_mperf, &exit_aperf);
>>>>> +             vcpu_update_amperf(vcpu, get_amperf_delta(enter_aperf, exit_aperf),
>>>>> +                     get_amperf_delta(enter_mperf, exit_mperf));
>>>>> +     }
>>>>> +
>>>>
>>>> Is there an alternative approach that doesn't require 4 RDMSRs on every VMX
>>>> round trip?  That's literally more expensive than VM-Enter + VM-Exit
>>>> combined.

It looks like quite a few users expect this feature in different scenarios.

I will add a fast path for read-only (RO) usage and a slow path for the case
where the guest tries to change the AMPERF values.

>>>>
>>>> E.g. what about adding KVM_X86_DISABLE_EXITS_APERF_MPERF and exposing the
>>>> MSRs for read when that capability is enabled?
>>>
>>> When would you load the hardware MSRs with the guest/host values?
>>
>> Ugh, I was thinking the MSRs were read-only.
> 
> Even if they were read-only, they should power on to zero, and they
> will most likely not be zero when a guest powers on.

Can we assume that "not zero when the guest powers on" will not harm any guests?

> 
>> Doesn't this also interact with TSC scaling?
> 
> Yes, it should!

We already carry too much historical baggage in TSC emulation.

For practical reasons, what if we only expose the AMPERF capability
if both the host and the guest have CONSTANT_TSC and NONSTOP_TSC?

One more design concern: I wonder whether it is *safe* for the guest to
read AMPERF on pCPU[x] the first time and on pCPU[y] the next time.
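
To illustrate the concern, a minimal sketch follows. It is not taken from the
patch; struct vcpu_amperf, amperf_delta(), vcpu_account_run() and
effective_khz() are hypothetical names. It assumes each delta is taken between
two reads on the same pCPU, so the raw counters of pCPU[x] and pCPU[y] are
never subtracted from each other, and only the accumulated deltas are combined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-vCPU accumulators. */
struct vcpu_amperf {
	uint64_t aperf;
	uint64_t mperf;
};

/* Unsigned subtraction wraps modulo 2^64, so a counter wrap is handled. */
static uint64_t amperf_delta(uint64_t enter, uint64_t exit)
{
	return exit - enter;
}

/* Account one run; all four samples must come from the same pCPU. */
static void vcpu_account_run(struct vcpu_amperf *v,
			     uint64_t enter_aperf, uint64_t exit_aperf,
			     uint64_t enter_mperf, uint64_t exit_mperf)
{
	v->aperf += amperf_delta(enter_aperf, exit_aperf);
	v->mperf += amperf_delta(enter_mperf, exit_mperf);
}

/*
 * Effective frequency = base * delta(APERF) / delta(MPERF).
 * Assumes base_khz * daperf fits in 64 bits; returns 0 if no MPERF
 * ticks elapsed.
 */
static uint64_t effective_khz(uint64_t base_khz, uint64_t daperf,
			      uint64_t dmperf)
{
	assert(base_khz > 0);
	if (dmperf == 0)
		return 0;
	return base_khz * daperf / dmperf;
}
```

In this model a migration between two runs is harmless, but a guest that reads
the raw hardware MSRs directly on two different pCPUs would get exactly the
cross-pCPU subtraction that the sketch avoids.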

Any input?

Thanks,
Like Xu

