Message-ID: <9ec0e0ba-bef6-710e-1e9c-36beaedae16e@intel.com>
Date:   Fri, 9 Apr 2021 16:30:14 +0800
From:   "Xu, Like" <like.xu@...el.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Sean Christopherson <seanjc@...gle.com>,
        Paolo Bonzini <pbonzini@...hat.com>, eranian@...gle.com,
        andi@...stfloor.org, kan.liang@...ux.intel.com,
        wei.w.wang@...el.com, Wanpeng Li <wanpengli@...cent.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
        x86@...nel.org, linux-kernel@...r.kernel.org,
        Andi Kleen <ak@...ux.intel.com>,
        Like Xu <like.xu@...ux.intel.com>
Subject: Re: [PATCH v4 08/16] KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to
 manage guest DS buffer

On 2021/4/9 15:59, Peter Zijlstra wrote:
> On Fri, Apr 09, 2021 at 03:07:38PM +0800, Xu, Like wrote:
>> Hi Peter,
>>
>> On 2021/4/8 15:52, Peter Zijlstra wrote:
>>>> This is because in the early part of this function, we have operations:
>>>>
>>>>       if (x86_pmu.flags & PMU_FL_PEBS_ALL)
>>>>           arr[0].guest &= ~cpuc->pebs_enabled;
>>>>       else
>>>>           arr[0].guest &= ~(cpuc->pebs_enabled & PEBS_COUNTER_MASK);
>>>>
>>>> and if guest has PEBS_ENABLED, we need these bits back for PEBS counters:
>>>>
>>>>       arr[0].guest |= arr[1].guest;
>>> I don't think that's right, who's to say they were set in the first
>>> place? The guest's GLOBAL_CTRL could have had the bits cleared at VMEXIT
>>> time. You can't unconditionally add PEBS_ENABLED into GLOBAL_CTRL,
>>> that's wrong.
>> I can't keep up with you on this comment and would you explain more ?
> Well, it could be I'm terminally confused on how virt works (I usually
> am, it just doesn't make any sense ever).

Maybe I can help you a little with this.

>
> On top of that this code doesn't have any comments to help.

More comments will be added.

>
> So perf_guest_switch_msr has two msr values: guest and host.
>
> In my naive understanding guest is the msr value the guest sees and host
> is the value the host has. If it is not that, then the naming is just
> misleading at best.
>
> But thinking more about it, if these are fully emulated MSRs (which I
> think they are), then there might actually be 3 different values, not 2.

You are right: there are 3 different values.

>
> We have the value the guest sees when it uses {RD,WR}MSR.
> We have the value the hardware has when it runs a guest.
> We have the value the hardware has when it doesn't run a guest.
>
> And somehow this code does something, but I can't for the life of me
> figure out what and how.

Just focus on the last two values: the enable bits (in GLOBAL_CTRL and
PEBS_ENABLE) of "the value the hardware has when it runs a guest" are
mutually exclusive with those of "the value the hardware has when it
doesn't run a guest."
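
The exclusivity can be illustrated with a small user-space model of the PEBS_ENABLE step of the proposed intel_guest_get_msrs(). This is only a sketch under assumptions: plain uint64_t bitmasks stand in for cpuc->pebs_enabled, the PEBS mask, cpuc->intel_ctrl_{guest,host}_mask and pmu->host_cross_mapped_mask, and struct switch_msr and model_pebs_enable() are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for struct perf_guest_switch_msr:
 * 'host' is the value the hardware holds while the host runs,
 * 'guest' is the value it holds while the guest runs.
 */
struct switch_msr {
	uint64_t host;
	uint64_t guest;
};

/*
 * Models the PEBS_ENABLE step of the proposed intel_guest_get_msrs().
 * Host PEBS takes priority; otherwise guest PEBS is kept on every
 * counter that is not cross-mapped to a host counter, and those bits
 * are also re-enabled in the guest GLOBAL_CTRL value.
 */
void model_pebs_enable(uint64_t pebs_enabled, uint64_t pebs_mask,
		       uint64_t guest_mask, uint64_t host_mask,
		       uint64_t cross_mapped,
		       struct switch_msr *global_ctrl,
		       struct switch_msr *pebs_enable)
{
	pebs_enable->host = pebs_enabled & ~guest_mask;
	pebs_enable->guest = pebs_mask & ~host_mask;

	if (pebs_enable->host) {
		/* Disable guest PEBS if host PEBS is enabled. */
		pebs_enable->guest = 0;
	} else {
		/* Disable guest PEBS for cross-mapped PEBS counters. */
		pebs_enable->guest &= ~cross_mapped;
		global_ctrl->guest |= pebs_enable->guest;
	}

	/* The two contexts never enable PEBS on the same counter. */
	assert((pebs_enable->host & pebs_enable->guest) == 0);
}
```

For example, with guest-only PEBS on counters 0 and 1 and counter 1 cross-mapped to a host counter, the model keeps guest PEBS only on counter 0 and sets bit 0 in the guest GLOBAL_CTRL value; as soon as any host PEBS bit survives, guest PEBS is dropped entirely.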

>> To address your previous comments, does the code below look good to you?
>>
>> static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
>> {
>>      struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>>      struct perf_guest_switch_msr *arr = cpuc->guest_switch_msrs;
>>      struct debug_store *ds = __this_cpu_read(cpu_hw_events.ds);
>>      struct kvm_pmu *pmu = (struct kvm_pmu *)data;
>>      u64 pebs_mask = (x86_pmu.flags & PMU_FL_PEBS_ALL) ?
>>              cpuc->pebs_enabled : (cpuc->pebs_enabled & PEBS_COUNTER_MASK);
>>      int i = 0;
>>
>>      arr[i].msr = MSR_CORE_PERF_GLOBAL_CTRL;
>>      arr[i].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
>>      arr[i].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask;
>>      arr[i].guest &= ~pebs_mask;
>>
>>      if (!x86_pmu.pebs)
>>          goto out;
>>
>>      /*
>>       * If PMU counter has PEBS enabled it is not enough to
>>       * disable counter on a guest entry since PEBS memory
>>       * write can overshoot guest entry and corrupt guest
>>       * memory. Disabling PEBS solves the problem.
>>       *
>>       * Don't do this if the CPU already enforces it.
>>       */
>>      if (x86_pmu.pebs_no_isolation) {
>>          i++;
>>          arr[i].msr = MSR_IA32_PEBS_ENABLE;
>>          arr[i].host = cpuc->pebs_enabled;
>>          arr[i].guest = 0;
>>          goto out;
>>      }
>>
>>      if (!pmu || !x86_pmu.pebs_vmx)
>>          goto out;
>>
>>      i++;
>>      arr[i].msr = MSR_IA32_DS_AREA;
>>      arr[i].host = (unsigned long)ds;
>>      arr[i].guest = pmu->ds_area;
>>
>>      if (x86_pmu.intel_cap.pebs_baseline) {
>>          i++;
>>          arr[i].msr = MSR_PEBS_DATA_CFG;
>>          arr[i].host = cpuc->pebs_data_cfg;
>>          arr[i].guest = pmu->pebs_data_cfg;
>>      }
>>
>>      i++;
>>      arr[i].msr = MSR_IA32_PEBS_ENABLE;
>>      arr[i].host = cpuc->pebs_enabled & ~cpuc->intel_ctrl_guest_mask;
>>      arr[i].guest = pebs_mask & ~cpuc->intel_ctrl_host_mask;
>>
>>      if (arr[i].host) {
>>          /* Disable guest PEBS if host PEBS is enabled. */
>>          arr[i].guest = 0;
>>      } else {
>>          /* Disable guest PEBS for cross-mapped PEBS counters. */
>>          arr[i].guest &= ~pmu->host_cross_mapped_mask;
>>          arr[0].guest |= arr[i].guest;
>>      }
>>
>> out:
>>      *nr = ++i;
>>      return arr;
>> }
> The ++ is in a weird location, if you place it after filling out an
> entry it makes more sense I think. Something like:
>
> 	arr[i].msr = MSR_CORE_PERF_GLOBAL_CTRL;
> 	arr[i].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
> 	arr[i].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask;
> 	arr[i].guest &= ~pebs_mask;
> 	i++;
>
> or, perhaps even like:
>
> 	arr[i++] = (struct perf_guest_switch_msr){
> 		.msr = MSR_CORE_PERF_GLOBAL_CTRL,
> 		.host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
> 		.guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask,
> 	};

The latter one looks good to me, and I'll apply it.
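
Folding the two statements that build arr[0].guest into one designated initializer has to keep clearing both the host-only bits and the PEBS bits, i.e. `intel_ctrl & ~host_mask & ~pebs_mask`. By De Morgan, `~host_mask | ~pebs_mask` equals `~(host_mask & pebs_mask)`, which clears only bits set in both masks. The sketch below checks this with hypothetical stand-in names (struct switch_msr, fill_global_ctrl(), guest_two_step(), guest_or_form()); it is not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified copy of struct perf_guest_switch_msr. */
struct switch_msr {
	unsigned int msr;
	uint64_t host;
	uint64_t guest;
};

#define GLOBAL_CTRL_MSR 0x38f	/* IA32_PERF_GLOBAL_CTRL */

/* Two-statement form from the original proposal. */
uint64_t guest_two_step(uint64_t intel_ctrl, uint64_t host_mask,
			uint64_t pebs_mask)
{
	uint64_t guest = intel_ctrl & ~host_mask;

	guest &= ~pebs_mask;
	return guest;
}

/* The tempting OR form: clears only bits set in BOTH masks. */
uint64_t guest_or_form(uint64_t intel_ctrl, uint64_t host_mask,
		       uint64_t pebs_mask)
{
	return intel_ctrl & (~host_mask | ~pebs_mask);
}

/*
 * Fills arr[i] with a compound literal and advances the index in the
 * same statement, so the returned count always equals the number of
 * filled entries.
 */
int fill_global_ctrl(struct switch_msr *arr, int i, uint64_t intel_ctrl,
		     uint64_t guest_mask, uint64_t host_mask,
		     uint64_t pebs_mask)
{
	arr[i++] = (struct switch_msr){
		.msr = GLOBAL_CTRL_MSR,
		.host = intel_ctrl & ~guest_mask,
		.guest = intel_ctrl & ~host_mask & ~pebs_mask,
	};
	/* The folded expression must match the two-statement form. */
	assert(arr[i - 1].guest ==
	       guest_two_step(intel_ctrl, host_mask, pebs_mask));
	return i;
}
```

With intel_ctrl = 0xf, host_mask = 0x1 and pebs_mask = 0x2, the AND form yields 0xc, while the OR form yields 0xf and would leave both the host-only counter and the PEBS counter enabled in the guest GLOBAL_CTRL value.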

> But it doesn't address the fundamental confusion I seem to be having,
> what actual msr value is what.

VMX hardware can switch MSR values atomically:
- on VM-entry, it loads the value of arr[i].guest into arr[i].msr;
- on VM-exit, it loads the value of arr[i].host into arr[i].msr.
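
The two load directions can be pictured with a toy model in plain C. It only imitates the effect of the VM-entry and VM-exit MSR-load lists on a hypothetical in-memory MSR table; struct fake_msr, model_vm_entry() and model_vm_exit() are made-up names, and on real hardware the loads happen as part of VM entry and VM exit rather than in software.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified copy of one switch-list entry. */
struct switch_msr {
	unsigned int msr;
	uint64_t host;
	uint64_t guest;
};

/* Hypothetical MSR table slot: (MSR index, current value). */
struct fake_msr {
	unsigned int index;
	uint64_t value;
};

static struct fake_msr *find_msr(struct fake_msr *file, size_t n,
				 unsigned int index)
{
	for (size_t k = 0; k < n; k++)
		if (file[k].index == index)
			return &file[k];
	return NULL;
}

/* Models VM entry: load every arr[i].guest into arr[i].msr. */
void model_vm_entry(struct fake_msr *file, size_t n,
		    const struct switch_msr *arr, int nr)
{
	for (int i = 0; i < nr; i++) {
		struct fake_msr *m = find_msr(file, n, arr[i].msr);

		assert(m != NULL);
		m->value = arr[i].guest;
	}
}

/* Models VM exit: load every arr[i].host into arr[i].msr. */
void model_vm_exit(struct fake_msr *file, size_t n,
		   const struct switch_msr *arr, int nr)
{
	for (int i = 0; i < nr; i++) {
		struct fake_msr *m = find_msr(file, n, arr[i].msr);

		assert(m != NULL);
		m->value = arr[i].host;
	}
}
```

Because every entry is applied in both directions, the host value of each listed MSR is restored on exit no matter what the guest-side value was.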

intel_guest_get_msrs() populates the arr[i].guest and arr[i].host values
before each VM-entry, and its caller optimizes by skipping the switch
whenever arr[i].guest == arr[i].host.
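
The caller-side optimization amounts to filtering the list. A minimal sketch, assuming a hypothetical filter_switch_list() that compacts the array in place; it is not the KVM function, only the same idea.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified copy of one switch-list entry. */
struct switch_msr {
	unsigned int msr;
	uint64_t host;
	uint64_t guest;
};

/*
 * Hypothetical caller-side filter: keep only entries whose guest and
 * host values differ, since loading an identical value on VM entry and
 * VM exit would be wasted work. Entries are compacted in place and the
 * number kept is returned.
 */
int filter_switch_list(struct switch_msr *arr, int nr)
{
	int kept = 0;

	assert(nr >= 0);
	for (int i = 0; i < nr; i++) {
		if (arr[i].guest == arr[i].host)
			continue;	/* nothing to switch for this MSR */
		arr[kept++] = arr[i];
	}
	return kept;
}
```

A caller that keeps its load list across VM entries would also need to remove an entry left over from a previous entry when the two values become equal, rather than merely skipping it.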

Just let me know if you have more questions;
otherwise I'll assume we have reached agreement on this part of the code.
