Date:   Fri, 12 Jun 2020 05:28:49 +0000
From:   "Kang, Luwei" <luwei.kang@...el.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        "Liang, Kan" <kan.liang@...ux.intel.com>
CC:     "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "acme@...nel.org" <acme@...nel.org>,
        "mark.rutland@....com" <mark.rutland@....com>,
        "alexander.shishkin@...ux.intel.com" 
        <alexander.shishkin@...ux.intel.com>,
        "jolsa@...hat.com" <jolsa@...hat.com>,
        "namhyung@...nel.org" <namhyung@...nel.org>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "bp@...en8.de" <bp@...en8.de>, "hpa@...or.com" <hpa@...or.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "Christopherson, Sean J" <sean.j.christopherson@...el.com>,
        "vkuznets@...hat.com" <vkuznets@...hat.com>,
        "wanpengli@...cent.com" <wanpengli@...cent.com>,
        "jmattson@...gle.com" <jmattson@...gle.com>,
        "joro@...tes.org" <joro@...tes.org>,
        "pawan.kumar.gupta@...ux.intel.com" 
        <pawan.kumar.gupta@...ux.intel.com>,
        "ak@...ux.intel.com" <ak@...ux.intel.com>,
        "thomas.lendacky@....com" <thomas.lendacky@....com>,
        "Yu, Fenghua" <fenghua.yu@...el.com>,
        "like.xu@...ux.intel.com" <like.xu@...ux.intel.com>,
        "Wang, Wei W" <wei.w.wang@...el.com>
Subject: RE: [PATCH v1 01/11] perf/x86/core: Support KVM to assign a dedicated
 counter for guest PEBS

> > > Suppose your KVM thing claims counter 0/2 (ICL/SKL) for some random
> > > PEBS event, and then the host wants to use PREC_DIST.. Then one of
> > > them will be screwed for no reason what so ever.
> > >
> >
> > The multiplexing should be triggered.
> >
> > For host, if both user A and user B requires PREC_DIST, the
> > multiplexing should be triggered for them.
> > Now, the user B is KVM. I don't think there is difference. The
> > multiplexing should still be triggered. Why it is screwed?
> 
> Because if KVM isn't PREC_DIST we should be able to reschedule it to a
> different counter.
> 
> > > How is that not destroying scheduling freedom? Any other situation
> > > we'd have moved the !PREC_DIST PEBS event to another counter.
> > >
> >
> > All counters are equivalent for them. It doesn't matter if we move it
> > to another counter. There is no impact for the user.
> 
> But we cannot move it to another counter, because you're pinning it.

Hi Peter,

To avoid pinning counters, I evaluated patching the guest PEBS records in
KVM. With this approach, the latency of the guest PEBS PMI handler increased
by about 30% (e.g. perf record -e branch-loads:p -c 1000 ~/Tools/br_instr a).

Some implementation details are below:
1. Patch the "Applicable Counters" field of the guest PEBS records when the
     counter the guest requested is not the same as the one the host assigned,
     because the guest PEBS driver will drop PEBS records whose
     "Applicable Counters" field does not match the requested counter index.
2. Trap (VM-exit) when the guest driver disables PEBS.
     This happens before the guest reads PEBS records (e.g. in the PEBS PMI
     handler, before the application exits, and so on).
3. To patch the guest PEBS records in KVM, we need to get the HPA of the
     guest PEBS buffer.
     <1> Trap the guest write to the IA32_DS_AREA register and get the GVA
             of the guest DS_AREA.
     <2> Translate the DS AREA GVA to a GPA (kvm_mmu_gva_to_gpa_read)
             and get the GVA of the guest PEBS buffer from the DS AREA
             (kvm_vcpu_read_guest_atomic).
     <3> Although we have the GVA of the PEBS buffer, we need to do the
             address translation (GVA->GPA->HPA) for each page, because we
             can't assume the GPAs of the guest PEBS buffer are always
             contiguous.
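
To make items 1 and 3 <3> above concrete, here is a rough user-space sketch.
It is not the KVM code. The record layout is cut down to a single bitmask, and
the names pebs_record_sketch, patch_applicable_counters, page_chunk and
split_by_page are made up for illustration, as is the assumption of 4 KiB
pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL   /* assumption: 4 KiB guest pages */

/* Hypothetical, simplified PEBS record: only the field that gets patched. */
struct pebs_record_sketch {
    uint64_t applicable_counters;   /* bit i set => record belongs to counter i */
};

/*
 * Item 1: move a record from the counter the host scheduled (host_idx)
 * to the counter the guest asked for (guest_idx).
 * Returns 1 if the record was patched, 0 if it was left alone.
 */
int patch_applicable_counters(struct pebs_record_sketch *rec,
                              unsigned host_idx, unsigned guest_idx)
{
    assert(host_idx < 64 && guest_idx < 64);
    if (host_idx == guest_idx ||
        !(rec->applicable_counters & (1ULL << host_idx)))
        return 0;
    rec->applicable_counters &= ~(1ULL << host_idx);
    rec->applicable_counters |= 1ULL << guest_idx;
    return 1;
}

/* One piece of the buffer that lies entirely inside a single page. */
struct page_chunk {
    uint64_t gva;
    uint64_t len;
};

/*
 * Item 3 <3>: split [gva, gva + len) at page boundaries so that each chunk
 * can be translated (GVA->GPA->HPA) on its own.
 * Returns the number of chunks written, at most max_chunks.
 */
size_t split_by_page(uint64_t gva, uint64_t len,
                     struct page_chunk *out, size_t max_chunks)
{
    size_t n = 0;

    while (len > 0 && n < max_chunks) {
        uint64_t room = PAGE_SIZE - (gva & (PAGE_SIZE - 1));
        uint64_t take = len < room ? len : room;

        out[n].gva = gva;
        out[n].len = take;
        n++;
        gva += take;
        len -= take;
    }
    return n;
}
```

In the real flow each chunk would go through the per-page translation before
the records in it are patched. A record that straddles a page boundary needs
extra care that this sketch does not show.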
	
But we hit another issue with the PEBS counter reset field in the DS AREA.
pebs_event_reset in the DS area has to be set for auto-reload, and it is per
counter. The guest and the host may use different counters. Let's say the
guest wants to use counter 0, but the host assigns counter 1 to the guest. The
guest sets the reset value in pebs_event_reset[0]. However, since counter 1 is
the one that is eventually scheduled, HW will use pebs_event_reset[1] as the
reset value.
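
A toy model of the mismatch, for illustration only. ds_area_sketch,
MAX_PEBS_COUNTERS and hw_reload_value are invented names, and the array size
is an assumption. The real DS area has more fields than this.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PEBS_COUNTERS 8   /* assumption: size chosen for this sketch */

/* Hypothetical, cut-down DS area: only the per-counter reset values. */
struct ds_area_sketch {
    uint64_t pebs_event_reset[MAX_PEBS_COUNTERS];
};

/*
 * Models what the hardware reloads after writing a PEBS record: the slot
 * of the counter that is actually running, not the one the guest
 * programmed.
 */
uint64_t hw_reload_value(const struct ds_area_sketch *ds, unsigned active_idx)
{
    assert(active_idx < MAX_PEBS_COUNTERS);
    return ds->pebs_event_reset[active_idx];
}
```

If the guest stores its period in pebs_event_reset[0] and the host schedules
counter 1, hw_reload_value(&ds, 1) returns whatever is in slot 1 (zero in a
freshly cleared area), so the guest's period is never applied.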

We can't copy the value of the guest pebs_event_reset[0] to
pebs_event_reset[1] directly (patching the DS AREA) because the guest driver
may get confused, and we can't assume that guest counters 0 and 1 are not both
used for this PEBS task at the same time. What's more, KVM isn't aware of
guest reads/writes to the DS AREA because it is just ordinary memory to the
guest.

What is your opinion or do you have a better proposal?

Thanks,
Luwei Kang

> 
> > In the new proposal, KVM user is treated the same as other host events
> > with event constraint. The scheduler is free to choose whether or not
> > to assign a counter for it.
> 
> That's what it does, I understand that. I'm saying that that is creating artificial
> contention.
> 
> 
> Why is this needed anyway? Can't we force the guest to flush and then move it
> over to a new counter?
