Message-ID: <07f23e5e-2747-b0bc-1b93-f83f3982649a@amd.com>
Date: Wed, 15 Nov 2017 13:04:03 -0600
From: "Natarajan, Janakarajan" <Janakarajan.Natarajan@....com>
To: Borislav Petkov <bp@...e.de>
Cc: kvm@...r.kernel.org, x86@...nel.org, linux-kernel@...r.kernel.org,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H . Peter Anvin" <hpa@...or.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Radim Krcmar <rkrcmar@...hat.com>,
Len Brown <len.brown@...el.com>, Kyle Huey <me@...ehuey.com>,
Kan Liang <Kan.liang@...el.com>,
Grzegorz Andrejczuk <grzegorz.andrejczuk@...el.com>,
Tom Lendacky <thomas.lendacky@....com>,
Tony Luck <tony.luck@...el.com>
Subject: Re: [PATCH v2 3/4] Add support for AMD Core Perf Extension in guest
On 11/9/2017 12:34 PM, Borislav Petkov wrote:
>> Subject: Re: [PATCH v2 3/4] Add support for AMD Core Perf Extension in guest
> Btw, your subjects need a prefix:
>
> "x86/kvm: Add guest support for the AMD core performance counters"
>
> for example.
Okay.
> On Mon, Nov 06, 2017 at 11:44:25AM -0600, Janakarajan Natarajan wrote:
>> This patch adds support for AMD Core Performance counters in the guest.
> Never say "This patch" in the commit message of a patch. It is
> tautologically useless.
Okay.
>> The base event select and counter MSRs are changed. In addition, with
>> the core extension, there are 2 extra counters available for performance
>> measurements for a total of 6.
>>
>> With the new MSRs, the logic to map them to the gp_counters[] is changed.
>> New functions are introduced to get the right base MSRs and to check the
>> validity of the get/set MSRs.
>>
>> If the guest has vcpus of either family 16h or a generation < 15h, it
> You're only talking about families here, the "generation" thing is
> confusing.
I'll change that.
>> falls back to using K7 MSRs and the number of counters the guest can
>> access is set to 4.
>>
>> Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@....com>
>> ---
>> arch/x86/kvm/pmu_amd.c | 133 +++++++++++++++++++++++++++++++++++++++++++------
>> arch/x86/kvm/x86.c | 1 +
>> 2 files changed, 120 insertions(+), 14 deletions(-)
> ...
>
>> +static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
>> + enum pmu_type type)
>> +{
>> + unsigned int base = get_msr_base(pmu, type);
>> +
>> + if (base == MSR_F15H_PERF_CTL) {
>> + switch (msr) {
>> + case MSR_F15H_PERF_CTL0:
>> + case MSR_F15H_PERF_CTL1:
>> + case MSR_F15H_PERF_CTL2:
>> + case MSR_F15H_PERF_CTL3:
>> + case MSR_F15H_PERF_CTL4:
>> + case MSR_F15H_PERF_CTL5:
>> + /*
>> + * AMD Perf Extension MSRs are not continuous.
>> + *
>> + * E.g. MSR_F15H_PERF_CTR0 -> 0xc0010201
>> + * MSR_F15H_PERF_CTR1 -> 0xc0010203
>> + *
>> + * These are mapped to work with gp_counters[].
>> + * The index into the array is calculated by
>> + * dividing the difference between the requested
>> + * msr and the msr base by 2.
>> + *
>> + * E.g. MSR_F15H_PERF_CTR1 uses
>> + * ->gp_counters[(0xc0010203-0xc0010201)/2]
>> + * ->gp_counters[1]
>> + */
>> + return &pmu->gp_counters[(msr - base) >> 1];
> Ok, it took me a bit of staring to understand what you're doing here.
> And frankly, this scheme is silly and fragile. You're relying on the
> fact that you can do math with the MSR numbers to get you the GP counter
> number. The moment that changes in future families, you are going to
> have to devise a new scheme for the new family.
>
> And instead of doing that, you're much better off producing a simple
> MSR -> counter mapping for each family which is a simple switch-case.
> No need to do get_msr_base() and whatnot - you simply feed in the MSR
> number and the function spits out a gp_counters index. You only need to
> check the family of the vcpu.
>
> Then lookers will be able to understand the code at a quick glance too.
>
> ...
I'll put out a v3 with those changes.
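Something along these lines, perhaps (untested sketch only;
msr_to_gp_index() is just a placeholder name, and the check for whether
the extended MSRs are valid for the guest family at all would stay with
the callers):

static int msr_to_gp_index(u32 msr)
{
	switch (msr) {
	/* Fam 15h/17h core extension MSRs: explicit MSR -> index map. */
	case MSR_F15H_PERF_CTL0:
	case MSR_F15H_PERF_CTR0:
		return 0;
	case MSR_F15H_PERF_CTL1:
	case MSR_F15H_PERF_CTR1:
		return 1;
	case MSR_F15H_PERF_CTL2:
	case MSR_F15H_PERF_CTR2:
		return 2;
	case MSR_F15H_PERF_CTL3:
	case MSR_F15H_PERF_CTR3:
		return 3;
	case MSR_F15H_PERF_CTL4:
	case MSR_F15H_PERF_CTR4:
		return 4;
	case MSR_F15H_PERF_CTL5:
	case MSR_F15H_PERF_CTR5:
		return 5;
	/* Legacy K7 MSRs: 4 counters, contiguous on all families. */
	case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL3:
		return msr - MSR_K7_EVNTSEL0;
	case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
		return msr - MSR_K7_PERFCTR0;
	default:
		return -1;
	}
}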
>> @@ -153,8 +251,15 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
>> {
>> struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> + int family, nr_counters;
>> +
>> + family = guest_cpuid_family(vcpu);
>> + if (family == 0x15 || family == 0x17)
>> + nr_counters = AMD64_NUM_COUNTERS_CORE;
>> + else
>> + nr_counters = AMD64_NUM_COUNTERS;
>>
>> - pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;
>> + pmu->nr_arch_gp_counters = nr_counters;
>> pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
>> pmu->reserved_bits = 0xffffffff00200000ull;
>> /* not applicable to AMD; but clean them to prevent any fall out */
>> @@ -169,7 +274,7 @@ static void amd_pmu_init(struct kvm_vcpu *vcpu)
>> struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> int i;
>>
>> - for (i = 0; i < AMD64_NUM_COUNTERS ; i++) {
>> + for (i = 0; i < AMD64_NUM_COUNTERS_CORE ; i++) {
> This all works because INTEL_PMC_MAX_GENERIC is bigger than the AMD num
> counters but you need to check all that.
>
> Also, the finding out of the nr_counters you do in amd_pmu_refresh()
> should happen here, in the init function so that you have
> pmu->nr_arch_gp_counters properly set and then when you iterate over
> counters in the remaining functions, you do:
>
> for (i = 0; i < pmu->nr_arch_gp_counters ; i++) {
>
> instead of using those defines which are not always correct, depending
> on the family.
So, when amd_pmu_init() is called, guest_cpuid_family() returns -1.
If this is because qemu only sets the family details later via a KVM
ioctl, it would make sense to initialize the maximum number of
gp_counters there and set pmu->nr_arch_gp_counters based on the family
in amd_pmu_refresh().
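I.e., roughly (sketch only, not tested; the rest of amd_pmu_refresh()
stays as in the patch):

static void amd_pmu_init(struct kvm_vcpu *vcpu)
{
	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
	int i;

	/*
	 * The family is not known yet here, so set up the maximum
	 * number of counters that may ever be exposed to the guest.
	 */
	for (i = 0; i < AMD64_NUM_COUNTERS_CORE; i++) {
		pmu->gp_counters[i].type = KVM_PMC_GP;
		pmu->gp_counters[i].vcpu = vcpu;
		pmu->gp_counters[i].idx = i;
	}
}

static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
{
	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
	int family = guest_cpuid_family(vcpu);

	/* CPUID is set by now, so size the PMU for the guest family. */
	if (family == 0x15 || family == 0x17)
		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS_CORE;
	else
		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;

	/* counter_bitmask, reserved_bits etc. unchanged from the patch */
}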
Thanks,
Janakarajan
>