Message-ID: <07f23e5e-2747-b0bc-1b93-f83f3982649a@amd.com>
Date: Wed, 15 Nov 2017 13:04:03 -0600
From: "Natarajan, Janakarajan" <Janakarajan.Natarajan@....com>
To: Borislav Petkov <bp@...e.de>
Cc: kvm@...r.kernel.org, x86@...nel.org, linux-kernel@...r.kernel.org,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H . Peter Anvin" <hpa@...or.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Radim Krcmar <rkrcmar@...hat.com>,
Len Brown <len.brown@...el.com>, Kyle Huey <me@...ehuey.com>,
Kan Liang <Kan.liang@...el.com>,
Grzegorz Andrejczuk <grzegorz.andrejczuk@...el.com>,
Tom Lendacky <thomas.lendacky@....com>,
Tony Luck <tony.luck@...el.com>
Subject: Re: [PATCH v2 3/4] Add support for AMD Core Perf Extension in guest
On 11/9/2017 12:34 PM, Borislav Petkov wrote:
>> Subject: Re: [PATCH v2 3/4] Add support for AMD Core Perf Extension in guest
> Btw, your subjects need a prefix:
>
> "x86/kvm: Add guest support for the AMD core performance counters"
>
> for example.
Okay.
> On Mon, Nov 06, 2017 at 11:44:25AM -0600, Janakarajan Natarajan wrote:
>> This patch adds support for AMD Core Performance counters in the guest.
> Never say "This patch" in the commit message of a patch. It is
> tautologically useless.
Okay.
>> The base event select and counter MSRs are changed. In addition, with
>> the core extension, there are 2 extra counters available for performance
>> measurements for a total of 6.
>>
>> With the new MSRs, the logic to map them to the gp_counters[] is changed.
>> New functions are introduced to get the right base MSRs and to check the
>> validity of the get/set MSRs.
>>
>> If the guest has vcpus of either family 16h or a generation < 15h, it
> You're only talking about families here, the "generation" thing is
> confusing.
I'll change that.
>> falls back to using K7 MSRs and the number of counters the guest can
>> access is set to 4.
>>
>> Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@....com>
>> ---
>> arch/x86/kvm/pmu_amd.c | 133 +++++++++++++++++++++++++++++++++++++++++++------
>> arch/x86/kvm/x86.c | 1 +
>> 2 files changed, 120 insertions(+), 14 deletions(-)
> ...
>
>> +static inline struct kvm_pmc *get_gp_pmc_amd(struct kvm_pmu *pmu, u32 msr,
>> + enum pmu_type type)
>> +{
>> + unsigned int base = get_msr_base(pmu, type);
>> +
>> + if (base == MSR_F15H_PERF_CTL) {
>> + switch (msr) {
>> + case MSR_F15H_PERF_CTL0:
>> + case MSR_F15H_PERF_CTL1:
>> + case MSR_F15H_PERF_CTL2:
>> + case MSR_F15H_PERF_CTL3:
>> + case MSR_F15H_PERF_CTL4:
>> + case MSR_F15H_PERF_CTL5:
>> + /*
>> + * AMD Perf Extension MSRs are not continuous.
>> + *
>> + * E.g. MSR_F15H_PERF_CTR0 -> 0xc0010201
>> + * MSR_F15H_PERF_CTR1 -> 0xc0010203
>> + *
>> + * These are mapped to work with gp_counters[].
>> + * The index into the array is calculated by
>> + * dividing the difference between the requested
>> + * msr and the msr base by 2.
>> + *
>> + * E.g. MSR_F15H_PERF_CTR1 uses
>> + * ->gp_counters[(0xc0010203-0xc0010201)/2]
>> + * ->gp_counters[1]
>> + */
>> + return &pmu->gp_counters[(msr - base) >> 1];
> Ok, it took me a bit of staring to understand what you're doing here.
> And frankly, this scheme is silly and fragile. You're relying on the
> fact that you can do math with the MSR numbers to get you the GP counter
> number. The moment that changes in future families, you are going to
> have to devise a new scheme for the new family.
>
> And instead of doing that, you're much better off producing a simple
> MSR -> counter mapping for each family which is a simple switch-case.
> No need to do get_msr_base() and whatnot - you simply feed in the MSR
> number and the function spits out a gp_counters index. You only need to
> check the family of the vcpu.
>
> Then lookers will be able to understand the code at a quick glance too.
>
> ...
I'll put out a v3 with those changes.
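Something along these lines, perhaps (untested sketch only;
msr_to_gp_index() is just a placeholder name, and the check for whether
the extended MSRs are valid for the guest family at all would stay with
the callers):

static int msr_to_gp_index(u32 msr)
{
	switch (msr) {
	/* Fam 15h/17h core extension MSRs: explicit MSR -> index map. */
	case MSR_F15H_PERF_CTL0:
	case MSR_F15H_PERF_CTR0:
		return 0;
	case MSR_F15H_PERF_CTL1:
	case MSR_F15H_PERF_CTR1:
		return 1;
	case MSR_F15H_PERF_CTL2:
	case MSR_F15H_PERF_CTR2:
		return 2;
	case MSR_F15H_PERF_CTL3:
	case MSR_F15H_PERF_CTR3:
		return 3;
	case MSR_F15H_PERF_CTL4:
	case MSR_F15H_PERF_CTR4:
		return 4;
	case MSR_F15H_PERF_CTL5:
	case MSR_F15H_PERF_CTR5:
		return 5;
	/* Legacy K7 MSRs: 4 counters, contiguous on all families. */
	case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL3:
		return msr - MSR_K7_EVNTSEL0;
	case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
		return msr - MSR_K7_PERFCTR0;
	default:
		return -1;
	}
}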
>> @@ -153,8 +251,15 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>> static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
>> {
>> struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> + int family, nr_counters;
>> +
>> + family = guest_cpuid_family(vcpu);
>> + if (family == 0x15 || family == 0x17)
>> + nr_counters = AMD64_NUM_COUNTERS_CORE;
>> + else
>> + nr_counters = AMD64_NUM_COUNTERS;
>>
>> - pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;
>> + pmu->nr_arch_gp_counters = nr_counters;
>> pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
>> pmu->reserved_bits = 0xffffffff00200000ull;
>> /* not applicable to AMD; but clean them to prevent any fall out */
>> @@ -169,7 +274,7 @@ static void amd_pmu_init(struct kvm_vcpu *vcpu)
>> struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> int i;
>>
>> - for (i = 0; i < AMD64_NUM_COUNTERS ; i++) {
>> + for (i = 0; i < AMD64_NUM_COUNTERS_CORE ; i++) {
> This all works because INTEL_PMC_MAX_GENERIC is bigger than the AMD num
> counters but you need to check all that.
>
> Also, the finding out of the nr_counters you do in amd_pmu_refresh()
> should happen here, in the init function so that you have
> pmu->nr_arch_gp_counters properly set and then when you iterate over
> counters in the remaining functions, you do:
>
> for (i = 0; i < pmu->nr_arch_gp_counters ; i++) {
>
> instead of using those defines which are not always correct, depending
> on the family.
So, when amd_pmu_init() is called, guest_cpuid_family() returns -1.
If this is because qemu only sets the family details later via a KVM
ioctl, it would make sense to initialize the maximum number of
gp_counters there and set pmu->nr_arch_gp_counters based on the family
in amd_pmu_refresh().
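I.e., roughly (sketch only, not tested; the rest of amd_pmu_refresh()
stays as in the patch):

static void amd_pmu_init(struct kvm_vcpu *vcpu)
{
	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
	int i;

	/*
	 * The family is not known yet here, so set up the maximum
	 * number of counters that may ever be exposed to the guest.
	 */
	for (i = 0; i < AMD64_NUM_COUNTERS_CORE; i++) {
		pmu->gp_counters[i].type = KVM_PMC_GP;
		pmu->gp_counters[i].vcpu = vcpu;
		pmu->gp_counters[i].idx = i;
	}
}

static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
{
	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
	int family = guest_cpuid_family(vcpu);

	/* CPUID is set by now, so size the PMU for the guest family. */
	if (family == 0x15 || family == 0x17)
		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS_CORE;
	else
		pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;

	/* counter_bitmask, reserved_bits etc. unchanged from the patch */
}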
Thanks,
Janakarajan
>