Message-ID: <95726705-765d-020b-8c85-62fb917f2c14@amperemail.onmicrosoft.com>
Date: Fri, 11 Aug 2023 09:46:49 +0800
From: Shijie Huang <shijie@...eremail.onmicrosoft.com>
To: Marc Zyngier <maz@...nel.org>,
Huang Shijie <shijie@...amperecomputing.com>
Cc: oliver.upton@...ux.dev, james.morse@....com,
suzuki.poulose@....com, yuzenghui@...wei.com,
catalin.marinas@....com, will@...nel.org,
linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.linux.dev,
linux-kernel@...r.kernel.org, patches@...erecomputing.com,
zwang@...erecomputing.com, Mark Rutland <mark.rutland@....com>
Subject: Re: [PATCH v2] KVM/arm64: reconfigure the event filters for guest
context
Hi Marc,
On 2023/8/10 23:27, Marc Zyngier wrote:
> Huang,
>
> Please make sure you add everyone who commented on v1 (I've Cc'd Mark
> so that he can chime in as needed).
thanks.
>
> On Thu, 10 Aug 2023 08:29:06 +0100,
> Huang Shijie <shijie@...amperecomputing.com> wrote:
>> 1.) Background.
>> 1.1) In arm64, start a guest with Qemu which is running as a VMM of KVM,
>> and bind the guest to core 33 and run program "a" in guest.
>> The code of "a" shows below:
>> ----------------------------------------------------------
>> #include <stdio.h>
>>
>> int main()
>> {
>> 	unsigned long i = 0;
>>
>> 	for (;;) {
>> 		i++;
>> 	}
>>
>> 	printf("i:%lu\n", i);
>> 	return 0;
>> }
>> ----------------------------------------------------------
>>
>> 1.2) Use the following perf command in host:
>> #perf stat -e cycles:G,cycles:H -C 33 -I 1000 sleep 1
>> # time counts unit events
>> 1.000817400 3,299,471,572 cycles:G
>> 1.000817400 3,240,586 cycles:H
>>
>> This result is correct; my CPU's frequency is 3.3 GHz, so "cycles:G"
>> is roughly 3.3 billion per second.
>>
>> 1.3) Use the following perf command in host:
>> #perf stat -e cycles:G,cycles:H -C 33 -d -d -I 1000 sleep 1
>> time counts unit events
>> 1.000831480 153,634,097 cycles:G (70.03%)
>> 1.000831480 3,147,940,599 cycles:H (70.03%)
>> 1.000831480 1,143,598,527 L1-dcache-loads (70.03%)
>> 1.000831480 9,986 L1-dcache-load-misses # 0.00% of all L1-dcache accesses (70.03%)
>> 1.000831480 <not supported> LLC-loads
>> 1.000831480 <not supported> LLC-load-misses
>> 1.000831480 580,887,696 L1-icache-loads (70.03%)
>> 1.000831480 77,855 L1-icache-load-misses # 0.01% of all L1-icache accesses (70.03%)
>> 1.000831480 6,112,224,612 dTLB-loads (70.03%)
>> 1.000831480 16,222 dTLB-load-misses # 0.00% of all dTLB cache accesses (69.94%)
>> 1.000831480 590,015,996 iTLB-loads (59.95%)
>> 1.000831480 505 iTLB-load-misses # 0.00% of all iTLB cache accesses (59.95%)
>>
>> This result is wrong. The "cycles:G" count should still be nearly 3.3G.
>>
>> 2.) Root cause.
>> There are only 7 counters on my arm64 platform:
>> (one cycle counter) + (6 normal counters)
>>
>> In 1.3 above, we need 10 event counters.
>> Since we only have 7 counters, the perf core triggers
>> multiplexing from an hrtimer:
>> perf_mux_hrtimer_restart() --> perf_rotate_context().
>>
>> If the hrtimer fires while the host is running, it's fine.
>> If the hrtimer fires while the guest is running,
>> perf_rotate_context() will program the PMU with the filters for
>> the host context. KVM does not get a chance to restore the
>> PMU registers with kvm_vcpu_pmu_restore_guest().
>> The PMU then no longer counts the guest correctly, so we get the wrong result.
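(Side note, not part of the quoted patch: the "6 normal + 1 cycle" layout
above can be confirmed by reading PMCR_EL0.N from kernel context. A minimal
sketch, assuming <asm/sysreg.h> is available; the wrapper function is only
illustrative:)

/*
 * Illustrative only: PMCR_EL0.N (bits [15:11]) gives the number of
 * generic event counters; the cycle counter is separate, so N == 6
 * matches the 7-counter platform described above.
 */
#include <asm/sysreg.h>
#include <linux/printk.h>

static void show_pmu_counters(void)
{
	u64 pmcr = read_sysreg(pmcr_el0);
	unsigned int n = (pmcr >> 11) & 0x1f;	/* PMCR_EL0.N */

	pr_info("PMU: %u event counters + 1 cycle counter\n", n);
}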
>>
>> 3.) About this patch.
>> Make a KVM_REQ_RELOAD_PMU request before reentering the
>> guest. The request will call kvm_vcpu_pmu_restore_guest()
>> to reconfigure the filters for the guest context.
>>
>> 4.) Test result of this patch:
>> #perf stat -e cycles:G,cycles:H -C 33 -d -d -I 1000 sleep 1
>> time counts unit events
>> 1.001006400 3,298,348,656 cycles:G (70.03%)
>> 1.001006400 3,144,532 cycles:H (70.03%)
>> 1.001006400 941,149 L1-dcache-loads (70.03%)
>> 1.001006400 17,937 L1-dcache-load-misses # 1.91% of all L1-dcache accesses (70.03%)
>> 1.001006400 <not supported> LLC-loads
>> 1.001006400 <not supported> LLC-load-misses
>> 1.001006400 1,101,889 L1-icache-loads (70.03%)
>> 1.001006400 121,638 L1-icache-load-misses # 11.04% of all L1-icache accesses (70.03%)
>> 1.001006400 1,031,228 dTLB-loads (70.03%)
>> 1.001006400 26,952 dTLB-load-misses # 2.61% of all dTLB cache accesses (69.93%)
>> 1.001006400 1,030,678 iTLB-loads (59.94%)
>> 1.001006400 338 iTLB-load-misses # 0.03% of all iTLB cache accesses (59.94%)
>>
>> The result is correct. The "cycles:G" count is nearly 3.3G now.
>>
>> Signed-off-by: Huang Shijie <shijie@...amperecomputing.com>
>> ---
>> v1 --> v2:
>> Do not change perf/core code, only change the ARM64 kvm code.
>> v1: https://lkml.org/lkml/2023/8/8/1465
>>
>> ---
>> arch/arm64/kvm/arm.c | 11 ++++++++++-
>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> index c2c14059f6a8..475a2f0e0e40 100644
>> --- a/arch/arm64/kvm/arm.c
>> +++ b/arch/arm64/kvm/arm.c
>> @@ -919,8 +919,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>>  		if (!ret)
>>  			ret = 1;
>>
>> -		if (ret > 0)
>> +		if (ret > 0) {
>> +			/*
>> +			 * perf_rotate_context() may rotate the events and
>> +			 * reprogram the PMU with the host context filters.
>> +			 * So make a request before reentering the guest to
>> +			 * reconfigure the event filters for the guest context.
>> +			 */
>> +			kvm_make_request(KVM_REQ_RELOAD_PMU, vcpu);
>> +
>>  			ret = check_vcpu_requests(vcpu);
>> +		}
> This looks extremely heavy-handed. You're performing the reload on
> *every* entry, and I don't think this is right (exit-heavy workloads
> will suffer from it).
>
> Furthermore, you're also reloading the virtual state of the PMU
> (recreating guest events and other things), all of which looks pretty
> pointless, as all we're interested in is what is being counted on the
> *host*.
Okay. What about adding a _new_ request, such as KVM_REQ_RESTORE_PMU_GUEST?
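Roughly what I have in mind, as a sketch only (the request name, its
number and the exact hook are assumptions, not code from this patch):
the new request would skip rebuilding the guest's virtual PMU state and
only re-apply the host event filters via kvm_vcpu_pmu_restore_guest().

/* arch/arm64/include/asm/kvm_host.h -- the number is illustrative */
#define KVM_REQ_RESTORE_PMU_GUEST	KVM_ARCH_REQ(7)

/* arch/arm64/kvm/arm.c, inside check_vcpu_requests() */
if (kvm_check_request(KVM_REQ_RESTORE_PMU_GUEST, vcpu))
	kvm_vcpu_pmu_restore_guest(vcpu);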
> Instead, we can restrict the reload of the host state (and only that)
> to situations where:
>
> - we're running on a VHE system
>
> - we have a host PMUv3 (not everybody does), as that's the only way we
> can profile a guest
okay. No problem.
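Something like the following is what I would try first (a sketch only;
has_vhe() is the existing VHE check, and I am assuming
kvm_arm_support_pmu_v3() is a good enough indication of a host PMUv3 --
if not, I will look for a better helper):

	if (ret > 0 && has_vhe() && kvm_arm_support_pmu_v3())
		kvm_make_request(KVM_REQ_RESTORE_PMU_GUEST, vcpu);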
>
> and ideally we would have a way to detect that a rotation happened
> (which may require some help from the low-level PMU code).
I will check it; hopefully we can find a better way.
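One idea I will look into (purely hypothetical, none of these symbols
exist today): let the low-level PMU code raise a per-CPU flag when it
reprograms the counters, and only make the request when that flag is
set, so the common exit path stays untouched:

	/* hypothetical flag, set by the host PMU driver after reprogramming */
	DEFINE_PER_CPU(bool, kvm_pmu_events_reprogrammed);

	/* in the host PMU driver, after events are (re)programmed */
	__this_cpu_write(kvm_pmu_events_reprogrammed, true);

	/* in kvm_arch_vcpu_ioctl_run(), before reentering the guest */
	if (__this_cpu_read(kvm_pmu_events_reprogrammed)) {
		__this_cpu_write(kvm_pmu_events_reprogrammed, false);
		kvm_make_request(KVM_REQ_RESTORE_PMU_GUEST, vcpu);
	}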
Thanks
Huang Shijie