Message-ID: <ba2af89a-cdf4-4cb8-bfed-67034faa0f6e@bytedance.com>
Date: Wed, 23 Aug 2023 11:19:23 +0800
From: zhaoxu <zhaoxu.35@...edance.com>
To: Marc Zyngier <maz@...nel.org>
Cc: pbonzini@...hat.com, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, zhouyibo@...edance.com,
zhouliang.001@...edance.com, Oliver Upton <oliver.upton@...ux.dev>,
kvmarm@...ts.linux.dev, Mark Rutland <mark.rutland@....com>
Subject: Re: [RFC] KVM: arm/arm64: optimize vSGI injection performance
On 2023/8/22 16:28, Marc Zyngier wrote:
> On Tue, 22 Aug 2023 04:51:30 +0100,
> zhaoxu <zhaoxu.35@...edance.com> wrote:
>> In fact, the core vCPU search algorithm remains the same in the latest
>> kernel: iterate all vCPUs, if mpidr matches, inject. next version will
>> based on latest kernel.
>
> My point is that performance numbers on such an ancient kernel hardly
> make any sense, as a large portion of the code will be different. We
> aim to live in the future, not in the past.
>
Yes, I got it, thanks.
>>
>>> - which current guest OS *currently* make use of broadcast or 1:N
>>> SGIs? Linux doesn't and overall SGI multicasting is pretty useless
>>> to an OS.
>>>
>>> [...]
>> Yes, arm64 linux almost never send broadcast ipi. I will use another
>> test data to prove performence improvement
>
> Exactly. I also contend that *no* operating system uses broadcast (or
> even multicast) signalling, because this is a very pointless
> operation.
>
> So what are you optimising for?
>
Explanation at the end.
>>>
>>>>> /*
>>>>> - * Compare a given affinity (level 1-3 and a level 0 mask, from the SGI
>>>>> - * generation register ICC_SGI1R_EL1) with a given VCPU.
>>>>> - * If the VCPU's MPIDR matches, return the level0 affinity, otherwise
>>>>> - * return -1.
>>>>> + * Get affinity routing index from ICC_SGI_* register
>>>>> + * format:
>>>>> + * aff3 aff2 aff1 aff0
>>>>> + * |- 8 bits -|- 8 bits -|- 8 bits -|- 4 bits or 8bits -|
>>>
>>> OK, so you are implementing RSS support:
>>>
>>> - Why isn't that mentioned anywhere in the commit log?
>>>
>>> - Given that KVM actively limits the MPIDR to 4 bits at Aff0, how does
>>> it even work the first place?
>>>
>>> - How is that advertised to the guest?
>>>
>>> - How can the guest enable RSS support?
>>>
>> thanks to mention that, I also checked the relevant code, guest can't
>> enable RSS, it was my oversight. This part has removed in next
>> version.
>
> Then what's the point of your patch? You don't explain anything, which
> makes it very hard to guess what you're aiming for.
This patch aims to optimize the vCPU search when injecting a vSGI.
For example, in a 64-core VM, the CPU topology consists of 4 aff0 groups
(0-15, 16-31, 32-47, 48-63). When the guest wants to send an SGI to core
63, the current logic iterates over all vCPUs with kvm_for_each_vcpu
to find core 63, and then injects the vSGI into it. However, the
ICC_SGI* register already provides the affinity routing information,
which lets us skip the first three aff0 groups and start directly at
the last one. As a result, the number of iterations is reduced from
the number of vCPUs (64 in this case) to 16 or 8 (a mask is used to
determine how the targets are distributed in the target list of the
ICC_SGI* register).
The effect of this optimization is visible under the following
conditions: 1. The VM has more than 16 cores. 2. The target vCPU is
located after the 16th core. At the same time, the patch must ensure
that performance does not deteriorate when the target is in the first
aff0 group (cores 0-15); that’s why I included that test data in the
patch.
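To make the idea concrete, here is a minimal user-space sketch of the
two search strategies. It is not the patch and not KVM code: struct
toy_vcpu, toy_init(), toy_scan_all() and toy_scan_group() are
hypothetical names, and the sketch assumes a linear topology in which
vCPU n has group number n / 16 and aff0 n % 16. The half-group split
(16 or 8 iterations) is my reading of the mask check described above.

```c
#include <assert.h>
#include <stdint.h>

#define NR_VCPUS        64
#define VCPUS_PER_GROUP 16   /* aff0 is limited to 4 bits: 16 vCPUs per group */

/* Hypothetical, simplified vCPU: only the affinity fields used here. */
struct toy_vcpu {
	uint8_t aff1;   /* group number */
	uint8_t aff0;   /* position inside the group */
};

/* Assumed linear topology: vCPU n gets aff1 = n / 16, aff0 = n % 16. */
static void toy_init(struct toy_vcpu *vcpus)
{
	for (int i = 0; i < NR_VCPUS; i++) {
		vcpus[i].aff1 = (uint8_t)(i / VCPUS_PER_GROUP);
		vcpus[i].aff0 = (uint8_t)(i % VCPUS_PER_GROUP);
	}
}

/*
 * Walk every vCPU and compare affinities.
 * Returns the number of matching vCPUs; *visited counts loop iterations.
 */
static int toy_scan_all(const struct toy_vcpu *vcpus, uint8_t aff1,
			uint16_t target_list, int *visited)
{
	int hits = 0;

	*visited = 0;
	for (int i = 0; i < NR_VCPUS; i++) {
		(*visited)++;
		if (vcpus[i].aff1 == aff1 &&
		    (target_list & (1u << vcpus[i].aff0)))
			hits++;
	}
	return hits;
}

/*
 * Jump straight to the group selected by aff1 and walk only the half
 * (or halves) of the group that the target list actually touches.
 */
static int toy_scan_group(const struct toy_vcpu *vcpus, uint8_t aff1,
			  uint16_t target_list, int *visited)
{
	int base = aff1 * VCPUS_PER_GROUP;
	int first = (target_list & 0x00ff) ? 0 : 8;   /* low half needed?  */
	int last = (target_list & 0xff00) ? 16 : 8;   /* high half needed? */
	int hits = 0;

	assert(base + VCPUS_PER_GROUP <= NR_VCPUS);

	*visited = 0;
	for (int i = base + first; i < base + last; i++) {
		(*visited)++;
		if (target_list & (1u << vcpus[i].aff0))
			hits++;
	}
	return hits;
}
```

With this toy model, targeting core 63 (group 3, target list bit 15)
takes 64 iterations in toy_scan_all() and 8 in toy_scan_group(). A
target list that spans both halves of a group takes 16.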
>
> M.
>
Xu.