Message-ID: <ec014e8d-eb5f-03cc-3ed1-da58039ef034@bytedance.com>
Date: Mon, 25 Oct 2021 11:14:13 +0800
From: zhenwei pi <pizhenwei@...edance.com>
To: Wanpeng Li <kernellwp@...il.com>,
Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
Jonathan Corbet <corbet@....net>,
Wanpeng Li <wanpengli@...cent.com>,
LKML <linux-kernel@...r.kernel.org>, linux-doc@...r.kernel.org
Subject: Re: [PATCH] x86/kvm: Introduce boot parameter no-kvm-pvipi
On 10/21/21 3:17 PM, zhenwei pi wrote:
> On 10/21/21 1:03 PM, Wanpeng Li wrote:
>> On Thu, 21 Oct 2021 at 11:05, zhenwei pi <pizhenwei@...edance.com> wrote:
>>>
>>>
>>> On 10/21/21 4:12 AM, Sean Christopherson wrote:
>>>> On Wed, Oct 20, 2021, Wanpeng Li wrote:
>>>>> On Wed, 20 Oct 2021 at 20:08, zhenwei pi <pizhenwei@...edance.com>
>>>>> wrote:
>>>>>>
>>>>>> Although host side exposes KVM PV SEND IPI feature to guest side,
>>>>>> guest should still have a chance to disable it.
>>>>>>
>>>>>> A typical case for this parameter:
>>>>>> If the host AMD server enables the AVIC feature, the flat mode of APIC
>>>>>> gets better performance in the guest.
>>>>>
>>>>> Hmm, I didn't find enough valuable information in your posting. We
>>>>> observe AMD a lot before.
>>>>> https://lore.kernel.org/all/CANRm+Cx597FNRUCyVz1D=B6Vs2GX3Sw57X7Muk+yMpi_hb+v1w@mail.gmail.com/T/#u
>>>>>
>>>>
>>>> I too would like to see numbers. I suspect the answer is going to be that
>>>> AVIC performs poorly in CPU overcommit scenarios because of the cost of
>>>> managing the tables and handling "failed delivery" exits, but that AVIC
>>>> does quite well when vCPUs are pinned 1:1 and IPIs rarely require an exit
>>>> to the host.
>>>>
>>>
>>> Test env:
>>> CPU: AMD EPYC 7642 48-Core Processor
>>>
>>> Kmod args(enable avic and disable nested):
>>> modprobe kvm-amd nested=0 avic=1 npt=1
>>>
>>> QEMU args(disable x2apic):
>>> ... -cpu host,x2apic=off ...
>>>
>>> Benchmark tool:
>>> https://github.com/bytedance/kvm-utils/tree/master/microbenchmark/apic-ipi
>>>
>>>
>>> ~# insmod apic_ipi.ko options=5 && dmesg -c
>>>
>>> apic_ipi: 1 NUMA node(s)
>>> apic_ipi: apic [flat]
>>> apic_ipi: apic->send_IPI[default_send_IPI_single+0x0/0x40]
>>> apic_ipi: apic->send_IPI_mask[kvm_send_ipi_mask+0x0/0x10]
>>> apic_ipi: IPI[kvm_send_ipi_mask] from CPU[0] to CPU[1]
>>> apic_ipi: total cycles 375671259, avg 3756
>>> apic_ipi: IPI[flat_send_IPI_mask] from CPU[0] to CPU[1]
>>> apic_ipi: total cycles 221961822, avg 2219
>>>
>>>
>>> apic->send_IPI_mask[kvm_send_ipi_mask+0x0/0x10]
>>> -> This line shows that the current send_IPI_mask is kvm_send_ipi_mask
>>> (because of the PV SEND IPI feature)
>>>
>>> apic_ipi: IPI[kvm_send_ipi_mask] from CPU[0] to CPU[1]
>>> apic_ipi: total cycles 375671259, avg 3756
>>> -->These lines show the average cycles of each kvm_send_ipi_mask:
>>> 3756
>>>
>>> apic_ipi: IPI[flat_send_IPI_mask] from CPU[0] to CPU[1]
>>> apic_ipi: total cycles 221961822, avg 2219
>>> -->These lines show the average cycles of each
>>> flat_send_IPI_mask: 2219
>>
>> Just a single-target IPI is not enough.
>>
>> Wanpeng
>>
>
> Benchmark smp_call_function_single
> (https://github.com/bytedance/kvm-utils/blob/master/microbenchmark/ipi-bench/ipi_bench.c):
>
>
> Test env:
> CPU: AMD EPYC 7642 48-Core Processor
>
> Kmod args(enable avic and disable nested):
> modprobe kvm-amd nested=0 avic=1 npt=1
>
> QEMU args(disable x2apic):
> ... -cpu host,x2apic=off ...
>
> 1> without no-kvm-pvipi:
> ipi_bench_single wait[1], CPU0[NODE0] -> CPU1[NODE0], loop = 100000
> elapsed = 424945631 cycles, average = 4249 cycles
> ipitime = 385246136 cycles, average = 3852 cycles
> ipi_bench_single wait[0], CPU0[NODE0] -> CPU1[NODE0], loop = 100000
> elapsed = 419057953 cycles, average = 4190 cycles
>
> 2> with no-kvm-pvipi:
> ipi_bench_single wait[1], CPU0[NODE0] -> CPU1[NODE0], loop = 100000
> elapsed = 321756407 cycles, average = 3217 cycles
> ipitime = 299433550 cycles, average = 2994 cycles
> ipi_bench_single wait[0], CPU0[NODE0] -> CPU1[NODE0], loop = 100000
> elapsed = 295382146 cycles, average = 2953 cycles
>
>
Hi Wanpeng & Sean,

I also benchmarked redis (over 127.0.0.1) in a 2-vCPU guest. With
'no-kvm-pvipi' the guest gets better performance (about 1.5% higher QPS).

Test env:
Host side: pin the 2 vCPUs to 2 cores in one die.
Guest side: run the commands:
taskset -c 1 ./redis-server --appendonly no
taskset -c 0 ./redis-benchmark -h 127.0.0.1 -d 1024 -n 10000000 -t get

1> without no-kvm-pvipi:
redis QPS: 193203.12 requests per second
kvm_pv_send_ipi exit: ~18K/s

2> with no-kvm-pvipi:
redis QPS: 196028.47 requests per second
avic_incomplete_ipi_interception exit: ~5K/s
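
For anyone who wants to compare the runs directly, the relative deltas can be
worked out with a few lines of Python. This is only a rough sketch: the helper
names pct_change and summarize are made up for illustration, and the inputs
are the redis and ipi_bench figures copied from this thread, nothing new.

```python
def pct_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# (metric, without no-kvm-pvipi, with no-kvm-pvipi), copied from this thread.
# For QPS a positive delta is better; for cycles a negative delta is better.
RESULTS = [
    ("redis QPS", 193203.12, 196028.47),
    ("ipi_bench wait[1] elapsed avg cycles", 4249, 3217),
    ("ipi_bench wait[1] ipitime avg cycles", 3852, 2994),
    ("ipi_bench wait[0] elapsed avg cycles", 4190, 2953),
]


def summarize(results):
    """Return (metric, delta in percent rounded to 2 places) for each row."""
    return [(name, round(pct_change(a, b), 2)) for name, a, b in results]


if __name__ == "__main__":
    for name, delta in summarize(RESULTS):
        print(f"{name}: {delta:+.2f}%")
```

The redis delta is small next to the microbenchmark deltas, which is expected
if IPIs are only a small part of the redis request path.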
--
zhenwei pi