Message-ID: <20190515173832.62afdd90@donnerap.cambridge.arm.com>
Date: Wed, 15 May 2019 17:38:32 +0100
From: Andre Przywara <andre.przywara@....com>
To: Marc Zyngier <marc.zyngier@....com>
Cc: Zenghui Yu <yuzenghui@...wei.com>, <christoffer.dall@....com>,
<eric.auger@...hat.com>, <james.morse@....com>,
<julien.thierry@....com>, <suzuki.poulose@....com>,
<kvmarm@...ts.cs.columbia.edu>, <mst@...hat.com>,
<pbonzini@...hat.com>, <rkrcmar@...hat.com>, <kvm@...r.kernel.org>,
<wanghaibin.wang@...wei.com>,
<linux-arm-kernel@...ts.infradead.org>,
<linux-kernel@...r.kernel.org>,
"Raslan, KarimAllah" <karahmed@...zon.de>
Subject: Re: [RFC PATCH] KVM: arm/arm64: Enable direct irqfd MSI injection
On Mon, 18 Mar 2019 13:30:40 +0000
Marc Zyngier <marc.zyngier@....com> wrote:
Hi,
> On Sun, 17 Mar 2019 19:35:48 +0000
> Marc Zyngier <marc.zyngier@....com> wrote:
>
> [...]
>
> > A first approach would be to keep a small cache of the last few
> > successful translations for this ITS, cache that could be looked-up by
> > holding a spinlock instead. A hit in this cache could directly be
> > injected. Any command that invalidates or changes anything (DISCARD,
> > INV, INVALL, MAPC with V=0, MAPD with V=0, MOVALL, MOVI) should nuke
> > the cache altogether.
>
> And to explain what I meant with this, I've pushed a branch[1] with a
> basic prototype. It is good enough to get a VM to boot, but I wouldn't
> trust it for anything serious just yet.
>
> If anyone feels like giving it a go and check whether it has any
> benefit performance wise, please do so.
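Just to spell out what I think the idea boils down to before throwing
numbers at it, here is a sketch. This is only an illustration in plain
userspace C (pthread spinlock, made-up type and function names), not the
code from your branch. The injection fast path looks the translation up
under a spinlock, a successful full ITS translation fills an entry, and
any of the commands listed above simply wipes the whole cache.

/*
 * Illustration only: a tiny (devid, eventid) -> INTID translation cache
 * that can be looked up from the irqfd injection path under a plain
 * spinlock, and that gets nuked wholesale on any ITS command which can
 * change a mapping (DISCARD, INV, INVALL, MAPC V=0, MAPD V=0, MOVALL,
 * MOVI). Userspace C with made-up names, not the kernel code.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define XLATE_CACHE_SIZE 8	/* "the last few successful translations" */

struct xlate_entry {
	uint32_t devid;
	uint32_t eventid;
	uint32_t intid;		/* resolved LPI number */
	bool valid;
};

static struct xlate_entry cache[XLATE_CACHE_SIZE];
static unsigned int next_victim;	/* trivial round-robin replacement */
static pthread_spinlock_t cache_lock;

static void cache_init(void)
{
	pthread_spin_init(&cache_lock, PTHREAD_PROCESS_PRIVATE);
}

/* Fast path: called from the MSI/irqfd injection path. */
static bool cache_lookup(uint32_t devid, uint32_t eventid, uint32_t *intid)
{
	bool hit = false;

	pthread_spin_lock(&cache_lock);
	for (int i = 0; i < XLATE_CACHE_SIZE; i++) {
		if (cache[i].valid &&
		    cache[i].devid == devid && cache[i].eventid == eventid) {
			*intid = cache[i].intid;
			hit = true;
			break;
		}
	}
	pthread_spin_unlock(&cache_lock);

	return hit;	/* on a miss, fall back to the full ITS walk */
}

/* Slow path: remember a translation that just succeeded. */
static void cache_insert(uint32_t devid, uint32_t eventid, uint32_t intid)
{
	pthread_spin_lock(&cache_lock);
	cache[next_victim] = (struct xlate_entry){
		.devid = devid, .eventid = eventid,
		.intid = intid, .valid = true,
	};
	next_victim = (next_victim + 1) % XLATE_CACHE_SIZE;
	pthread_spin_unlock(&cache_lock);
}

/* Any command that invalidates or moves mappings drops everything. */
static void cache_nuke(void)
{
	pthread_spin_lock(&cache_lock);
	memset(cache, 0, sizeof(cache));
	pthread_spin_unlock(&cache_lock);
}

int main(void)
{
	uint32_t intid;

	cache_init();
	cache_insert(0x10, 3, 8195);	/* pretend a full translation succeeded */

	if (cache_lookup(0x10, 3, &intid)) {
		/* hit: intid could be injected directly */
	}

	cache_nuke();			/* e.g. the guest issued MOVALL */
	return 0;
}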
So I took a stab at the performance aspect, and it took me a while to find
something where it actually makes a difference. The trick is to create *a
lot* of interrupts. This is my setup now:
- GICv3 and ITS
- 5.1.0 kernel vs. 5.1.0 plus Marc's rebased "ITS cache" patches on top
- 4 VCPU guest on a 4 core machine
- passing through a M.2 NVMe SSD (or a USB3 controller) to the guest
- running FIO in the guest (example invocations below), with:
- 4K block size, random reads, queue depth 16, 4 jobs (small)
- 1M block size, sequential reads, QD 1, 1 job (big)
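The fio runs were roughly along the lines below (the device node, runtime
and ioengine are placeholders here):

  # "small": 4K random reads, QD 16, 4 jobs
  fio --name=small --filename=/dev/nvme0n1 --direct=1 --ioengine=libaio \
      --rw=randread --bs=4k --iodepth=16 --numjobs=4 \
      --time_based --runtime=60

  # "big": 1M sequential reads, QD 1, 1 job
  fio --name=big --filename=/dev/nvme0n1 --direct=1 --ioengine=libaio \
      --rw=read --bs=1M --iodepth=1 --numjobs=1 \
      --time_based --runtime=60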
For the NVMe disk I see a whopping 19% performance improvement with Marc's
series (for the small blocks). For a SATA SSD connected via USB3.0 I still
see a 6% improvement. For NVMe there were 50,000 interrupts per second on
the host; the USB3 setup only came up to 10,000/s. For big blocks (with
IRQs in the low thousands/s) the win is smaller, but still a measurable 3%.
Now that I have the setup, I can rerun experiments very quickly (provided I
don't lose access to the machine), so let me know if anyone needs
further tests.
Cheers,
Andre.
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/its-translation-cache