Message-ID: <4C875735.9050808@redhat.com>
Date: Wed, 08 Sep 2010 12:28:21 +0300
From: Avi Kivity <avi@...hat.com>
To: Krishna Kumar2 <krkumar2@...ibm.com>
CC: anthony@...emonkey.ws, davem@...emloft.net, kvm@...r.kernel.org,
mst@...hat.com, netdev@...r.kernel.org, rusty@...tcorp.com.au
Subject: Re: [RFC PATCH 0/4] Implement multiqueue virtio-net
On 09/08/2010 12:22 PM, Krishna Kumar2 wrote:
> Avi Kivity<avi@...hat.com> wrote on 09/08/2010 01:17:34 PM:
>
>> On 09/08/2010 10:28 AM, Krishna Kumar wrote:
>>> The following patches implement transmit mq in virtio-net. Also
>>> included are the userspace qemu changes.
>>>
>>> 1. This feature was first implemented with a single vhost.
>>> Testing showed a 3-8% performance gain for up to 8 netperf
>>> sessions (and sometimes 16), but BW dropped with more
>>> sessions. However, implementing per-txq vhost improved
>>> BW significantly all the way to 128 sessions.
>> Why were vhost kernel changes required? Can't you just instantiate more
>> vhost queues?
> I did try using a single thread to process packets from multiple
> vqs on the host, but the BW dropped beyond a certain number of
> sessions.
Oh - so the interface has not changed (which can be seen from the
patch). That was my concern; I remembered that we had planned for
vhost-net to be multiqueue-ready.
The new guest and qemu code work with old vhost-net, just with reduced
performance, yes?
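To make the per-txq idea above concrete, here is a minimal user-space
sketch (illustrative only, not the vhost-net kernel code; the names
txq_ring, txq_worker and NUM_TXQ are made up for the example): one
worker thread per TX virtqueue instead of a single thread multiplexing
all of the queues.

/* Illustrative user-space sketch only -- not the vhost-net kernel code. */
#include <pthread.h>
#include <stdio.h>

#define NUM_TXQ 4                     /* matches the 4-TXQ test below */

struct txq_ring {
	int id;
	/* the real thing would carry a descriptor ring and an eventfd */
};

static void *txq_worker(void *arg)
{
	struct txq_ring *q = arg;

	/* A real worker would sleep on the queue's kick eventfd and
	 * transmit whatever the guest queued; here it only reports. */
	printf("worker for txq %d running\n", q->id);
	return NULL;
}

int main(void)
{
	pthread_t tid[NUM_TXQ];
	struct txq_ring ring[NUM_TXQ];
	int i;

	/* one worker per TX queue, instead of one worker for all queues */
	for (i = 0; i < NUM_TXQ; i++) {
		ring[i].id = i;
		pthread_create(&tid[i], NULL, txq_worker, &ring[i]);
	}
	for (i = 0; i < NUM_TXQ; i++)
		pthread_join(tid[i], NULL);
	return 0;
}

The point is only the threading layout; the real per-queue worker would
block on the queue's notification and hand packets to the backend.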
> I don't have the code and performance numbers for that
> right now since it is a bit ancient, but I can try to resuscitate
> that if you want.
No need.
>>> Guest interrupts for a 4 TXQ device after a 5 min test:
>>> # egrep "virtio0|CPU" /proc/interrupts
>>>            CPU0       CPU1       CPU2       CPU3
>>>  40:          0          0          0          0   PCI-MSI-edge  virtio0-config
>>>  41:     126955     126912     126505     126940   PCI-MSI-edge  virtio0-input
>>>  42:     108583     107787     107853     107716   PCI-MSI-edge  virtio0-output.0
>>>  43:     300278     297653     299378     300554   PCI-MSI-edge  virtio0-output.1
>>>  44:     372607     374884     371092     372011   PCI-MSI-edge  virtio0-output.2
>>>  45:     162042     162261     163623     162923   PCI-MSI-edge  virtio0-output.3
>> How are vhost threads and host interrupts distributed? We need to move
>> vhost queue threads to be colocated with the related vcpu threads (if no
>> extra cores are available) or on the same socket (if extra cores are
>> available). Similarly, move device interrupts to the same core as the
>> vhost thread.
> All my testing was without any tuning, including binding netperf &
> netserver (irqbalance is also off). I assume (maybe wrongly) that
> the above might give better results?
I hope so!
> Are you suggesting this
> combination:
> IRQ on guest:
> 40: CPU0
> 41: CPU1
> 42: CPU2
> 43: CPU3 (all CPUs are on socket #0)
> vhost:
> thread #0: CPU0
> thread #1: CPU1
> thread #2: CPU2
> thread #3: CPU3
> qemu:
> thread #0: CPU4
> thread #1: CPU5
> thread #2: CPU6
> thread #3: CPU7 (all CPUs are on socket #1)
It may be better to put the vcpu threads and vhost threads on the same
socket. We also need to affine the host interrupts to the same cores.
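As an example of what I mean, something along these lines would do it
(an illustrative sketch, not part of the patches -- the vhost pid, irq
number and cpu below are just whatever your setup shows): pin the vhost
worker task with sched_setaffinity() and steer the matching host irq
through /proc/irq/<N>/smp_affinity.

/* Illustrative sketch: colocate a vhost worker and its irq on one cpu. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Pin an existing task (e.g. a vhost worker thread) to a single CPU. */
static int pin_task(pid_t pid, int cpu)
{
	cpu_set_t mask;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	return sched_setaffinity(pid, sizeof(mask), &mask);
}

/* Steer a host IRQ to the same CPU via /proc/irq/<irq>/smp_affinity. */
static int pin_irq(int irq, int cpu)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%x\n", 1u << cpu);  /* cpu bitmask, bit N == CPU N */
	return fclose(f);
}

int main(int argc, char *argv[])
{
	/* usage: ./affine <vhost-pid> <irq> <cpu>   (example values only) */
	if (argc != 4)
		return 1;

	pid_t pid = atoi(argv[1]);
	int irq = atoi(argv[2]);
	int cpu = atoi(argv[3]);

	if (pin_task(pid, cpu) || pin_irq(irq, cpu))
		perror("affinity setup failed");
	return 0;
}

The same can of course be done with taskset and a shell echo; the point
is only that each vhost worker and its irq end up on (or next to) the
vcpu that owns the queue.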
> netperf/netserver:
> Run on CPUs 0-4 on both sides
>
> The reason I did not optimize anything from user space is that
> I felt it was important to show that the defaults work reasonably well.
Definitely. Heavy tuning is not a useful path for general end users.
We need to make sure the scheduler is able to arrive at the optimal
layout without pinning (but perhaps with hints).
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.