Message-ID: <OF2EF80349.03D44EF4-ON65257798.002D0021-65257798.003352A5@in.ibm.com>
Date: Wed, 8 Sep 2010 14:52:54 +0530
From: Krishna Kumar2 <krkumar2@...ibm.com>
To: Avi Kivity <avi@...hat.com>
Cc: anthony@...emonkey.ws, davem@...emloft.net, kvm@...r.kernel.org,
mst@...hat.com, netdev@...r.kernel.org, rusty@...tcorp.com.au
Subject: Re: [RFC PATCH 0/4] Implement multiqueue virtio-net
Avi Kivity <avi@...hat.com> wrote on 09/08/2010 01:17:34 PM:
> On 09/08/2010 10:28 AM, Krishna Kumar wrote:
> > Following patches implement Transmit mq in virtio-net. Also
> > included are the userspace qemu changes.
> >
> > 1. This feature was first implemented with a single vhost.
> > Testing showed 3-8% performance gain for up to 8 netperf
> > sessions (and sometimes 16), but BW dropped with more
> > sessions. However, implementing per-txq vhost improved
> > BW significantly all the way to 128 sessions.
>
> Why were vhost kernel changes required? Can't you just instantiate more
> vhost queues?
I did try using a single thread to process packets from multiple
vq's on the host, but the BW dropped beyond a certain number of
sessions. I don't have the code and performance numbers for that
right now since it is a bit ancient; I can try to resuscitate
that if you want.
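Just to make the distinction concrete, here is a minimal userspace
sketch (plain pthreads; the txq struct and drain_one() are made-up
placeholders, this is not the actual vhost code) of the two models:
one worker draining all TX vqs versus one worker per TX vq. The
per-queue model is what the per-txq vhost change moves to.

/*
 * Illustration only, NOT vhost code: contrast a single worker
 * thread draining all TX queues with one worker thread per queue.
 */
#include <pthread.h>
#include <stdio.h>

#define NUM_TXQ 4

struct txq {
	int id;
	/* ring pointers, notification fd, etc. would live here */
};

static struct txq txqs[NUM_TXQ];

static void drain_one(struct txq *q)
{
	/* placeholder for "process pending packets on this queue" */
	printf("draining txq %d\n", q->id);
}

/* Model 1: one worker services every queue in turn, so all queues
 * share a single thread (and effectively a single host CPU). */
static void *shared_worker(void *arg)
{
	int i;

	(void)arg;
	for (i = 0; i < NUM_TXQ; i++)
		drain_one(&txqs[i]);
	return NULL;
}

/* Model 2: one worker per queue; each can be scheduled (and pinned)
 * independently of the others. */
static void *per_queue_worker(void *arg)
{
	drain_one(arg);
	return NULL;
}

int main(void)
{
	pthread_t shared, per_q[NUM_TXQ];
	int i;

	for (i = 0; i < NUM_TXQ; i++)
		txqs[i].id = i;

	pthread_create(&shared, NULL, shared_worker, NULL);
	pthread_join(shared, NULL);

	for (i = 0; i < NUM_TXQ; i++)
		pthread_create(&per_q[i], NULL, per_queue_worker, &txqs[i]);
	for (i = 0; i < NUM_TXQ; i++)
		pthread_join(per_q[i], NULL);
	return 0;
}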
> > Guest interrupts for a 4 TXQ device after a 5 min test:
> > # egrep "virtio0|CPU" /proc/interrupts
> >           CPU0       CPU1       CPU2       CPU3
> > 40:          0          0          0          0   PCI-MSI-edge   virtio0-config
> > 41:     126955     126912     126505     126940   PCI-MSI-edge   virtio0-input
> > 42:     108583     107787     107853     107716   PCI-MSI-edge   virtio0-output.0
> > 43:     300278     297653     299378     300554   PCI-MSI-edge   virtio0-output.1
> > 44:     372607     374884     371092     372011   PCI-MSI-edge   virtio0-output.2
> > 45:     162042     162261     163623     162923   PCI-MSI-edge   virtio0-output.3
>
> How are vhost threads and host interrupts distributed? We need to move
> vhost queue threads to be colocated with the related vcpu threads (if no
> extra cores are available) or on the same socket (if extra cores are
> available). Similarly, move device interrupts to the same core as the
> vhost thread.
All my testing was done without any tuning, including no binding of
netperf & netserver (irqbalance is also off). I assume (maybe
wrongly) that the tuning above might give better results. Are you
suggesting this combination (a rough pinning sketch follows the
layout below)?
IRQ on guest:
        40: CPU0
        41: CPU1
        42: CPU2
        43: CPU3        (all CPUs are on socket #0)
vhost:
        thread #0: CPU0
        thread #1: CPU1
        thread #2: CPU2
        thread #3: CPU3
qemu:
        thread #0: CPU4
        thread #1: CPU5
        thread #2: CPU6
        thread #3: CPU7 (all CPUs are on socket #1)
netperf/netserver:
        Run on CPUs 0-4 on both sides
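For completeness, a rough sketch of how that binding could be applied
(illustration only, not part of the patch; the thread id and CPU/IRQ
values below are placeholders that would be looked up at run time):
sched_setaffinity() for the vhost/qemu threads on the host, and
/proc/irq/<n>/smp_affinity for the virtio vectors inside the guest.

/*
 * Illustration only: pin a host thread to a CPU, and steer a guest
 * IRQ to a vCPU from inside the guest. Values in main() are
 * placeholders.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/* Host side: pin one vhost (or qemu vcpu) thread to a given CPU,
 * equivalent to "taskset -pc <cpu> <tid>". */
static int pin_thread(pid_t tid, int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(tid, sizeof(set), &set);
}

/* Guest side: route an MSI vector to one vCPU, equivalent to
 * "echo <mask> > /proc/irq/<irq>/smp_affinity". */
static int pin_irq(int irq, int cpu)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%x\n", 1u << cpu);
	return fclose(f);
}

int main(void)
{
	pin_thread(12345, 0);	/* vhost thread #0 -> CPU0 (placeholder tid) */
	pin_irq(42, 2);		/* virtio0-output.0 (IRQ 42) -> CPU2 in guest */
	return 0;
}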
I did not optimize anything from user space because I felt it was
important to show that the defaults work reasonably well.
Thanks,
- KK