Message-ID: <49D3518B.1030403@novell.com>
Date: Wed, 01 Apr 2009 07:35:39 -0400
From: Gregory Haskins <ghaskins@...ell.com>
To: Rusty Russell <rusty@...tcorp.com.au>
CC: linux-kernel@...r.kernel.org, agraf@...e.de, pmullaney@...ell.com,
pmorreale@...ell.com, anthony@...emonkey.ws,
netdev@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [RFC PATCH 00/17] virtual-bus
Rusty Russell wrote:
> On Wednesday 01 April 2009 05:12:47 Gregory Haskins wrote:
>
>> Bare metal: tput = 4078Mb/s, round-trip = 25593pps (39us rtt)
>> Virtio-net: tput = 4003Mb/s, round-trip = 320pps (3125us rtt)
>> Venet: tput = 4050Mb/s, round-trip = 15255pps (65us rtt)
>>
>
> That rtt time is awful. I know the notification suppression heuristic
> in qemu sucks.
>
> I could dig through the code, but I'll ask directly: what heuristic do
> you use for notification prevention in your venet_tap driver?
>
I am not 100% sure I know what you mean by "notification prevention",
but let me take a stab at it.
So, like most of these kinds of constructs, I have two rings (what is rx + tx
on the guest is reversed to tx + rx on the host), each of which can signal
in either direction, for a total of 4 events, 2 on each side of the
connection. I utilize what I call "bidirectional napi" so that only the
first packet submitted needs to signal across the guest/host boundary.
E.g. the first ingress packet injects an interrupt, and then does a
napi_schedule and masks future irqs. Likewise, the first egress packet does
a hypercall, and then does a "napi_schedule" (I don't actually use napi
in this path, but it's conceptually identical) and masks future
hypercalls. So that is my first form of what I would call notification
prevention.
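To make that concrete, here is a rough sketch of the "mask after the
first event" pattern on the egress side. This is purely illustrative,
not the actual venet code; the struct and every helper in it
(queue_skb_on_ring(), hypercall_kick(), etc.) are made-up names:

#include <linux/types.h>
#include <linux/skbuff.h>

struct venet_queue {
	bool signal_enabled;		/* may we notify the other side? */
	/* ring descriptors, producer/consumer indices, etc. */
};

/* guest egress: only the first packet of a burst crosses the boundary */
static int venet_tx_xmit(struct venet_queue *q, struct sk_buff *skb)
{
	queue_skb_on_ring(q, skb);		/* hypothetical enqueue */

	if (q->signal_enabled) {
		q->signal_enabled = false;	/* mask further hypercalls */
		hypercall_kick(q);		/* one exit for the whole burst */
	}

	return 0;
}

/* "napi-like" completion poll: once the burst is drained, re-arm */
static void venet_tx_poll(struct venet_queue *q)
{
	reclaim_completed_skbs(q);		/* hypothetical reclaim */

	if (ring_is_idle(q)) {
		q->signal_enabled = true;	/* next packet kicks again */
		/*
		 * Real code would re-check the ring here to close the
		 * unmask-vs-new-work race; omitted for brevity.
		 */
	}
}

The ingress side is the mirror image: the host injects one interrupt,
the guest does napi_schedule() and masks the irq, and the irq is only
unmasked again once the poll routine runs dry.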
The second form occurs on the "tx-complete" path (that is guest->host
tx). I only signal back to the guest to reclaim its skbs every 10
packets, or if I drain the queue, whichever comes first (note to self:
make this # configurable).
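In the same spirit (reusing the hypothetical struct from the sketch
above, with the batch size of 10 hard-coded per the note-to-self), the
host-side tx-complete path looks roughly like:

#define TXC_BATCH	10	/* note-to-self: make this configurable */

/*
 * Reclaim guest tx descriptors, signalling back at most once every
 * TXC_BATCH packets, or when the queue drains, whichever comes first.
 * reclaim_one_tx_descriptor() and inject_txc_interrupt() are made up.
 */
static void venet_host_tx_complete(struct venet_queue *q)
{
	unsigned int done = 0;

	while (reclaim_one_tx_descriptor(q)) {
		if (++done == TXC_BATCH) {
			inject_txc_interrupt(q);
			done = 0;
		}
	}

	if (done)		/* drained mid-batch: flush the remainder */
		inject_txc_interrupt(q);
}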
The nice part about this scheme is that it significantly reduces the number
of guest/host transitions while still providing the lowest-latency
response possible for single packets. E.g. send one packet, and you get
one hypercall plus one tx-complete interrupt as soon as it queues on the
hardware. Send 100 packets, and you get one hypercall and 10
tx-complete interrupts, arriving as often as every tenth packet queues on
the hardware. There is no timer governing the flow, etc.
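Spelled out as a trivial (again, purely illustrative) helper, the
interrupt count works out to:

static unsigned int txc_interrupts(unsigned int npackets, unsigned int batch)
{
	/*
	 * One interrupt per full batch, plus one for the partial batch
	 * flushed when the queue drains.
	 */
	return npackets / batch + (npackets % batch ? 1 : 0);
}

so txc_interrupts(1, 10) == 1 and txc_interrupts(100, 10) == 10, as
described above.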
Is that what you were asking?
> As you point out, 350-450 is possible, which is still bad, and it's at least
> partially caused by the exit to userspace and two system calls. If virtio_net
> had a backend in the kernel, we'd be able to compare numbers properly.
>
:)
But that is the whole point, isn't it? I created vbus specifically as a
framework for putting things in the kernel, and that *is* one of the
major reasons it is faster than virtio-net. It's not the difference in,
say, IOQs vs. virtio-ring (though note that I also think some of the
innovations we have added, such as bi-dir napi, are helping too; those
are not "in-kernel"-specific features and could probably help the
userspace version as well).
I would be entirely happy if you guys accepted the general concept and
framework of vbus, and then worked with me to actually convert what I
have as "venet-tap" into essentially an in-kernel virtio-net. I am not
specifically interested in creating a competing pv-net driver...I just
needed something to showcase the concepts, and I didn't want to hack the
virtio-net infrastructure to do it until I had everyone's blessing.
Note to maintainers: I *am* perfectly willing to maintain the venet
drivers if, for some reason, we decide that we want to keep them as
is. It's just that my ideal is to collapse virtio-net and venet-tap
together, and I suspect our community would prefer this as well.
-Greg