Message-ID: <4A0860D7.6010708@codemonkey.ws>
Date: Mon, 11 May 2009 12:31:03 -0500
From: Anthony Liguori <anthony@...emonkey.ws>
To: Gregory Haskins <gregory.haskins@...il.com>
CC: Gregory Haskins <ghaskins@...ell.com>, Avi Kivity <avi@...hat.com>,
Chris Wright <chrisw@...s-sol.org>,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
Hollis Blanchard <hollisb@...ibm.com>
Subject: Re: [RFC PATCH 0/3] generic hypercall support
Gregory Haskins wrote:
> I specifically generalized my statement above because #1 I assume
> everyone here is smart enough to convert that nice round unit into the
> relevant figure. And #2, there are multiple potential latency sources
> at play which we need to factor in when looking at the big picture. For
> instance, the difference between PF exit, and an IO exit (2.58us on x86,
> to be precise). Or whether you need to take a heavy-weight exit. Or a
> context switch to qemu, then the kernel, back to qemu, and back to the
> vcpu. Or acquire a mutex. Or get head-of-lined on the VGA model's IO.
> I know you wish that this whole discussion would just go away, but these
> little "300ns here, 1600ns there" really add up in aggregate despite
> your dismissive attitude towards them. And it doesn't take much to
> affect the results in a measurable way. As stated, each 1us costs ~4%.
> My motivation is to reduce as many of these sources as possible.
>
> So, yes, the delta from PIO to HC is 350ns. Yes, this is a ~1.4%
> improvement. So what? It's still an improvement. If that improvement
> were for free, would you object? And we all know that this change isn't
> "free" because we have to change some code (+128/-0, to be exact). But
> what is it specifically you are objecting to in the first place? Adding
> hypercall support as a pv_ops primitive isn't exactly hard or complex,
> or even very much code.
>
Where does 25us come from? The numbers you post below are 33us and
66us. This is part of what's frustrating me in this thread. Things are
way too theoretical. Saying that "if packet latency was 25us, then it
would be a 1.4% improvement" is close to misleading. The numbers you've
posted are also measuring on-box speeds. What really matters are
off-box latencies, and those are only going to make the relative gain
smaller.
IIUC, if you switched vbus to using PIO today, you would go from 66us to
65.65us, which you'd round to 66us for on-box latencies. Even if you
didn't round, it's a 0.5% improvement in latency.
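As a rough check on these percentages, here is a minimal Python sketch. It uses only figures already quoted in this thread (the 350ns PIO-to-HC delta and the 25us, 33us and 66us latencies); the function name is made up for illustration.

```python
def relative_gain(delta_us, latency_us):
    """Return the saving as a percentage of the end-to-end latency."""
    return 100.0 * delta_us / latency_us


DELTA_US = 0.350  # PIO -> HC delta quoted in this thread (350ns)

# 25us is the round figure used earlier in the thread; 33us and 66us are
# the measured bare-metal and venet round-trip times.
for latency_us in (25.0, 33.0, 66.0):
    gain = relative_gain(DELTA_US, latency_us)
    print(f"{latency_us:5.1f}us rtt: {gain:.2f}% of latency")
```

The same fixed 350ns shrinks as a share of the total as the end-to-end latency grows, which is why the choice of baseline matters.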
Adding hypercall support as a pv_ops primitive is adding a fair bit of
complexity. You need a hypercall fd mechanism to plumb this down to
userspace; otherwise, you can't support migration from an in-kernel
backend to a non in-kernel backend. You need some way to allocate
hypercalls to particular devices, which so far has been completely
ignored. I've already mentioned why hypercalls are also unfortunate
from a guest perspective. They require kernel patching and this is
almost certainly going to break at least Vista as a guest. Certainly
Windows 7.
So it's not at all fair to trivialize the complexity introduced here.
I'm simply asking for justification to introduce this complexity. I
don't see why this is unfair for me to ask.
>> As a more general observation, we need numbers to justify an
>> optimization, not to justify not including an optimization.
>>
>> In other words, the burden is on you to present a scenario where this
>> optimization would result in a measurable improvement in a real world
>> work load.
>>
>
> I have already done this. You seem to have chosen to ignore my
> statements and results, but if you insist on rehashing:
>
> I started this project by analyzing system traces and finding some of
> the various bottlenecks in comparison to a native host. Throughput was
> already pretty decent, but latency was pretty bad (and recently got
> *really* bad, but I know you already have a handle on what's causing
> that). I digress...one of the conclusions of the research was that I
> wanted to focus on building an IO subsystem designed to minimize the
> quantity of exits, minimize the cost of each exit, and shorten the
> end-to-end signaling path to achieve optimal performance. I also wanted
> to build a system that was extensible enough to work with a variety of
> client types, on a variety of architectures, etc, so we would only need
> to solve these problems "once". The end result was vbus, and the first
> working example was venet. The measured performance data of this work
> was as follows:
>
> 802.x network, 9000 byte MTU, 2 8-core x86_64s connected back to back
> with Chelsio T3 10GE via crossover.
>
> Bare metal       : tput = 9717Mb/s, round-trip = 30396pps (33us rtt)
> Virtio-net (PCI) : tput = 4578Mb/s, round-trip =   249pps (4016us rtt)
> Venet (VBUS)     : tput = 5802Mb/s, round-trip = 15127pps (66us rtt)
>
> For more details: http://lkml.org/lkml/2009/4/21/408
>
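For reference, the rtt figures in the table above are consistent with simply inverting the round-trip rate. A minimal sketch, assuming the benchmark keeps a single request in flight so that rtt = 1 / rate (the helper name is hypothetical; the rates are the ones quoted above):

```python
def rtt_us(round_trips_per_second):
    """Convert a round-trip rate into a per-round-trip time in microseconds.

    Assumes one request in flight at a time, so rtt = 1 / rate.
    """
    return 1_000_000 / round_trips_per_second


# Round-trip rates from the quoted results.
RESULTS_PPS = {
    "Bare metal": 30396,
    "Virtio-net (PCI)": 249,
    "Venet (VBUS)": 15127,
}

for name, pps in RESULTS_PPS.items():
    print(f"{name:17s}: {rtt_us(pps):8.1f}us rtt")
```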
Sending out a massive infrastructure change that does things wildly
differently from how they're done today without any indication of why
those changes were necessary is disruptive.
If you could characterize all of the changes that vbus makes that are
different from virtio, demonstrating at each stage why the change
mattered and what benefit it brought, then we'd be having a completely
different discussion. I have no problem throwing away virtio today if
there's something else better.
That's not what you've done though. You wrote a bunch of code without
understanding why virtio does things the way it does and then dropped it
all on the list. This isn't necessarily a bad exercise, but there's a
ton of work necessary to determine which of the things vbus does
differently actually matter. I'm not saying that you shouldn't have
done vbus, but
I'm saying there's a bunch of analysis work that you haven't done that
needs to be done before we start making any changes in upstream code.
I've been trying to argue, and to demonstrate, why I don't think
hypercalls are an important part of vbus from a performance
perspective. The frustration I
have with this series is that you seem unwilling to compromise any
aspect of vbus design. I understand you've made your decisions in vbus
for some reasons and you think the way you've done things is better, but
that's not enough. We have virtio today, it provides greater
functionality than vbus does, it supports multiple guest types, and it's
gotten quite a lot of testing. It has its warts, but most things that
have been around for some time do.
> Now I know you have been quick in the past to dismiss my efforts, and to
> claim you can get the same results without needing the various tricks
> and optimizations I uncovered. But quite frankly, until you post some
> patches for community review and comparison (as I have done), it's just
> meaningless talk.
I can just as easily say that until you post a full series that covers
all of the functionality that virtio has today, vbus is just meaningless
talk. But I'm trying not to be dismissive in all of this because I do
want to see you contribute to the KVM paravirtual IO infrastructure.
Clearly, you have useful ideas.
We can't just go rewriting things without a clear understanding of why
something's better. What's missing is a detailed analysis of what
virtio-net does today and what vbus does so that it's possible to draw
some conclusions.
For instance, this could look like:
For a single packet delivery:
    150ns are spent from PIO operation
    320ns are spent in heavy-weight exit handler
    150ns are spent transitioning to userspace
    5us are spent contending on qemu_mutex
    30us are spent copying data in tun/tap driver
    40us are spent waiting for RX
    ...
For vbus, it would look like:
    130ns are spent from HC instruction
    100ns are spent signaling TX thread
    ...
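To show how such a breakdown would be read, here is a small Python sketch that totals the stages and ranks them by share. The figures are the illustrative placeholders from the virtio-net list above, not measurements, and the list is incomplete (the "..." entries are left out), so the total only covers the stages named here.

```python
# Illustrative placeholder figures from the example above, in nanoseconds.
# These are NOT measurements; the original list is open-ended ("...").
VIRTIO_NET_NS = [
    ("PIO operation", 150),
    ("heavy-weight exit handler", 320),
    ("transition to userspace", 150),
    ("contending on qemu_mutex", 5_000),
    ("copying data in tun/tap driver", 30_000),
    ("waiting for RX", 40_000),
]


def summarize(stages):
    """Return (total_ns, [(name, ns, percent_of_total), ...]) sorted by cost."""
    total = sum(ns for _, ns in stages)
    ranked = sorted(stages, key=lambda stage: stage[1], reverse=True)
    return total, [(name, ns, 100.0 * ns / total) for name, ns in ranked]


total_ns, ranked = summarize(VIRTIO_NET_NS)
print(f"total of listed stages: {total_ns / 1000:.2f}us")
for name, ns, pct in ranked:
    print(f"  {pct:5.1f}%  {ns:6d}ns  {name}")
```

With a table like this for both virtio-net and vbus, it becomes possible to say which differences account for the latency gap and which are noise.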
But single packet delivery is just one part of the puzzle. Bulk
transfers are also important. CPU consumption is important. How we
address things like live migration, non-privileged user initialization,
and userspace plumbing are all also important.
Right now, the whole discussion around this series is wildly speculative
and, quite frankly, counterproductive. A few RTT benchmarks are not
sufficient to make any kind of forward progress here. I certainly like
rewriting things as much as anyone else, but you need a substantial
amount of justification for it that so far hasn't been presented.
Do you understand what my concerns are and why I don't want to just
switch to a new large infrastructure?
Do you feel like you understand what sort of data I'm looking for to
justify the changes vbus is proposing to make? Is this something you're
willing to do? IMHO, this is a prerequisite for any sort of merge
consideration. The analysis of the virtio-net side of things is just as
important as the vbus side of things.
I've tried to explain this to you a number of times now and so far it
doesn't seem like I've been successful. If it isn't clear, please let
me know.
Regards,
Anthony Liguori