Message-id: <461DA849.50406@qumranet.com>
Date: Thu, 12 Apr 2007 06:32:25 +0300
From: Avi Kivity <avi@...ranet.com>
To: Rusty Russell <rusty@...tcorp.com.au>
Cc: Ingo Molnar <mingo@...e.hu>, kvm-devel@...ts.sourceforge.net,
netdev <netdev@...r.kernel.org>
Subject: Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
Rusty Russell wrote:
> On Wed, 2007-04-11 at 17:28 +0300, Avi Kivity wrote:
>
>> Rusty Russell wrote:
>>
>>> On Wed, 2007-04-11 at 07:26 +0300, Avi Kivity wrote:
>>>
>>>
>>>> Nope. Being async is critical for copyless networking:
>>>>
>>>>
>> With async operations, the saga continues like this: the host-side
>> driver allocates an skb, get_page()s and attaches the data to the new
>> skb, this skb crosses the bridge, trickles into the real ethernet
>> device, gets queued there, sent, interrupts fire, triggering async
>> completion. On this completion, we send a virtual interrupt to the
>> guest, which tells it to destroy the skb and reclaim the pages attached
>> to it.
>>
>
> Hi Avi!
>
> Thanks for spelling it out, I now understand your POV. I had
> considered it obvious that a (non-async) write which didn't copy would
> block until the skb was finished with, which is easy to code up within
> the tap device itself. Otherwise it's actually an async write without a
> notification mechanism, which I agree is broken.
>
>
I hadn't considered an always-blocking (or unbuffered) networking API.
It's very counter to current APIs, but does make sense with things like
syslets. Without syslets, I don't think it's very useful as you need
some artificial threads to keep things humming along.

(How would userspace specify it? O_DIRECT when opening the tap?)

I don't think there's a lot of difference between implementing aio or
always-blocking copyless writes for tap. They just differ in how they
sleep and in how to access user pages.
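A minimal userspace sketch of why aio and always-blocking copyless writes differ mainly in how the writer learns that its pages are free. Everything here is hypothetical (tx_request, tx_get/tx_put, tx_wait and notify_guest are illustrative names, not tap or KVM code). The guest's pages are modeled as a reference-counted request. Each holder along the path (bridge, NIC queue) takes a reference, and the last put either wakes a blocked writer or runs a completion callback that stands in for the virtual interrupt.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a copyless transmit: the guest's pages stay pinned
 * until every holder (bridge, NIC queue, ...) has dropped its reference. */
struct tx_request {
    pthread_mutex_t lock;
    pthread_cond_t done_cv;
    int refs;                                  /* outstanding users of the pages */
    bool done;                                 /* set once refs reaches zero */
    void (*on_complete)(struct tx_request *);  /* async notification, or NULL */
    int notified;                              /* times the guest was notified */
};

/* Stand-in for injecting a virtual interrupt into the guest. */
static void notify_guest(struct tx_request *rq)
{
    rq->notified++;
}

void tx_init(struct tx_request *rq, void (*on_complete)(struct tx_request *))
{
    pthread_mutex_init(&rq->lock, NULL);
    pthread_cond_init(&rq->done_cv, NULL);
    rq->refs = 1;                 /* reference held by the submitter */
    rq->done = false;
    rq->on_complete = on_complete;
    rq->notified = 0;
}

/* Like get_page(): another holder starts using the pages. */
void tx_get(struct tx_request *rq)
{
    pthread_mutex_lock(&rq->lock);
    assert(rq->refs > 0);         /* a finished request cannot be revived */
    rq->refs++;
    pthread_mutex_unlock(&rq->lock);
}

/* Like put_page(): the last put completes the request. */
void tx_put(struct tx_request *rq)
{
    bool last;

    pthread_mutex_lock(&rq->lock);
    assert(rq->refs > 0);
    last = (--rq->refs == 0);
    if (last) {
        rq->done = true;
        pthread_cond_broadcast(&rq->done_cv);  /* wake a blocking writer */
    }
    pthread_mutex_unlock(&rq->lock);

    if (last && rq->on_complete)
        rq->on_complete(rq);                   /* async flavour: notify */
}

/* Always-blocking flavour: sleep until the pages have been released. */
void tx_wait(struct tx_request *rq)
{
    pthread_mutex_lock(&rq->lock);
    while (!rq->done)
        pthread_cond_wait(&rq->done_cv, &rq->lock);
    pthread_mutex_unlock(&rq->lock);
}
```

An async writer would pass notify_guest and return immediately; an always-blocking writer would pass NULL and call tx_wait(). The reference counting is shared, and only the way the writer sleeps or gets notified differs.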
> Note though: if the guest can change the packet headers they can
> subvert some firewall rules and possibly crash the host. None of the
> networking code I wrote expects packets to change in flight 8(
>
> This applies to a userspace or kernelspace driver.
>
>
Umm, right. We could write-protect the packets (which would be very
expensive). We could set the evil bit on guest-originated packets, and
rewrite the entire networking stack to copy any part which is inspected
if the evil bit is set. We need more head-scratching on this.
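The "copy any part which is inspected" option can be sketched as copy-before-check. Read the header out of guest-writable memory once, validate the private copy, and keep using that copy afterwards. That way a guest rewriting the packet after the check cannot change what the filter decided on. The header layout and the BLOCKED_PORT rule are hypothetical. The plain memcpy is used for brevity; real code reading memory that can change underneath it would need proper accessors.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header layout; a real stack would inspect IP/TCP headers. */
struct pkt_hdr {
    uint16_t dport;   /* destination port (host byte order, for simplicity) */
    uint16_t len;     /* payload length */
};

#define BLOCKED_PORT 22   /* hypothetical firewall rule */

/* Copy the header out of guest-writable memory before checking it, and hand
 * back the private copy so later processing uses exactly what was checked.
 * Returns false if the buffer is too short or the rule rejects the packet. */
bool filter_and_snapshot(const uint8_t *guest_buf, size_t buf_len,
                         struct pkt_hdr *snap)
{
    if (buf_len < sizeof *snap)
        return false;
    memcpy(snap, guest_buf, sizeof *snap);   /* single read of shared memory */
    return snap->dport != BLOCKED_PORT;
}
```

The cost is one copy per inspected header rather than write-protecting the whole packet, and the stack would only need to do it for packets flagged as guest-originated.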
>>> Yes, and this is already present in the tap device. Anthony suggested a
>>> slightly nasty hack for multiple sg packets in one writev()/readv(), which
>>> could also give us batching.
>>>
>> No need for hacks if we get list aio support one day.
>>
>
> As you point out though, aio is not something we want to hold our breath
> for. Plus, aio never makes things simpler, and complexity kills
> puppies.
>
The puppies had better stay away from qemu then, as it is completely async.

Always-blocking writes won't reduce complexity. Suddenly you need a
thread for each request batch and some pleasant code for joining the
threads when done. Syslets do make it go away, though they're more for
the mostly-nonblocking-with-occasional-blockage stuff rather than the
always blocking thingie you describe.
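A rough illustration of the extra machinery that always-blocking writes imply without syslets: one helper thread per request batch, each sleeping in its writes, and a join loop before any result can be used. blocking_send() is a hypothetical stub standing in for a copyless tap write that would block until the skb is freed. MAX_BATCHES and struct batch are likewise illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_BATCHES 8   /* hypothetical limit on in-flight batches */

/* One batch of packets handed to a helper thread. */
struct batch {
    const size_t *lens;   /* packet lengths in this batch */
    size_t count;
    size_t bytes_sent;    /* filled in by the worker */
};

/* Hypothetical stand-in for an always-blocking copyless write; a real one
 * would sleep until the device is done with the pages. */
static size_t blocking_send(size_t len)
{
    return len;
}

static void *batch_worker(void *arg)
{
    struct batch *b = arg;

    b->bytes_sent = 0;
    for (size_t i = 0; i < b->count; i++)
        b->bytes_sent += blocking_send(b->lens[i]);
    return NULL;
}

/* Fan the batches out to one thread each, then join them all.
 * Returns the total bytes sent, or (size_t)-1 if a thread failed to start. */
size_t send_batches(struct batch *batches, size_t nbatches)
{
    pthread_t tids[MAX_BATCHES];
    size_t started = 0, total = 0;

    assert(nbatches <= MAX_BATCHES);
    while (started < nbatches &&
           pthread_create(&tids[started], NULL, batch_worker,
                          &batches[started]) == 0)
        started++;

    /* Every thread that did start must be joined before its result is read. */
    for (size_t i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += batches[i].bytes_sent;
    }
    return started == nbatches ? total : (size_t)-1;
}
```

With an async interface, the same batches could be submitted from one thread and completions collected as they arrive, which is roughly what qemu's event loop already does.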
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.