Message-id: <46187F4E.1080807@qumranet.com>
Date: Sun, 08 Apr 2007 08:36:14 +0300
From: Avi Kivity <avi@...ranet.com>
To: Rusty Russell <rusty@...tcorp.com.au>
Cc: Ingo Molnar <mingo@...e.hu>, kvm-devel@...ts.sourceforge.net,
netdev <netdev@...r.kernel.org>
Subject: Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work
Rusty Russell wrote:
> On Thu, 2007-04-05 at 10:17 +0300, Avi Kivity wrote:
>
>> Rusty Russell wrote:
>>
>>> You didn't quote Anthony's point about "it's more about there not being
>>> good enough userspace interfaces to do network IO."
>>>
>>> It's easier to write a kernel-space network driver, but it's not
>>> obviously the right thing to do until we can show that an efficient
>>> packet-level userspace interface isn't possible. I don't think that's
>>> been done, and it would be interesting to try.
>>>
>>>
>> In the case of networking, the copyful interfaces on receive are driven
>> by the hardware not knowing how to split the header from the data. On
>> transmit I agree, it could be made copyless from userspace (something
>> like sendfilev, only not file oriented).
>>
>
> Hi Avi,
>
> I don't think you've thought about this very hard. The receive copy is
> completely independent of whether the packet is going to the guest via
> a kernel driver or via userspace, so not relevant.
>
A packet received in the kernel cannot be made available to userspace in
a safe manner without a copy, as it will not be aligned to page
boundaries, so userspace cannot examine the packet until after one copy
has occurred. After userspace has determined what to do with the packet,
another copy must take place to get it there.
There's a counterexample, mmapped sockets, but that works only when all
packets arriving on a card are exposed to the same process. This is
useful for tcpdump or for what you outline below but is hardly generic.
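
To make the alignment point concrete, here is a minimal Python sketch (not
kernel code). It assumes 4096-byte pages and a hypothetical receive buffer in
which a packet sits at some byte offset; all helper names are invented for
illustration. A page can be remapped into a process safely only if the packet
covers it entirely, since otherwise the mapping would also expose whatever
else shares that page.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def pages_touched(offset, length, page_size=PAGE_SIZE):
    """Page indices that the byte range [offset, offset + length) overlaps."""
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return list(range(first, last + 1))


def pages_fully_covered(offset, length, page_size=PAGE_SIZE):
    """Page indices that the byte range covers from first byte to last."""
    first_full = -(-offset // page_size)           # round offset up to a page
    end_full = (offset + length) // page_size      # round end down to a page
    return list(range(first_full, max(first_full, end_full)))


def needs_copy(offset, length, page_size=PAGE_SIZE):
    """True if remapping would expose bytes that are not part of the packet."""
    return pages_touched(offset, length, page_size) != \
        pages_fully_covered(offset, length, page_size)


# A 1500-byte frame in the middle of a page shares that page with other data.
mid_page = needs_copy(offset=2048, length=1500)
# Two whole pages starting on a page boundary could be remapped as-is.
aligned = needs_copy(offset=4096, length=8192)
```

In this model an ordinary Ethernet-sized frame essentially always shares its
page with something else, which is the case the paragraph above describes.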
> And if all packets from the card are going to the guest, you can
> deliver directly. Userspace or kernel, no difference.
>
That is not the common case. Nor is it true when there is a mismatch
between the card's capabilities and guest expectations and constraints.
For example, guest memory is not physically contiguous, so a NIC that
won't do scatter/gather will require bouncing (or an IOMMU, but that's
not here yet).
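
As a rough illustration of the scatter/gather constraint, the Python sketch
below models guest memory as a hypothetical table from guest page number to
host physical frame (the table, the 4096-byte page size and the function names
are all assumptions, not anything from KVM). It splits a guest buffer into
physically contiguous runs; a NIC without scatter/gather can DMA only a single
run, so anything longer has to go through a bounce buffer.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def sg_segments(guest_addr, length, page_table, page_size=PAGE_SIZE):
    """Split a guest buffer into (host_addr, length) runs that are
    contiguous in host physical memory, merging adjacent frames."""
    segments = []
    addr, remaining = guest_addr, length
    while remaining > 0:
        page, off = divmod(addr, page_size)
        chunk = min(page_size - off, remaining)
        host = page_table[page] * page_size + off
        if segments and segments[-1][0] + segments[-1][1] == host:
            # This chunk starts exactly where the previous run ended.
            prev_host, prev_len = segments[-1]
            segments[-1] = (prev_host, prev_len + chunk)
        else:
            segments.append((host, chunk))
        addr += chunk
        remaining -= chunk
    return segments


def needs_bounce(segments, nic_supports_sg):
    """A NIC without scatter/gather can only DMA one contiguous run."""
    return len(segments) > 1 and not nic_supports_sg


# Hypothetical mapping: guest pages 0 and 1 happen to be adjacent in host
# memory (frames 7 and 8), guest page 2 lives elsewhere (frame 3).
page_table = {0: 7, 1: 8, 2: 3}

lucky = sg_segments(100, 6000, page_table)       # spans guest pages 0-1
split = sg_segments(4096, 8192, page_table)      # spans guest pages 1-2
```

Here `lucky` collapses to a single run and can be handed to any NIC, while
`split` yields two runs and would need either scatter/gather or a bounce copy.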
> And we have a "sendfilev not file oriented": it's called "writev" 8)
>
writev() cannot be made copyless for networking: once it returns, the
caller is free to reuse its buffers, so the kernel has to copy them
first. One needs an async interface so the kernel can complete the
write after the NIC acks the DMA transfer, or a kernel driver.
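
The synchronous contract of writev() can be seen from userspace with a short
Python sketch. It uses a pipe as a stand-in for a network socket (an
assumption made to keep the example self-contained; it does not exercise a NIC
or DMA) and the function name is invented. The caller overwrites its buffer
immediately after writev() returns, yet the reader still sees the original
bytes, which is only possible because the kernel took its own copy.

```python
import os


def writev_then_scribble():
    """Gather-write two buffers, then overwrite one before it is read."""
    r, w = os.pipe()
    try:
        header = bytearray(b"HDR:")
        payload = bytearray(b"payload")

        # One gather write of both buffers, no userspace concatenation.
        written = os.writev(w, [header, payload])

        # writev() has returned, so the caller may reuse its buffers.
        payload[:] = b"XXXXXXX"

        # The data in flight is unaffected by the overwrite above.
        received = os.read(r, written)
        return written, received
    finally:
        os.close(r)
        os.close(w)


written, received = writev_then_scribble()
```

A copyless transmit path would have to keep referencing the caller's pages
until the hardware is done with them, which is why a completion notification
of some kind is needed.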
> An in-kernel driver can avoid system call overhead and page references.
> But a better tap device helps more than just KVM.
>
I'll believe it when I see it.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.