Date:	Thu, 12 Apr 2007 06:32:25 +0300
From:	Avi Kivity <avi@...ranet.com>
To:	Rusty Russell <rusty@...tcorp.com.au>
Cc:	Ingo Molnar <mingo@...e.hu>, kvm-devel@...ts.sourceforge.net,
	netdev <netdev@...r.kernel.org>
Subject: Re: [kvm-devel] QEMU PIC indirection patch for in-kernel APIC work

Rusty Russell wrote:
> On Wed, 2007-04-11 at 17:28 +0300, Avi Kivity wrote:
>   
>> Rusty Russell wrote:
>>     
>>> On Wed, 2007-04-11 at 07:26 +0300, Avi Kivity wrote:
>>>   
>>>       
>>>> Nope.  Being async is critical for copyless networking:
>>>>
>>>>         
>> With async operations, the saga continues like this: the host-side 
>> driver allocates an skb, get_page()s and attaches the data to the new 
>> skb, this skb crosses the bridge, trickles into the real ethernet 
>> device, gets queued there, sent, interrupts fire, triggering async 
>> completion.  On this completion, we send a virtual interrupt to the 
>> guest, which tells it to destroy the skb and reclaim the pages attached 
>> to it.
>>     
>
> Hi Avi!
>
> 	Thanks for spelling it out, I now understand your POV.  I had
> considered it obvious that a (non-async) write which didn't copy would
> block until the skb was finished with, which is easy to code up within
> the tap device itself.  Otherwise it's actually an async write without a
> notification mechanism, which I agree is broken.
>
>   

I hadn't considered an always-blocking (or unbuffered) networking API.
It runs counter to current APIs, but it does make sense with things like
syslets.  Without syslets, I don't think it's very useful, as you need
some artificial threads to keep things humming along.

(How would userspace specify it? O_DIRECT when opening the tap?)

I don't think there's a lot of difference between implementing aio and
always-blocking copyless writes for tap.  They differ only in how they
sleep and in how they access user pages.
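As a rough illustration of that point, here is a userspace C sketch (not the tap driver and not kernel code) in which guest pages are lent to a fake device under a reference count. Both write flavours share one completion path, lent_buf_put(). The blocking write sleeps on a condition variable until the last reference drops. The async write returns at once and gets a callback, standing in for the virtual interrupt that tells the guest to reclaim its pages. All names are hypothetical, and a thread plays the role of the NIC.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of guest pages lent to the host without copying.
 * 'refs' counts in-flight users (the skb, the NIC queue, ...). */
struct lent_buf {
    pthread_mutex_t lock;
    pthread_cond_t done_cv;
    int refs;
    bool done;
    void (*on_complete)(struct lent_buf *b, void *cookie); /* async only */
    void *cookie;
};

static void lent_buf_init(struct lent_buf *b)
{
    *b = (struct lent_buf){ .refs = 0 };  /* zero every field */
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->done_cv, NULL);
}

static void lent_buf_get(struct lent_buf *b)
{
    pthread_mutex_lock(&b->lock);
    b->refs++;
    pthread_mutex_unlock(&b->lock);
}

/* Shared completion path: runs when the device is done with the pages. */
static void lent_buf_put(struct lent_buf *b)
{
    void (*cb)(struct lent_buf *, void *) = NULL;
    void *cookie = NULL;

    pthread_mutex_lock(&b->lock);
    assert(b->refs > 0);
    if (--b->refs == 0) {
        b->done = true;
        cb = b->on_complete;
        cookie = b->cookie;
        pthread_cond_broadcast(&b->done_cv);  /* wakes a blocking writer */
    }
    pthread_mutex_unlock(&b->lock);
    if (cb)
        cb(b, cookie);                        /* async: notify the guest */
}

/* Stand-in for the NIC: "transmits", then signals completion. */
static void *fake_device_tx(void *arg)
{
    lent_buf_put(arg);
    return NULL;
}

/* Always-blocking copyless write: returns only once the pages are free. */
static int write_blocking(struct lent_buf *b)
{
    pthread_t dev;

    lent_buf_get(b);
    if (pthread_create(&dev, NULL, fake_device_tx, b) != 0) {
        lent_buf_put(b);
        return -1;
    }
    pthread_mutex_lock(&b->lock);
    while (!b->done)
        pthread_cond_wait(&b->done_cv, &b->lock);
    pthread_mutex_unlock(&b->lock);
    pthread_join(dev, NULL);
    return 0;
}

/* Async copyless write: returns at once; cb fires on completion. */
static int write_async(struct lent_buf *b, pthread_t *dev,
                       void (*cb)(struct lent_buf *, void *), void *cookie)
{
    pthread_mutex_lock(&b->lock);
    b->on_complete = cb;
    b->cookie = cookie;
    pthread_mutex_unlock(&b->lock);
    lent_buf_get(b);
    if (pthread_create(dev, NULL, fake_device_tx, b) != 0) {
        lent_buf_put(b);  /* completion still fires, so pages are reclaimed */
        return -1;
    }
    return 0;
}

/* Hypothetical stand-in for injecting a virtual interrupt into the guest. */
static void guest_reclaim_pages(struct lent_buf *b, void *cookie)
{
    (void)b;
    *(int *)cookie = 1;
}
```

Build with -pthread. The only difference between the two writers is whether the caller sleeps on done_cv or supplies a callback; the reference counting and completion logic is identical.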

> 	Note though: if the guest can change the packet headers they can
> subvert some firewall rules and possibly crash the host.  None of the
> networking code I wrote expects packets to change in flight 8(
>
> 	This applies to a userspace or kernelspace driver.
>
>   

Umm, right.  We could write-protect the packets (which would be very
expensive).  We could set the evil bit on guest-originated packets, and
rewrite the entire networking stack to copy any part which is inspected
if the evil bit is set.  We need more head-scratching on this.
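One way to picture the "copy whatever is inspected if the evil bit is set" option is the userspace C sketch below. A per-packet guest_shared flag plays the evil bit, and pkt_stable_hdr() hands a filter either the packet in place or a private snapshot of its header, so a rule check and any later use of the header see the same bytes even if the guest rewrites the shared page afterwards. The names, the 64-byte cap and the toy TCP rule are assumptions for illustration only, and the sketch does nothing about the cost of the copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_SNAP_MAX 64   /* hypothetical cap on bytes a filter may inspect */

struct pkt {
    const uint8_t *data;  /* may point into guest-writable pages */
    size_t len;
    bool guest_shared;    /* the "evil bit": the guest can still modify data */
};

/* Return the first n header bytes in a form that cannot change under the
 * caller: in place for host-owned packets, a private copy otherwise. */
static const uint8_t *pkt_stable_hdr(const struct pkt *p, size_t n,
                                     uint8_t scratch[HDR_SNAP_MAX])
{
    if (n > p->len || n > HDR_SNAP_MAX)
        return NULL;
    if (!p->guest_shared)
        return p->data;
    memcpy(scratch, p->data, n);  /* snapshot before any rule looks at it */
    return scratch;
}

/* Toy firewall rule: accept only IPv4 (version 4) carrying TCP (proto 6). */
static bool rule_accepts(const uint8_t *hdr)
{
    assert(hdr != NULL);
    return (hdr[0] >> 4) == 4 && hdr[9] == 6;
}
```

The key property is that the code which decides (rule_accepts) and the code which acts on the decision must both read the snapshot; reading the shared page twice reopens the window Rusty describes.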

>>> Yes, and this is already present in the tap device.  Anthony suggested a
>>> slightly nasty hack for multiple sg packets in one writev()/readv, which
>>> could also give us batching.
>>>       
>> No need for hacks if we get list aio support one day.
>>     
>
> As you point out though, aio is not something we want to hold our breath
> for.  Plus, aio never makes things simpler, and complexity kills
> puppies.
>   

The puppies had better stay away from qemu then, as it is completely async.

Always-blocking writes won't reduce complexity.  Suddenly you need a
thread for each request batch and some pleasant code for joining the
threads when done.  Syslets do make it go away, though they're aimed more
at the mostly-nonblocking-with-occasional-blockage case than at the
always-blocking thingie you describe.
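To make the thread-per-batch cost concrete, here is a minimal pthreads sketch: each request batch gets its own thread that performs a blocking write, and the submitter then joins every thread and sums the results. blocking_write() is a hypothetical placeholder that returns immediately; with real always-blocking copyless writes each worker would sleep until the device released its pages. This shows only the thread-and-join pattern under discussion, not how syslets would replace it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical request batch handed to one worker thread. */
struct batch {
    size_t nbytes;
    long result;   /* filled in by the worker */
};

/* Placeholder for an always-blocking copyless write; a real one would
 * sleep here until the device is finished with the pages. */
static long blocking_write(const struct batch *b)
{
    return (long)b->nbytes;
}

static void *batch_worker(void *arg)
{
    struct batch *b = arg;
    b->result = blocking_write(b);
    return NULL;
}

/* Issue every batch on its own thread, then join them all.  'tids' must
 * hold n entries.  Returns the total bytes written, or -1 if a thread
 * could not be started (threads already started are still joined). */
static long submit_batches(struct batch *batches, size_t n, pthread_t *tids)
{
    size_t started = 0;
    long total = 0;
    int failed = 0;

    assert(batches != NULL && tids != NULL);
    for (; started < n; started++) {
        if (pthread_create(&tids[started], NULL, batch_worker,
                           &batches[started]) != 0) {
            failed = 1;
            break;
        }
    }
    for (size_t i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += batches[i].result;
    }
    return failed ? -1 : total;
}
```

Build with -pthread. Even in this toy form, the caller owns thread creation, a thread-id array and a join loop with partial-failure handling, which is the extra machinery an always-blocking API pushes onto userspace.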

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
