Message-ID: <4B01B389.9090507@gmail.com>
Date:	Mon, 16 Nov 2009 15:18:17 -0500
From:	Gregory Haskins <gregory.haskins@...il.com>
To:	Stephen Hemminger <shemminger@...tta.com>
CC:	Herbert Xu <herbert@...dor.apana.org.au>,
	Gregory Haskins <ghaskins@...ell.com>,
	"Michael S. Tsirkin" <mst@...hat.com>,
	alacrityvm-devel@...ts.sourceforge.net,
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [RFC PATCH] net: add dataref destructor to sk_buff

Stephen Hemminger wrote:
> On Sat, 14 Nov 2009 00:27:46 -0500
> Gregory Haskins <gregory.haskins@...il.com> wrote:
> 
>> Stephen Hemminger wrote:

> 
>>> People have tried doing copy-less send by page flipping, but the overhead of the IPI to
>>> invalidate the TLB exceeded the overhead of the copy. There was an Intel paper on this
>>> at Linux Symposium (Ottawa) several years ago.
>> I think you are confusing copy-less tx with copy-less rx.  You can try
>> to do copy-less rx with page flipping, which has the IPI/TLB thrashing
>> properties you mention, and I agree is problematic.  We are talking
>> about copy-less tx here, however, and therefore no page-flipping is
>> involved.  Rather, we are just posting SG lists of pages directly to the
>> NIC (assuming the NIC supports HIGH_DMA, etc).  You do not need to flip
>> the page, or invalidate the TLB (and thus IPI the other cores), to do
>> this, to my knowledge.
>>
> 
> If you want to do copy-less tx for all applications, you have to
> do COW to handle the trivial case of:
> 
> /* stop on EOF (0) and on error (-1); buffer is reused on every pass */
> while ((cc = read(infd, buffer, sizeof buffer)) > 0) {
>    send(outsock, buffer, cc, 0);
> }
> 
> 

You certainly _could_ implement this as COW, I suppose, but that would
be insane.  If someone did do this, you are right: you would need TLB
invalidation.

However, if I were going to actually propose the changeover of the
system calls to use zero-copy (note that I am not), it would be based on
the concept in this patch.  That is: the send() would block until the
NIC completes the DMA and the shinfo block is freed.  Alternate
implementations would be AIO based, where the shinfo destructor
signifies the generation of the completion event.
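
To make the AIO-style variant concrete, here is a minimal user-space
sketch of the idea, not kernel code and not the code from the patch.
Every name in it (fake_shinfo, tx_request, tx_submit, fake_nic_dma,
shinfo_put) is hypothetical and invented for illustration.  It assumes a
single outstanding reference held by a simulated NIC; when that
reference is dropped, the destructor marks the request complete.  A
blocking send() would simply wait for that same event before returning.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the skb shared-info block. */
struct fake_shinfo {
    int dataref;                                /* references to the data pages */
    void (*destructor)(struct fake_shinfo *si); /* runs when dataref hits zero  */
    void *priv;                                 /* owner context for destructor */
};

/* Hypothetical AIO-style transmit request. */
struct tx_request {
    struct fake_shinfo shinfo;
    const char *buf;    /* caller's buffer, referenced rather than copied */
    size_t len;
    int completed;      /* the "completion event", set by the destructor  */
};

/* Destructor: the last reference is gone, so the buffer may be reused. */
static void tx_complete(struct fake_shinfo *si)
{
    struct tx_request *req = si->priv;
    req->completed = 1;
}

/* Drop one reference; fire the destructor on the final put. */
static void shinfo_put(struct fake_shinfo *si)
{
    if (--si->dataref == 0 && si->destructor)
        si->destructor(si);
}

/* Queue the caller's buffer without copying it; returns immediately. */
static void tx_submit(struct tx_request *req, const char *buf, size_t len)
{
    req->buf = buf;
    req->len = len;
    req->completed = 0;
    req->shinfo.dataref = 1;            /* reference held by the "NIC" */
    req->shinfo.destructor = tx_complete;
    req->shinfo.priv = req;
}

/* Simulated NIC: "DMA" straight from the caller's buffer, then release. */
static void fake_nic_dma(struct tx_request *req, char *wire)
{
    memcpy(wire, req->buf, req->len);
    shinfo_put(&req->shinfo);
}
```

Between tx_submit() and the destructor firing, the caller must not touch
buf.  That is the contract that replaces COW in this scheme: the
application is told when the buffer is free instead of being protected
from its own writes.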

FWIW: The latter is conceptually similar to how this is being used in
AlacrityVM.

HTH

Kind Regards,
-Greg

