Message-ID: <20091116115931.6266f9c9@nehalam>
Date:	Mon, 16 Nov 2009 11:59:31 -0800
From:	Stephen Hemminger <shemminger@...tta.com>
To:	Gregory Haskins <gregory.haskins@...il.com>
Cc:	Herbert Xu <herbert@...dor.apana.org.au>,
	Gregory Haskins <ghaskins@...ell.com>,
	"Michael S. Tsirkin" <mst@...hat.com>,
	alacrityvm-devel@...ts.sourceforge.net,
	linux-kernel@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [RFC PATCH] net: add dataref destructor to sk_buff

On Sat, 14 Nov 2009 00:27:46 -0500
Gregory Haskins <gregory.haskins@...il.com> wrote:

> Stephen Hemminger wrote:
> > On Fri, 13 Nov 2009 21:27:57 -0500
> > Gregory Haskins <gregory.haskins@...il.com> wrote:
> > 
> >> Herbert Xu wrote:
> >>> On Fri, Nov 13, 2009 at 08:33:35PM -0500, Gregory Haskins wrote:
> >>>> Well, not with respect to the overall protocol, of course not.  But with
> >>>> respect to the buffer in question, it _has_ to be.  Or am I missing
> >>>> something?
> >>> sendfile() has never guaranteed that the kernel is finished with
> >>> the underlying pages when it returns.
> >>>
> >>> Cheers,
> >> Clearly there must be _some_ mechanism to synchronize (e.g.
> >> flush/barrier) though, right?  Otherwise, that interface would seem to
> >> be quite prone to races and would likely be unusable.   So what does
> >> said flush use to know when the buffer is free?
> > 
> > No, all the interfaces require a copy.
> 
> I'm sorry, but I do not think that is correct.  As others have pointed
> out, that would not appear to be true for at least sendfile.

Correct.

> 
> > Actually, sendfile makes no guarantee about synchronization
> > because the receiver of said file could be arbitrarily slow, and any attempt at locking down
> > current contents of file is a denial of service exposure.
> 
> I think you are inverting the problem space.  It is fully expected that
> changing the "file", or the pages that represent the file before the
> packet is queued would constitute the modification of the stream on the
> wire.
> 
> I am more thinking about the applications of mmap+sendfile to implement
> a zero-copy interface.  As David mentions in another mail, it seems that
> there is no sync mechanism available, so this would not appear to be a
> viable use case today, unfortunately.

Yes, if you do mmap/sendfile then there is no synchronization, and the stack
can hold onto your data for an arbitrary time.  The file and mappings can
be closed, but that risks tying up all of memory.
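
To make the point concrete, here is a minimal userspace sketch of the
sendfile() pattern under discussion. The helper names send_whole_file() and
sendfile_demo() are hypothetical and not from this thread; the demo assumes
Linux and uses a socketpair in place of a real network socket. Note that a
successful return only means the data was queued: nothing in this interface
reports when the stack has dropped its references to the underlying pages.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: push the first `len` bytes of in_fd to out_fd
 * with sendfile(2).  Returns 0 on success, -1 on error or early EOF.
 */
static int send_whole_file(int out_fd, int in_fd, size_t len)
{
    off_t off = 0;  /* sendfile() advances this as bytes are queued */

    assert(out_fd >= 0 && in_fd >= 0);
    while ((size_t)off < len) {
        ssize_t n = sendfile(out_fd, in_fd, &off, len - (size_t)off);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted, retry */
            return -1;
        }
        if (n == 0)
            return -1;      /* file is shorter than expected */
    }
    /* Queued, not necessarily transmitted: no completion signal exists here. */
    return 0;
}

/* Hypothetical demo: file -> socketpair, then verify what arrived. */
static int sendfile_demo(void)
{
    const char msg[] = "hello, netdev";
    char path[] = "/tmp/sendfile-demo-XXXXXX";
    char got[sizeof msg] = {0};
    int sv[2], fd, rc = -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);           /* file lives on until fd is closed */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        close(fd);
        return -1;
    }
    if (write(fd, msg, sizeof msg) == (ssize_t)sizeof msg &&
        send_whole_file(sv[0], fd, sizeof msg) == 0 &&
        recv(sv[1], got, sizeof got, MSG_WAITALL) == (ssize_t)sizeof got &&
        memcmp(got, msg, sizeof msg) == 0)
        rc = 0;
    close(sv[0]);
    close(sv[1]);
    close(fd);
    return rc;
}
```

Calling sendfile_demo() returns 0 when the bytes read from the peer socket
match the file contents.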


> > 
> > People have tried doing copy-less send by page flipping, but the overhead of the IPI to
> > invalidate the TLB exceeded the overhead of the copy. There was an Intel paper on this
> > at Linux Symposium (Ottawa) several years ago.
> 
> I think you are confusing copy-less tx with copy-less rx.  You can try
> to do copy-less rx with page flipping, which has the IPI/TLB thrashing
> properties you mention, and I agree is problematic.  We are talking
> about copy-less tx here, however, and therefore no page-flipping is
> involved.  Rather, we are just posting SG lists of pages directly to the
> NIC (assuming the nic supports HIGH_DMA, etc).  You do not need to flip
> the page, or invalidate the TLB (and thus IPI the other cores) to do
> this to my knowledge.
> 

If you want to do copy-less tx for all applications, you have to
do COW to handle the trivial case of:

while ((cc = read(infd, buffer, sizeof buffer)) > 0) {
   send(outsock, buffer, cc, 0);
}
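
The hazard in that loop can be simulated in plain userspace C. This is only
an illustrative sketch: struct tx_slot and the queue_* helpers are
hypothetical stand-ins, not kernel structures or APIs. It contrasts a queue
that keeps a reference to the caller's buffer (copy-less) with one that takes
a private copy, then reuses the buffer the way the second read() would.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 16

/* Hypothetical stand-in for one queued transmit buffer. */
struct tx_slot {
    const char *data;            /* what would eventually go on the wire */
    char        copy[SLOT_SIZE]; /* private storage, used only when copying */
    size_t      len;
};

/* Copy-less: remember where the caller's bytes live. */
static void queue_by_reference(struct tx_slot *s, const char *buf, size_t len)
{
    s->data = buf;
    s->len = len;
}

/* Copying: snapshot the caller's bytes at send time. */
static void queue_by_copy(struct tx_slot *s, const char *buf, size_t len)
{
    assert(len <= SLOT_SIZE);
    memcpy(s->copy, buf, len);
    s->data = s->copy;
    s->len = len;
}

/* Returns 1 if the queued payload still equals `expected`. */
static int slot_matches(const struct tx_slot *s, const char *expected)
{
    return s->len == strlen(expected) &&
           memcmp(s->data, expected, s->len) == 0;
}

/*
 * Mimics two iterations of the read()/send() loop with one reused buffer.
 * Returns 1 if the first chunk is still intact in the queue afterwards.
 */
static int first_chunk_survives(int zero_copy)
{
    char buffer[SLOT_SIZE];
    struct tx_slot slot;

    memcpy(buffer, "chunk-1", 7);       /* first "read()" fills the buffer */
    if (zero_copy)
        queue_by_reference(&slot, buffer, 7);
    else
        queue_by_copy(&slot, buffer, 7);
    /* first "send()" has returned; the data is still queued */

    memcpy(buffer, "chunk-2", 7);       /* second "read()" overwrites it */

    return slot_matches(&slot, "chunk-1");
}
```

With the copying queue the first chunk survives the buffer reuse; with the
by-reference queue it does not, which is the case a copy or COW has to cover
if send() keeps its current semantics.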
