Date:	Tue, 29 Mar 2011 06:23:10 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Changli Gao <xiaosuo@...il.com>
Cc:	Viral Mehta <Viral.Mehta@...infotech.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: zero copy for relay server

On Tuesday, 29 March 2011 at 10:00 +0800, Changli Gao wrote:
> On Tue, Mar 29, 2011 at 2:34 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:

> I think he is concerned about the overhead of system calls. To avoid a
> system call, I think you could implement something like this:
> 
> splice2(infd, outfd, pipefd, ...)
> 

Yes, but since no numbers have been given and no code has been written
yet, I am asking the question.

Giving 4 file descriptors to a single syscall sounds convoluted.
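
For context, the existing two-call path that the proposed splice2() would
collapse looks roughly like the sketch below. It is only an illustration
under assumptions: relay_once() is a hypothetical helper name, the pipe is
created and owned by the caller, and error handling is kept to a minimum.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): move up to len bytes from
 * infd to outfd through a caller-owned pipe, without copying the data
 * through user space.  pipefd[0] is the read end, pipefd[1] the write end.
 * Returns the number of bytes relayed, 0 on EOF, -1 on error (errno set by
 * the failing splice() call; the caller may retry on EINTR/EAGAIN).
 */
static ssize_t relay_once(int infd, int outfd, int pipefd[2], size_t len)
{
    /* First syscall: source -> pipe. */
    ssize_t in = splice(infd, NULL, pipefd[1], NULL, len, SPLICE_F_MOVE);
    if (in <= 0)
        return in;

    /* Second syscall (repeated on short writes): pipe -> destination. */
    ssize_t left = in;
    while (left > 0) {
        ssize_t out = splice(pipefd[0], NULL, outfd, NULL,
                             (size_t)left, SPLICE_F_MOVE);
        if (out < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (out == 0)
            return -1; /* destination accepted nothing; give up */
        left -= out;
    }
    return in;
}
```

That is two splice() calls per chunk in the common case, which is the
overhead being debated here.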


> What you need to do is maintain the pipes yourself.
> 
> >> 2. I believe underlying PIPE that we are using will also have some size limit
> >>     (like in user space 4K or 64K, not sure)
> >
> > What kind of socket is able to deliver more than 64K frames ?
> 
> You can enlarge the size with fcntl(pipefd, F_SETPIPE_SZ,...).
> 

Not really useful, since splice() internals use automatic arrays sized
with PIPE_DEF_BUFFERS.

You can enlarge the pipe, but we are still limited to at most 64KB in
skb_splice_bits(), for example (on x86 with its 4KB pages).

This doesn't matter, since skbs are limited to 16 pages (64KB) anyway.

F_SETPIPE_SZ can only increase the size of the pipe ring buffer (which
should be empty or contain at most one skb), thereby increasing dcache
needs.
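
For reference, the fcntl() call mentioned above can be sketched as follows.
grow_pipe() is a hypothetical helper name, and the numeric fallbacks are the
Linux values, needed only if the libc headers do not expose the constants.
As noted above, a larger ring buffer does not lift the 16-page limit inside
the splice() path.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Linux-specific fcntl commands; fallbacks for headers that hide them. */
#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031
#endif
#ifndef F_GETPIPE_SZ
#define F_GETPIPE_SZ 1032
#endif

/*
 * Hypothetical helper (not from the thread): ask the kernel for a pipe
 * ring buffer of at least want_bytes.  The kernel may round the size up.
 * Returns the resulting capacity in bytes, or -1 on error.
 */
static int grow_pipe(int pipefd, int want_bytes)
{
    if (fcntl(pipefd, F_SETPIPE_SZ, want_bytes) < 0)
        return -1;
    return fcntl(pipefd, F_GETPIPE_SZ);
}
```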

 
> >
> > sendfile() is based on top of splice(), but it's faster to use splice().
> >
> >
> 
> Why? Thanks.
> 

The real cost is not syscall overhead, but context switches and cache
misses. Adding a "super syscall" adds kernel text and increases icache
misses on real machines (I am not talking about machines used in
microbenchmarks).

Most likely, GRO can significantly speed up this workload, while avoiding
a syscall won't.


