netdev - Re: zero copy for relay server


Date:	Tue, 29 Mar 2011 06:23:10 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Changli Gao <xiaosuo@...il.com>
Cc:	Viral Mehta <Viral.Mehta@...infotech.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: zero copy for relay server

On Tuesday, 29 March 2011 at 10:00 +0800, Changli Gao wrote:
> On Tue, Mar 29, 2011 at 2:34 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:

> I think he is concerned about the overhead of system calls. To save a
> system call, I think you could implement something like this:
> 
> splice2(infd, outfd, pipefd, ...)
> 

Yes, but since no numbers have been given and no code has been written
yet, I am asking the question.

Giving 4 file descriptors to a single syscall sounds convoluted.

> What you need to do is maintain the pipes yourself.
> 
> >> 2. I believe underlying PIPE that we are using will also have some size limit
> >>     (like in user space 4K or 64K, not sure)
> >
> > What kind of socket is able to deliver frames larger than 64K?
> 
> You can enlarge the size with fcntl(pipefd, F_SETPIPE_SZ,...).
> 

Not really useful, since splice() internals use automatic arrays sized
with PIPE_DEF_BUFFERS.

You can enlarge the pipe, but we are still limited to at most 64 KB per
call, in skb_splice_bits() for example (on x86, with its 4 KB pages).

This doesn't matter, since an skb is limited to 16 pages (64 KB) anyway.

F_SETPIPE_SZ can only increase the size of the pipe ring buffer (which
should be empty or contain at most one skb), and therefore only increases
dcache needs.
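
To make the numbers concrete, here is a small sketch of what F_SETPIPE_SZ
changes and what it does not. The grow_pipe() helper is hypothetical, and
the fallback constants 1031 and 1032 are the Linux values of F_SETPIPE_SZ
and F_GETPIPE_SZ, used when the Python fcntl module (before 3.10) does not
export them. The per-call ceiling is computed from the PIPE_DEF_BUFFERS
value of 16 as described above; it is not measured.

```python
import fcntl
import os

# Linux fcntl commands; fall back to the <linux/fcntl.h> values if missing.
F_SETPIPE_SZ = getattr(fcntl, "F_SETPIPE_SZ", 1031)
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")  # 4096 on x86
PIPE_DEF_BUFFERS = 16                   # kernel constant named above


def grow_pipe(fd, size):
    """Request a pipe ring of at least `size` bytes.

    Returns (capacity_before, capacity_after) as reported by the kernel.
    """
    before = fcntl.fcntl(fd, F_GETPIPE_SZ)
    after = fcntl.fcntl(fd, F_SETPIPE_SZ, size)  # returns the new capacity
    return before, after


def per_call_ceiling():
    """Bytes one splice step can carry if it is sized by PIPE_DEF_BUFFERS."""
    return PIPE_DEF_BUFFERS * PAGE_SIZE


if __name__ == "__main__":
    r, w = os.pipe()
    before, after = grow_pipe(w, 256 * 1024)
    print("pipe capacity:", before, "->", after)
    print("per-call ceiling:", per_call_ceiling())
```

The ring grows, but the per-call ceiling stays at 16 pages, which is 64 KB
with 4 KB pages.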

> >
> > sendfile() is based on top of splice(), but it's faster to use splice().
> >
> >
> 
> Why? Thanks.
> 

The real cost is not syscall overhead, but context switches and cache
misses. Adding a "super syscall" adds kernel text and increases icache
misses on real machines (I am not talking about machines used in
microbenchmarks).

Most likely, GRO can significantly speed up this workload, while avoiding
a syscall won't.
