Message-ID: <CAGWmfYp5vz7=tKKLQAJe8o6UP9MduEkNUfyC62DGGAEq4yXP6w@mail.gmail.com>
Date: Tue, 16 Feb 2016 08:52:32 +0100
From: Claudio Scordino <claudio@...dence.eu.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: netdev@...r.kernel.org
Subject: Re: Same data to several sockets with just one syscall ?
Hi Eric,
2016-02-15 19:16 GMT+01:00 Eric Dumazet <eric.dumazet@...il.com>:
> On Mon, 2016-02-15 at 11:03 +0100, Claudio Scordino wrote:
>> Hi Eric,
>>
>> 2016-02-12 11:35 GMT+01:00 Eric Dumazet <eric.dumazet@...il.com>:
>> > On Fri, 2016-02-12 at 09:53 +0100, Claudio Scordino wrote:
>> >
>> >> This makes the application waste time entering and exiting the
>> >> kernel several times.
>> >
>> > syscall overhead is usually small. Real cost is actually getting to the
>> > socket objects (fd manipulation), which you won't avoid with a
>> > super-syscall anyway.
>>
>> Thank you for answering. I see your point.
>>
>> However, assuming that a switch from user-space to kernel-space (and
>> back) needs about 200 nsec of computation (which I guess is a
>> reasonable value for a 3GHz x86 architecture), the 50th receiver
>> experiences a latency of about 50 x 200 nsec = 10 usec. In some
>> domains (e.g., finance) this delay is not negligible.
>
> I thought these domains were using multicast.
They don't :)
There are a couple of reasons behind their choice:
- Multicast works only with SOCK_DGRAM, i.e. delivery is unreliable
  (see the sketch after this list)
- For a limited number of receivers (e.g. 50), and depending on the
  data size, the latency of multicast is almost equal to that of TCP
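Just to make the first point concrete, here is a minimal (untested)
sketch of the multicast send path; the group address and port are
placeholders I made up:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	/* Multicast is only available on datagram sockets: there is
	 * no connection, hence no TCP-style retransmission either. */
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	struct sockaddr_in grp;

	if (fd < 0)
		return 1;

	memset(&grp, 0, sizeof(grp));
	grp.sin_family = AF_INET;
	grp.sin_addr.s_addr = inet_addr("239.0.0.1"); /* placeholder group */
	grp.sin_port = htons(5000);                   /* placeholder port  */

	/* One sendto() reaches every subscribed receiver, but delivery
	 * is best-effort: a lost datagram is never retransmitted. */
	if (sendto(fd, "tick", 4, 0,
		   (struct sockaddr *)&grp, sizeof(grp)) < 0)
		perror("sendto");

	close(fd);
	return 0;
}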
>
>>
>> Moving the "fan-out" code into kernel space would remove this waste of
>> time. IMHO, the latency reduction would pay back the 100 lines of code
>> for adding a new syscall.
>
> It won't reduce the latency at all, and it adds a lot of maintenance hassle.
>
> syscall overhead is about 40 ns.
I thought it was slightly higher. Does this time also include the
interrupt return back to user-space?
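A quick way to check this on a given box is to time a cheap syscall in
a loop. A rough sketch (the numbers will obviously vary with the CPU
and kernel configuration):

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	enum { LOOPS = 10 * 1000 * 1000 };
	struct timespec t0, t1;
	double ns;
	long i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < LOOPS; i++)
		syscall(SYS_getpid);	/* bypass glibc's getpid() cache */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	/* The per-iteration cost approximates one full
	 * user->kernel->user round trip, return path included. */
	ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
	printf("%.1f ns per syscall round trip\n", ns / LOOPS);
	return 0;
}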
> This is the time taken to transmit ~50 bytes on a 10Gbit link.
>
> 40ns * 50 = 2 usec only.
>
> Feel free to implement your idea and test it, you'll discover the added
> complexity is not worth it.
Honestly, I can't see how it could be that difficult: the kernel-side
code could just iterate over the existing syscall...
Can you please elaborate a bit further, to let me understand why it
would be that complex?
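For the record, what I have in mind is nothing more than the loop
below, moved into kernel space behind a single entry point (a
userspace sketch; sendall() and its signature are made up for
illustration):

#include <sys/socket.h>
#include <sys/types.h>

/* Userspace fan-out as it works today: one send() per receiver, so
 * the user->kernel transition is paid nfds times. The hypothetical
 * syscall would run this same loop once, on the kernel side. */
int sendall(const int *fds, int nfds, const void *buf, size_t len)
{
	int i;

	for (i = 0; i < nfds; i++) {
		/* Each iteration still has to look up and lock the
		 * socket object behind fds[i]; that part of the cost
		 * would not go away by moving the loop. */
		if (send(fds[i], buf, len, 0) < 0)
			return -1;
	}
	return 0;
}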
Many thanks and best regards,
Claudio