Message-ID: <CAD8XO3b0m5Qn1Ey3gu3HPmcOanN-yjCYBJZEUEu754X=5jAtOA@mail.gmail.com>
Date:   Thu, 25 Apr 2019 11:01:22 +0300
From:   Maxim Uvarov <maxim.uvarov@...aro.org>
To:     Eric Dumazet <eric.dumazet@...il.com>
Cc:     netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        Ilias Apalodimas <ilias.apalodimas@...aro.org>
Subject: Re: RFC: zero copy recv()

On Wed, 24 Apr 2019 at 18:59, Eric Dumazet <eric.dumazet@...il.com> wrote:
>
>
>
> On 04/23/2019 11:23 PM, Maxim Uvarov wrote:
> > Hello,
> >
> > At various conferences I see people trying to accelerate networking
> > by moving packet processing, protocol level included, entirely to
> > user space. It might be DPDK, ODP or AF_XDP plus some network stack
> > on top of it. People then try to test such a solution with existing
> > applications, preferably without modifying the application binaries,
> > by just LD_PRELOADing the socket syscalls (recv(), sendto(), etc.).
> > The current recv() expects the application to allocate memory, and
> > the call "copies" the packet into that memory. A copy per packet is
> > slow. Can we consider making the API calls zero-copy friendly? Could
> > such a change be accepted into the kernel?
>

Hello Eric, thanks for responding.

> Generic zero copy is hard.
>

Yes, that is true.

> As soon as you have multiple consumers in different domains for the data,
> you need some kind of multiplexing, typically using hardware capabilities.
>
> For TCP, we implemented zero copy last year, which works quite well
> on x86 if your network uses MTU of 4096+headers.
>
> tools/testing/selftests/net/tcp_mmap.c  reaches line rate (100Gbit) on
> a single TCP flow, if using a NIC able to perform header split.
>

That is great work. But aren't there context switches for the
getsockopt(TCP_ZEROCOPY_RECEIVE) and read() calls per packet?

I played with AF_XDP, where one core can be isolated to poll the
umem pool memory while another core does the softirq processing.
Polling the umem is really fast - about 96 ns on a 2.5 GHz x86 laptop -
with no context switches on the umem polling core.

But in general, for the tcp_mmap.c code: if getsockopt()+read() were
collapsed into a single zero-copy call, something like a recvmsg_zc(),
then it could be LD_PRELOADed.
The mmap() could also be moved under socket creation to simplify the
API. Does that look reasonable?

> But the model is not to run a legacy application with some LD_PRELOAD
> hack/magic, sorry.
>
It is more likely that legacy applications will want to use zero-copy
networking. Once the API is stable they will support it, especially
if the API can be used with minimal changes to the apps.
Then it will be quite easy to LD_PRELOAD a shim, or to change the
application to use some other IP stack.

Maxim.
