Message-ID: <20181213125553.GA16149@1wt.eu>
Date:   Thu, 13 Dec 2018 13:55:53 +0100
From:   Willy Tarreau <w@....eu>
To:     Marek Majkowski <marek@...udflare.com>
Cc:     netdev@...r.kernel.org
Subject: Re: splice() performance for TCP socket forwarding

Hi Marek,

On Thu, Dec 13, 2018 at 12:25:20PM +0100, Marek Majkowski wrote:
> Hi!
> 
> I'm basically trying to do TCP splicing in Linux. I'm focusing on
> performance of the simplest case: receive data from one TCP socket,
> write data to another TCP socket. I get poor performance with splice.
> 
> First, the naive code, pretty much:
> 
> while (1) {
>   n = read(rs, buf, sizeof(buf));
>   write(ws, buf, n);
> }
> 
> With GRO enabled, this code does roughly line-rate of 10Gbps, hovering
> ~50% of CPU in application (sys mostly).
> 
> When replaced with splice version:
> 
> pipe(pfd);
> fcntl(pfd[0], F_SETPIPE_SZ, 1024 * 1024);
> while (1) {
>   n = splice(rd, NULL, pfd[1], NULL, 1024 * 1024, SPLICE_F_MOVE);
>   splice(pfd[0], NULL, wd, NULL, n, SPLICE_F_MOVE);
> }
> 
> Full code:
> https://gist.github.com/majek/c58a97b9be7d9217fe3ebd6c1328faaa#file-proxy-splice-c-L59
> 
> I get 100% cpu (sys) and dramatically worse performance (1.5x slower).

That's quite strange; it doesn't match what I'm used to at all. In haproxy
we also use splicing between sockets, and for medium to large objects we
always get much better performance with splicing than without. Three
years ago, during a test, we reached 60 Gbps on a 4-core machine using
two 40G NICs, which is not an exceptional sizing. Between processes on
the loopback, numbers around 100G are entirely possible. By the way,
that is one test you should start with, to check whether the issue is
more on the splice side or on the NIC's side. It may be that your
network driver is very inefficient when used with GRO/GSO. In my case,
multi-10G using ixgbe and 40G using mlx5 have always shown excellent
results.
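
For a loopback comparison, a forwarding loop that checks return values
and drains the pipe fully before reading again could look roughly like
the sketch below. This is not haproxy's code and not the code from the
gist; the helper name forward_with_splice() and its chunk parameter are
made up for illustration, and it assumes blocking file descriptors on
Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): forward everything from in_fd
 * to out_fd through an intermediate pipe using splice().
 * Assumes blocking descriptors. Returns the number of bytes forwarded,
 * or -1 on error.
 */
long long forward_with_splice(int in_fd, int out_fd, size_t chunk)
{
    int pfd[2];
    long long total = 0;

    assert(chunk > 0);
    if (pipe(pfd) < 0)
        return -1;

    /* Best effort: try to size the pipe to one chunk. If this fails,
     * the default pipe size is used and splice() moves less per call. */
    (void)fcntl(pfd[1], F_SETPIPE_SZ, (int)chunk);

    for (;;) {
        /* Step 1: move data from the input descriptor into the pipe. */
        ssize_t n = splice(in_fd, NULL, pfd[1], NULL, chunk, SPLICE_F_MOVE);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            total = -1;
            break;
        }
        if (n == 0)
            break; /* end of input */

        /* Step 2: drain the pipe into the output descriptor. A single
         * splice() may move fewer than n bytes, so loop until empty. */
        while (n > 0) {
            ssize_t w = splice(pfd[0], NULL, out_fd, NULL, (size_t)n,
                               SPLICE_F_MOVE);
            if (w < 0 && errno == EINTR)
                continue;
            if (w <= 0) {
                total = -1;
                goto out;
            }
            n -= w;
            total += w;
        }
    }
out:
    close(pfd[0]);
    close(pfd[1]);
    return total;
}
```

Running something like this between two local processes over the
loopback, with in_fd and out_fd being the two connected TCP sockets,
takes the NIC and its driver out of the picture, so a large gap between
the loopback and the on-wire numbers would point at the driver rather
than at splice() itself.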

Regards,
Willy
