Date:   Thu, 13 Dec 2018 15:04:11 +0100
From:   Willy Tarreau <w@....eu>
To:     Marek Majkowski <marek@...udflare.com>
Cc:     eric.dumazet@...il.com, netdev@...r.kernel.org
Subject: Re: splice() performance for TCP socket forwarding

On Thu, Dec 13, 2018 at 02:17:20PM +0100, Marek Majkowski wrote:
> > splice code will be expensive if less than 1MB is present in the receive queue.
> 
> I'm not sure what you are suggesting. I'm just shuffling data between
> two sockets. Is there a better buffer size value? Is it possible to
> keep splice() blocked until it succeeds in forwarding N bytes of data?
> (I tried this unsuccessfully with SO_RCVLOWAT.)
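
FWIW I assume you're doing the classic two-splice loop through an
intermediate pipe, i.e. roughly the sketch below (untested; in_fd,
out_fd and pfd[] are placeholders, and EAGAIN/EINTR handling is
trimmed):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Forward bytes from one connected socket to another through a pipe.
 * Returns 0 on EOF, -1 on error. */
static int forward(int in_fd, int out_fd, int pfd[2])
{
    for (;;) {
        /* fill the pipe from the source socket (up to 1 MB, as in
         * your trace; splice() returns whatever was available) */
        ssize_t n = splice(in_fd, NULL, pfd[1], NULL, 1048576, 0);
        if (n <= 0)
            return n;

        /* drain exactly those bytes from the pipe to the destination */
        while (n > 0) {
            ssize_t m = splice(pfd[0], NULL, out_fd, NULL, n, 0);
            if (m <= 0)
                return -1;
            n -= m;
        }
    }
}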

I've personally observed a performance decrease once the pipe is
configured larger than 512 kB. I think it's related to the fact that
you're moving 256 pages around on each call, and that it might even
start to have some effect on L1 caches when touching lots of data,
though that could be completely unrelated.
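
If you want to experiment with a smaller pipe, F_SETPIPE_SZ lets you
cap its capacity; a minimal sketch (the kernel rounds the size up to a
power-of-two number of pages and returns the actual capacity):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int pfd[2];

    if (pipe(pfd) < 0)
        return 1;

    /* ask for 512 kB (128 pages) instead of your current 1 MB */
    int sz = fcntl(pfd[0], F_SETPIPE_SZ, 512 * 1024);
    if (sz < 0)
        perror("F_SETPIPE_SZ");
    else
        printf("pipe capacity: %d bytes (%d pages)\n", sz, sz / 4096);

    close(pfd[0]);
    close(pfd[1]);
    return 0;
}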

> Here is a snippet from strace:
> 
> splice(4, NULL, 11, NULL, 1048576, 0) = 373760 <0.000048>
> splice(10, NULL, 5, NULL, 373760, 0) = 373760 <0.000108>
> splice(4, NULL, 11, NULL, 1048576, 0) = 335800 <0.000065>
> splice(10, NULL, 5, NULL, 335800, 0) = 335800 <0.000202>
> splice(4, NULL, 11, NULL, 1048576, 0) = 227760 <0.000029>
> splice(10, NULL, 5, NULL, 227760, 0) = 227760 <0.000106>
> splice(4, NULL, 11, NULL, 1048576, 0) = 16060 <0.000019>
> splice(10, NULL, 5, NULL, 16060, 0) = 16060 <0.000028>
> splice(4, NULL, 11, NULL, 1048576, 0) = 7300 <0.000013>
> splice(10, NULL, 5, NULL, 7300, 0) = 7300 <0.000021>

I think your driver is returning one segment per page. Let's do some
rough maths: assuming you have an MSS of 1448 (timestamps enabled),
you'll retrieve 256*1448 = 370688 bytes at once per call, which closely
matches what you're seeing. Hmmm, checking closer, you're in fact running
at exactly 1460 (256*1460 = 373760), so you have timestamps disabled.
Your numbers seem normal to me (just the CPU usage doesn't, but maybe
it improves when using a smaller pipe).
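
By the way, if you want to double-check the MSS from userspace,
TCP_MAXSEG can be read on a connected socket. A quick sketch (assuming
"sock" is your connected TCP fd):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

static void print_mss(int sock)
{
    int mss;
    socklen_t len = sizeof(mss);

    /* 1448 => timestamps enabled, 1460 => disabled (1500-byte MTU) */
    if (getsockopt(sock, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) == 0)
        printf("mss=%d, 256 segments => %d bytes per splice\n",
               mss, 256 * mss);
    else
        perror("TCP_MAXSEG");
}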

Willy
