Message-ID: <1374538422.4990.99.camel@edumazet-glaptop>
Date: Mon, 22 Jul 2013 17:13:42 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Rick Jones <rick.jones2@...com>
Cc: David Miller <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Yuchung Cheng <ycheng@...gle.com>,
Neal Cardwell <ncardwell@...gle.com>,
Michael Kerrisk <mtk.manpages@...il.com>
Subject: Re: [PATCH net-next] tcp: TCP_NOSENT_LOWAT socket option
On Mon, 2013-07-22 at 16:08 -0700, Rick Jones wrote:
> On 07/22/2013 03:44 PM, Eric Dumazet wrote:
> > Hi Rick
> >
> >> Netperf is perhaps a "best case" for this as it has no think time and
> >> will not itself build-up a queue of data internally.
> >>
> >> The 18% increase in service demand is troubling.
> >
> > It's not troubling at such high speed. (Note also I had better throughput
> > in my (single) test)
>
> Yes, you did, but that was only 5.4%, and it may be in an area where
> there is non-trivial run to run variation.
>
> I would think an increase in service demand is even more troubling at
> high speeds than low speeds. Particularly when I'm still not at link-rate.
>
If I wanted link-rate, I would use TCP_SENDFILE, and unfortunately be
slowed down by the receiver ;)
> In theory anyway, the service demand is independent of the transfer
> rate. Of course, practice dictates that different algorithms have
> different behaviours at different speeds, but in slightly sweeping
> handwaving, if the service demand went up 18%, that cuts your maximum
> aggregate throughput for the "infinitely fast link" or collection of
> finitely fast links in the system by 18%.
>
> I suppose that brings up the question of what the aggregate throughput
> and CPU utilization was for your 200 concurrent netperf TCP_STREAM sessions.
I am not sure I want to add 1000 lines of detailed netperf results to the
changelog. Even then, they would only be meaningful for my lab machines.
>
> > Process scheduler cost is abysmal (or, more exactly, the cost when the cpu
> > enters idle mode, I presume).
> >
> > Adding a context switch for every TSO packet is obviously not something
> > you want if you want to pump 20Gbps on a single tcp socket.
>
> You wouldn't want it if you were pumping 20 Gbit/s down multiple TCP
> sockets either I'd think.
No difference, as a matter of fact: each netperf _will_ schedule
anyway, as a queue builds up in the Qdisc layer.
>
> > I guess that a real application would not use 16KB send()s either.
>
> You can use a larger send in netperf - the 16 KB is only because that is
> the default initial SO_SNDBUF size under Linux :)
>
> > I chose extreme parameters to show that the patch had acceptable impact.
> > (128KB is only 2 TSO packets)
> >
> > The main targets of this patch are servers handling hundreds to millions
> > of sockets, or any machine with RAM constraints. This would also permit
> > better autotuning in the future. Our current 4MB limit is a bit small in
> > some cases.
> >
> > Allowing the socket write queue to queue more bytes is better for
> > throughput/cpu cycles, as long as you have enough RAM.
>
> So, netperf doesn't queue internally - what happens when the application
> does queue internally? Admittedly, it will be user-space memory (I
> assume) rather than kernel memory, which I suppose is better since it
> can be paged and whatnot. But if we drop the qualifiers, it is still
> the same quantity of memory overall right?
>
> By the way, does this affect sendfile() or splice()?
Sure: the patch intercepts sk_stream_memory_free() for all its callers.
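For reference, the shape of that interception (sketched here, not the exact
diff; field and helper names may differ slightly from the patch itself):

static inline bool sk_stream_memory_free(const struct sock *sk)
{
	if (sk->sk_wmem_queued >= sk->sk_sndbuf)
		return false;

	/* Let the protocol veto "writable" even when sndbuf still has room */
	return sk->sk_prot->stream_memory_free ?
	       sk->sk_prot->stream_memory_free(sk) : true;
}

static inline bool tcp_stream_memory_free(const struct sock *sk)
{
	const struct tcp_sock *tp = tcp_sk(sk);
	u32 notsent_bytes = tp->write_seq - tp->snd_nxt;

	/* write(), sendfile() and splice() all block on this same test,
	 * so queued-but-unsent data is capped near the per-socket or
	 * sysctl lowat value.
	 */
	return notsent_bytes < tcp_notsent_lowat(tp);
}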
10Gb link experiment with sendfile():
lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# perf stat -e context-switches ./netperf -H 10.246.17.84 -l 20 -t TCP_SENDFILE -c
TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.17.84 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    20.00      9372.56   1.69     -1.00    0.355   -1.000
Performance counter stats for './netperf -H 10.246.17.84 -l 20 -t TCP_SENDFILE -c':
16,188 context-switches
20.006998098 seconds time elapsed
lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# perf stat -e context-switches ./netperf -H 10.246.17.84 -l 20 -t TCP_SENDFILE -c
TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.17.84 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    20.00      9408.33   1.75     -1.00    0.366   -1.000
Performance counter stats for './netperf -H 10.246.17.84 -l 20 -t TCP_SENDFILE -c':
714,395 context-switches
20.004409659 seconds time elapsed
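For completeness, the per-socket knob can be used instead of the global sysctl
above. A minimal userspace sketch, assuming the option is spelled
TCP_NOTSENT_LOWAT in the headers (matching the sysctl name) and guarding its
value for headers that predate the patch:

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

#ifndef TCP_NOTSENT_LOWAT
#define TCP_NOTSENT_LOWAT 25	/* assumed value; guard for older headers */
#endif

int main(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	unsigned int lowat = 131072;	/* 128KB, same value as the sysctl test above */

	if (fd < 0) {
		perror("socket");
		return 1;
	}

	if (setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT,
		       &lowat, sizeof(lowat)) < 0)
		perror("setsockopt(TCP_NOTSENT_LOWAT)");

	/* connect() and send()/sendfile() as usual: writes now block (and
	 * poll stops reporting POLLOUT) once ~128KB of unsent data is queued.
	 */
	return 0;
}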