netdev - Re: TCP socket send return EAGAIN unexpectedly when sending small fragments

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f0f30591-f503-ae7c-9293-35cca4ceec84@gmail.com>
Date:   Fri, 10 Jun 2022 07:21:33 -0700
From:   Eric Dumazet <erdnetdev@...il.com>
To:     Ronny Meeus <ronny.meeus@...il.com>,
        netdev <netdev@...r.kernel.org>
Subject: Re: TCP socket send return EAGAIN unexpectedly when sending small
 fragments


On 6/10/22 05:48, Ronny Meeus wrote:
> Hello
>
> I have a small test application written in C that creates a local (in
> process) TCP channel via loopback.
> (kernel version is 3.10.0 but the same issue is also seen for example
> on a 4.9 kernel).
>
> On the client side, an SO_SNDBUF of 5Kb is configured (which is
> doubled by the kernel) while on the server side the default size is
> used.
> Both client and server side are running in non-blocking mode.
>
> The server side is not reading its socket so all data is simply queued
> in the socket's receive buffer.
> On the client side, the client is writing data into the socket in a
> loop until the write returns an error (EAGAIN in this case).
>
> Depending on the size of data I send on this socket, I get the EAGAIN
> on the client side already after a small number of messages.
> For example when sending 106 byte messages, I see the first EAGAIN
> after 21 write calls:
> # ./tcp_client_server 106
> using data size 106
> client send buffer size 5000
> client socket snd_buf: 10000
> ERRNO = 11 count=21
>
> The same is observed for all sizes smaller than or equal to 107 bytes.
>
> When getting the socket stats using ss I see all data (2226b) pending
> in the socket on the server side:
> # ss -tmi  | grep -i X11 -A 1
> ESTAB      0      0      127.0.0.1:59792                127.0.0.1:x11
> skmem:(r0,rb1061296,t0,tb10000,f0,w0,o0,bl0,d0) cubic wscale:7,7
> rto:202 rtt:1.276/2.504 mss:21888 rcvmss:536 advmss:65483 cwnd:10
> bytes_acked:2227 segs_out:24 segs_in:18 send 1372.3Mbps lastsnd:3883
> lastrcv:771546999 lastack:3883 pacing_rate 2744.3Mbps retrans:0/1
> rcv_space:43690
> --
> ESTAB      2226   0      127.0.0.1:x11
> 127.0.0.1:59792
> skmem:(r4608,rb1061488,t0,tb2626560,f3584,w0,o0,bl0,d1) cubic
> wscale:7,7 rto:200 ato:40 mss:21888 rcvmss:536 advmss:65483 cwnd:10
> bytes_received:2226 segs_out:17 segs_in:24 lastsnd:3893 lastrcv:3893
> lastack:3883 rcv_space:43690
>
>
> When sending larger messages, the EAGAIN only is seen after 2116 writes.
> # ./tcp_client_server 108
> using data size 108
> client send buffer size 5000
> client socket snd_buf: 10000
> ERRNO = 11 count=2116
>
> Again, the ss shows all data being present on the server side (108 *
> 2116 = 228528)
> ESTAB      228528 0      127.0.0.1:x11
> 127.0.0.1:59830
> skmem:(r976896,rb1061488,t0,tb2626560,f2048,w0,o0,bl0,d1) cubic
> wscale:7,7 rto:200 ato:80 mss:21888 rcvmss:536 advmss:65483 cwnd:10
> bytes_received:228528 segs_out:436 segs_in:2119 lastsnd:3615
> lastrcv:3606 lastack:3596 rcv_rtt:1 rcv_space:43690
> --
> ESTAB      0      0      127.0.0.1:59830                127.0.0.1:x11
> skmem:(r0,rb1061296,t0,tb10000,f0,w0,o0,bl0,d0) cubic wscale:7,7
> rto:206 rtt:5.016/9.996 mss:21888 rcvmss:536 advmss:65483 cwnd:10
> bytes_acked:228529 segs_out:2119 segs_in:437 send 349.1Mbps
> lastsnd:3596 lastrcv:771704718 lastack:3566 pacing_rate 698.1Mbps
> retrans:0/1 rcv_space:43690
>
>
>
> When I enlarge the SNDBUF on the client side to for example 10K, I see
> that more messages can be sent:
> # ./tcp_client_server 106 10000
> using data size 106
> client send buffer size 10000
> client socket snd_buf: 20000
> ERRNO = 11 count=1291
>
> I also captured the packets on the interface using wireshark and it
> looks like the error is returned after the TCP layer has done a
> retransmit (after 10ms) of the last packet sent on the connection.
>
> 10:12:38.186451 IP localhost.48470 > localhost.etlservicemgr: Flags
> [P.], seq 1165:1265, ack 165, win 342, options [nop,nop,TS val
> 593860562 ecr 593860562], length 100
> 10:12:38.186461 IP localhost.etlservicemgr > localhost.48470: Flags
> [.], ack 1265, win 342, options [nop,nop,TS val 593860562 ecr
> 593860562], length 0
> 10:12:38.186478 IP localhost.48470 > localhost.etlservicemgr: Flags
> [P.], seq 1265:1365, ack 165, win 342, options [nop,nop,TS val
> 593860562 ecr 593860562], length 100
> 10:12:38.186488 IP localhost.etlservicemgr > localhost.48470: Flags
> [.], ack 1365, win 342, options [nop,nop,TS val 593860562 ecr
> 593860562], length 0
> 10:12:38.186505 IP localhost.48470 > localhost.etlservicemgr: Flags
> [P.], seq 1365:1465, ack 165, win 342, options [nop,nop,TS val
> 593860562 ecr 593860562], length 100
> 10:12:38.186516 IP localhost.etlservicemgr > localhost.48470: Flags
> [.], ack 1465, win 342, options [nop,nop,TS val 593860562 ecr
> 593860562], length 0
> 10:12:38.186533 IP localhost.48470 > localhost.etlservicemgr: Flags
> [P.], seq 1465:1565, ack 165, win 342, options [nop,nop,TS val
> 593860562 ecr 593860562], length 100
> 10:12:38.186543 IP localhost.etlservicemgr > localhost.48470: Flags
> [.], ack 1565, win 342, options [nop,nop,TS val 593860562 ecr
> 593860562], length 0
> 10:12:38.186560 IP localhost.48470 > localhost.etlservicemgr: Flags
> [P.], seq 1565:1665, ack 165, win 342, options [nop,nop,TS val
> 593860562 ecr 593860562], length 100
> 10:12:38.186578 IP localhost.48470 > localhost.etlservicemgr: Flags
> [P.], seq 1665:1765, ack 165, win 342, options [nop,nop,TS val
> 593860562 ecr 593860562], length 100
> 10:12:38.186595 IP localhost.48470 > localhost.etlservicemgr: Flags
> [P.], seq 1765:1865, ack 165, win 342, options [nop,nop,TS val
> 593860562 ecr 593860562], length 100
> 10:12:38.186615 IP localhost.48470 > localhost.etlservicemgr: Flags
> [P.], seq 1865:1965, ack 165, win 342, options [nop,nop,TS val
> 593860562 ecr 593860562], length 100
> 10:12:38.186632 IP localhost.48470 > localhost.etlservicemgr: Flags
> [P.], seq 1965:2065, ack 165, win 342, options [nop,nop,TS val
> 593860562 ecr 593860562], length 100
> 10:12:38.196064 IP localhost.48470 > localhost.etlservicemgr: Flags
> [P.], seq 1965:2065, ack 165, win 342, options [nop,nop,TS val
> 593860572 ecr 593860562], length 100
> 10:12:38.196128 IP localhost.etlservicemgr > localhost.48470: Flags
> [.], ack 2065, win 342, options [nop,nop,TS val 593860572 ecr
> 593860562,nop,nop,sack 1 {1965:2065}], length 0
>
> Now my question is: how is it possible that I can only send 21
> messages of 106 bytes before I see the EAGAIN while when sending
> larger messages I can send 2K+ messages.


This is because kernel tracks kernel memory usage.

Small packets incur more overhead.

The doubling of SO_SNDBUF user value, is an heuristic based on the fact 
the kernel

uses a 2x overhead estimation,

while effective overhead can vary from ~1.001x to ~768x depending on 
number of bytes per skb.