netdev - Re: TCP socket send return EAGAIN unexpectedly when sending small fragments


Message-ID: <20220610174201.GC19540@1wt.eu>
Date:   Fri, 10 Jun 2022 19:42:01 +0200
From:   Willy Tarreau <w@....eu>
To:     Ronny Meeus <ronny.meeus@...il.com>
Cc:     David Laight <David.Laight@...lab.com>,
        Eric Dumazet <erdnetdev@...il.com>,
        netdev <netdev@...r.kernel.org>
Subject: Re: TCP socket send return EAGAIN unexpectedly when sending small
 fragments

On Fri, Jun 10, 2022 at 07:16:06PM +0200, Ronny Meeus wrote:
> On Fri, 10 Jun 2022 at 17:21, David Laight <David.Laight@...lab.com> wrote:
> >
> > ...
> > > If the 5 queued packets on the sending side would cause the EAGAIN
> > > issue, the real question maybe is why the receiving side is not
> > > sending the ACK within the 10ms while for earlier messages the ACK is
> > > sent much sooner.
> >
> > Have you disabled Nagle (TCP_NODELAY) ?
> 
> Yes, I enabled TCP_NODELAY, so the Nagle algorithm is disabled.
> I did a lot of tests over the last couple of days, but if I remember
> correctly, enabling or disabling TCP_NODELAY does not influence the result.

There are many possible causes for what you're observing. For example,
if your NIC has too small a Tx ring and small buffers, you can imagine
that N*106 bytes fit in the buffers but N*107 do not, which causes
a tiny delay waiting for the Tx IRQ to recycle the buffers, and that
during this time your subsequent send() calls are coalesced into larger
segments that are sent at once when using 107.

If you do not want packets to be sent individually and you know you
still have more to come, you need to set MSG_MORE in the send() flags
(or to disable TCP_NODELAY).
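
As a rough illustration of the MSG_MORE suggestion, here is a minimal
sketch for Linux. The names send_all(), send_fragments() and struct
fragment are made up for this example and are not taken from your code;
the idea is only that every fragment except the last one is sent with
MSG_MORE, so the stack knows more data is coming and may hold back a
partial segment.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical description of one small application-level fragment. */
struct fragment {
    const void *data;
    size_t len;
};

/* Hypothetical helper: send exactly len bytes with the given flags.
 * Retries on EINTR and on short writes. Returns 0 on success, -1 on
 * error with errno set (this includes EAGAIN on a non-blocking socket,
 * which the caller has to handle, e.g. by waiting for POLLOUT). */
static int send_all(int fd, const void *buf, size_t len, int flags)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(fd, p, len, flags | MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send all fragments; MSG_MORE is set on every one except the last,
 * so only the final send() signals that the data may go out now. */
static int send_fragments(int fd, const struct fragment *frags, size_t count)
{
    assert(frags != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        int flags = (i + 1 < count) ? MSG_MORE : 0;

        if (send_all(fd, frags[i].data, frags[i].len, flags) < 0)
            return -1;
    }
    return 0;
}
```

Whether this changes anything in your case is something only a test on
your setup can tell.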

Clearly, when running with TCP_NODELAY you're asking the whole stack
"do your best to send as fast as possible", which implies "without any
consideration for efficiency optimization". I've seen a situation in the
past where it was impossible to send any extra segment after a first
unacked PUSH was in flight. Simply sending full segments was enough to
considerably increase the performance. I analysed this as a result of
the SWS avoidance algorithm and concluded that it was normal in that
situation, though I haven't witnessed it in a while.
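
One way to hand the stack larger writes instead of many tiny ones is to
gather the fragments into a single sendmsg() call. The following is only
a sketch under assumptions: send_coalesced(), struct fragment and the
MAX_FRAGS limit are invented for the example, and a short write (fewer
bytes accepted than requested) is left to the caller to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define MAX_FRAGS 16 /* hypothetical upper bound for this sketch */

/* Hypothetical description of one small application-level fragment. */
struct fragment {
    const void *data;
    size_t len;
};

/* Pass up to MAX_FRAGS fragments to the kernel in one sendmsg() call,
 * so the stack sees a single larger write rather than many small ones.
 * Returns the number of bytes accepted, or -1 with errno set. */
static ssize_t send_coalesced(int fd, const struct fragment *frags,
                              size_t count)
{
    struct iovec iov[MAX_FRAGS];
    struct msghdr msg = { 0 };

    assert(count <= MAX_FRAGS);

    for (size_t i = 0; i < count; i++) {
        iov[i].iov_base = (void *)frags[i].data; /* sendmsg() only reads it */
        iov[i].iov_len = frags[i].len;
    }
    msg.msg_iov = iov;
    msg.msg_iovlen = count;

    return sendmsg(fd, &msg, MSG_NOSIGNAL);
}
```

This does not guarantee full-sized segments on the wire; it only avoids
presenting each small fragment to the stack as a separate write.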

So just keep in mind to try not to abuse TCP_NODELAY too much.

Willy