netdev - Re: TCP socket send return EAGAIN unexpectedly when sending small fragments

Message-ID: <CAMJ=MEcOdB17WhYA6LNcoOReD0+8o3xs38qde2_FRhEfK1MqrQ@mail.gmail.com>
Date:   Fri, 10 Jun 2022 20:14:21 +0200
From:   Ronny Meeus <ronny.meeus@...il.com>
To:     Willy Tarreau <w@....eu>
Cc:     David Laight <David.Laight@...lab.com>,
        Eric Dumazet <erdnetdev@...il.com>,
        netdev <netdev@...r.kernel.org>
Subject: Re: TCP socket send return EAGAIN unexpectedly when sending small fragments

On Fri, 10 Jun 2022 at 19:42, Willy Tarreau <w@....eu> wrote:
>
> On Fri, Jun 10, 2022 at 07:16:06PM +0200, Ronny Meeus wrote:
> > On Fri, 10 Jun 2022 at 17:21, David Laight <David.Laight@...lab.com> wrote:
> > >
> > > ...
> > > > If the 5 queued packets on the sending side would cause the EAGAIN
> > > > issue, the real question maybe is why the receiving side is not
> > > > sending the ACK within the 10ms while for earlier messages the ACK is
> > > > sent much sooner.
> > >
> > > Have you disabled Nagle (TCP_NODELAY) ?
> >
> > Yes, I enabled TCP_NODELAY, so the Nagle algorithm is disabled.
> > I did a lot of tests over the last couple of days, but if I remember
> > correctly, enabling or disabling TCP_NODELAY does not influence the result.
>
> There are many possible causes for what you're observing. For example,
> if your NIC has too small a Tx ring and small buffers, you can imagine
> that the N*106 bytes fit in the buffers but the N*107 do not, which causes
> a tiny delay waiting for the Tx IRQ to recycle the buffers, and that
> during this time your subsequent send() calls are coalesced into larger
> segments that are sent at once when using 107.
>

The test is running over the loopback interface ...

> If you do not want packets to be sent individually and you know you
> still have more to come, you need to put MSG_MORE on the send() flags
> (or to disable TCP_NODELAY).

Like I said, TCP_NODELAY does not have an impact.
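
For reference, here is a minimal sketch of what the MSG_MORE suggestion quoted above could look like in C. The helper name send_fragments() and its parameters are hypothetical, not taken from the test program discussed in this thread. It assumes a connected stream socket on Linux, where MSG_MORE is available, and it sets the flag on every fragment except the last, so the stack may coalesce the small writes even while TCP_NODELAY is set.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from the thread): send `count` fragments of
 * `frag_len` bytes each, taken back to back from `buf`.
 *
 * MSG_MORE (Linux-specific) is set on every fragment except the last one,
 * telling the stack that more data follows and that it may hold back
 * partial segments. The final send() without MSG_MORE lets the data go out.
 *
 * Returns the number of bytes queued so far, or -1 if nothing was queued
 * and send() failed (errno is left set by send()). On a non-blocking
 * socket a short return or -1 with EAGAIN means the caller should wait
 * for POLLOUT and resume from the returned offset.
 */
static ssize_t send_fragments(int fd, const char *buf,
                              size_t frag_len, size_t count)
{
    size_t total = 0;

    for (size_t i = 0; i < count; i++) {
        /* MSG_NOSIGNAL: report a closed peer as EPIPE instead of SIGPIPE. */
        int flags = MSG_NOSIGNAL | (i + 1 < count ? MSG_MORE : 0);
        size_t off = 0;

        while (off < frag_len) {
            ssize_t n = send(fd, buf + i * frag_len + off,
                             frag_len - off, flags);
            if (n < 0) {
                if (errno == EINTR)
                    continue;           /* interrupted, just retry */
                return total > 0 ? (ssize_t)total : -1;
            }
            off += (size_t)n;
            total += (size_t)n;
        }
    }
    return (ssize_t)total;
}
```

With five 107-byte fragments, a call such as send_fragments(fd, buf, 107, 5) would issue five send() calls, only the last of which asks for the data to be sent immediately. Whether this changes the EAGAIN behaviour reported in this thread has not been verified here.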

> Clearly, when running with TCP_NODELAY you're asking the whole stack
> "do your best to send as fast as possible", which implies "without any
> consideration for efficiency optimization". I've seen a situation in the
> past where it was impossible to send any extra segment after a first
> unacked PUSH was in flight. Simply sending full segments was enough to
> considerably increase the performance. I analysed this as a result of
> the SWS avoidance algorithm and concluded that it was normal in that
> situation, though I haven't witnessed it in a while.
>
> So just keep in mind to try not to abuse TCP_NODELAY too much.

Thanks Willy for the feedback.

>
> Willy