Message-ID: <1391788738.10160.53.camel@edumazet-glaptop2.roam.corp.google.com>
Date: Fri, 07 Feb 2014 07:58:58 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: netdev@...r.kernel.org
Subject: Re: nonagle flags for TSQ
On Fri, 2014-02-07 at 07:34 -0800, Eric Dumazet wrote:
> On Fri, 2014-02-07 at 16:08 +0100, John Ogness wrote:
> > Hi,
> >
> > This email is referring to your Linux patch
> > 46d3ceabd8d98ed0ad10f20c595ca784e34786c5.
> >
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=46d3ceabd8d98ed0ad10f20c595ca784e34786c5
> >
> > I have a question about the use of tcp_write_xmit() in
> > net/ipv4/tcp_output.c
> >
> > When tcp_write_xmit() is called, the nonagle flag of the tcp socket is
> > ignored and instead 0 is passed. This causes the Nagle-algorithm to be
> > used even if it should not be, which in some cases causes a large delay.
> >
> > Was there a reason that 0 was hard-coded?
> >
> > Although current mainline code has been refactored, 0 is still
> > hard-coded for TSQ cases.
>
> Hi John
>
> Do you have any data, such as the exact kernel version you use, a
> tcpdump capture, or things like that?
>
> When TCP writes are throttled, it is only until the next packet is
> TX completed, and only if at least 128KB worth of bytes for this
> socket is sitting in the QDISC/NIC layers.
>
> We had some issues at very high speeds, not related to Nagle at all.
>
> 98e09386c0ef tcp: tsq: restore minimal amount of queueing
> c9eeec26e32e tcp: TSQ can use a dynamic limit
> d6a4a1041176 tcp: GSO should be TSQ friendly
> d01cb20711e3 tcp: add LAST_ACK as a valid state for TSQ
>
> I am not aware of TSQ being a problem for Nagle.
>
> Also take a look at the recent TCP autocork patches, as they are more
> closely related to Nagle:
>
> a181ceb501b3 tcp: autocork should not hold first packet in write queue
> f54b311142a9 tcp: auto corking
>
> Thanks

I think I mentioned this once already, but the "a181ceb501b3" fix
also included this bit:

  Also, as TX completion is lockless, it's safer to perform sk_wmem_alloc
  test after setting TSQ_THROTTLED.

So it's possible you hit the same race, but that is only a guess...

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 03d26b85eab8..c99a63c6e91a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1904,7 +1904,12 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 		if (atomic_read(&sk->sk_wmem_alloc) > limit) {
 			set_bit(TSQ_THROTTLED, &tp->tsq_flags);
-			break;
+			/* It is possible TX completion already happened
+			 * before we set TSQ_THROTTLED, so we must
+			 * test again the condition.
+			 */
+			if (atomic_read(&sk->sk_wmem_alloc) > limit)
+				break;
 		}
 		limit = mss_now;
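
To make the race concrete, here is a small user-space sketch of the "set the flag, then test again" pattern from the hunk above. It is only a model under stated assumptions, not kernel code: struct fake_sock, writer_must_stop(), tx_complete() and the race hook are hypothetical names, C11 atomics stand in for the kernel's atomic_t and bit operations, and the hook simply forces a TX completion into the window between the first test and the flag being set.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the socket fields involved; not kernel API. */
struct fake_sock {
	atomic_int  wmem_alloc;  /* models sk->sk_wmem_alloc (bytes queued below TCP) */
	atomic_bool throttled;   /* models the TSQ_THROTTLED bit in tp->tsq_flags */
};

/* Hook used to inject a TX completion into the race window. */
typedef void (*race_hook)(struct fake_sock *sk);

static void fake_sock_init(struct fake_sock *sk, int queued_bytes)
{
	atomic_init(&sk->wmem_alloc, queued_bytes);
	atomic_init(&sk->throttled, false);
}

/*
 * Models TX completion: release the bytes, then test-and-clear the flag.
 * Returns true if the flag was set, i.e. the writer would be rescheduled.
 */
static bool tx_complete(struct fake_sock *sk, int bytes)
{
	atomic_fetch_sub(&sk->wmem_alloc, bytes);
	return atomic_exchange(&sk->throttled, false);
}

/* Completes everything that is currently queued. */
static void complete_all(struct fake_sock *sk)
{
	tx_complete(sk, atomic_load(&sk->wmem_alloc));
}

/*
 * Models the throttle decision in the write loop.
 * recheck == false: old behaviour, stop right after setting the flag.
 * recheck == true:  patched behaviour, test the condition again.
 * Returns true if the writer must stop and wait for a TX completion.
 */
static bool writer_must_stop(struct fake_sock *sk, int limit,
			     bool recheck, race_hook hook)
{
	if (atomic_load(&sk->wmem_alloc) > limit) {
		if (hook)
			hook(sk);  /* completion runs before the flag is set */
		atomic_store(&sk->throttled, true);
		if (!recheck)
			return true;
		return atomic_load(&sk->wmem_alloc) > limit;
	}
	return false;
}
```

In this model, with 200000 bytes queued against a 128KB (131072 byte) limit, the old behaviour stops the writer even though the completion that should have woken it already ran and saw no flag, so nothing is left to restart it. With the second test the writer notices that the queue has drained and keeps sending; when no completion sneaks in, it still throttles as before.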