[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1337287279.3403.44.camel@edumazet-glaptop>
Date: Thu, 17 May 2012 22:41:19 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Willy Tarreau <w@....eu>
Cc: netdev@...r.kernel.org
Subject: Re: Stable regression with 'tcp: allow splice() to build full TSO
packets'
On Thu, 2012-05-17 at 14:18 +0200, Willy Tarreau wrote:
> Hi Eric,
>
> I'm facing a regression in stable 3.2.17 and 3.0.31 which is
> exhibited by your patch 'tcp: allow splice() to build full TSO
> packets' which unfortunately I am very interested in !
>
> What I'm observing is that TCP transmits using splice() stall
> quite quickly if I'm using pipes larger than 64kB (even 65537
> is enough to reliably observe the stall).
>
> I'm seeing this on haproxy running on a small ARM machine (a
> dockstar), which exchanges data through a gig switch with my
> development PC. The NIC (mv643xx) doesn't support TSO but has
> GSO enabled. If I disable GSO, the problem remains. I can however
> make the problem disappear by disabling SG or Tx checksumming.
> BTW, using recv/send() instead of splice() also gets rid of the
> problem.
>
> I can also reduce the risk of seeing the problem by increasing
> the default TCP buffer sizes in tcp_wmem. By default I'm running
> at 16kB, but if I increase the output buffer size above the pipe
> size, the problem *seems* to disappear though I can't be certain,
> since larger buffers generally means the problem takes longer to
> appear, probably due to the fact that the buffers don't need to
> be filled. Still I'm certain that with 64k TCP buffers and 128k
> pipes I'm still seeing it.
>
> With strace, I'm seeing data fill up the pipe with the splice()
> call responsible for pushing the data to the output socket returing
> -1 EAGAIN. During this time, the client receives no data.
>
> Something bugs me, I have tested with a dummy server of mine,
> httpterm, which uses tee+splice() to push data outside, and it
> has no problem filling the gig pipe, and correctly recoverers
> from the EAGAIN :
>
> send(13, "HTTP/1.1 200\r\nConnection: close\r"..., 160, MSG_DONTWAIT|MSG_NOSIGNAL) = 160
> tee(0x3, 0x6, 0x10000, 0x2) = 42552
> splice(0x5, 0, 0xd, 0, 0xa00000, 0x2) = 14440
> tee(0x3, 0x6, 0x10000, 0x2) = 13880
> splice(0x5, 0, 0xd, 0, 0x9fc798, 0x2) = -1 EAGAIN (Resource temporarily unavailable)
> ...
> tee(0x3, 0x6, 0x10000, 0x2) = 13880
> splice(0x5, 0, 0xd, 0, 0x9fc798, 0x2) = 51100
> tee(0x3, 0x6, 0x10000, 0x2) = 50744
> splice(0x5, 0, 0xd, 0, 0x9efffc, 0x2) = 32120
> tee(0x3, 0x6, 0x10000, 0x2) = 30264
> splice(0x5, 0, 0xd, 0, 0x9e8284, 0x2) = -1 EAGAIN (Resource temporarily unavailable)
>
> etc...
>
> It's only with haproxy which uses splice() to copy data between
> two sockets that I'm getting the issue (data forwarded from fd 0xe
> to fd 0x6) :
>
> 16:03:17.797144 pipe([36, 37]) = 0
> 16:03:17.797318 fcntl64(36, 0x407 /* F_??? */, 0x20000) = 131072 ## note: fcntl(F_SETPIPE_SZ, 128k)
> 16:03:17.797473 splice(0xe, 0, 0x25, 0, 0x9f2234, 0x3) = 10220
> 16:03:17.797706 splice(0x24, 0, 0x6, 0, 0x27ec, 0x3) = 10220
> 16:03:17.802036 gettimeofday({1324652597, 802093}, NULL) = 0
> 16:03:17.802200 epoll_wait(0x3, 0x99250, 0x16, 0x3e8) = 7
> 16:03:17.802363 gettimeofday({1324652597, 802419}, NULL) = 0
> 16:03:17.802530 splice(0xe, 0, 0x25, 0, 0x9efa48, 0x3) = 16060
> 16:03:17.802789 splice(0x24, 0, 0x6, 0, 0x3ebc, 0x3) = 16060
> 16:03:17.806593 gettimeofday({1324652597, 806651}, NULL) = 0
> 16:03:17.806759 epoll_wait(0x3, 0x99250, 0x16, 0x3e8) = 4
> 16:03:17.806919 gettimeofday({1324652597, 806974}, NULL) = 0
> 16:03:17.807087 splice(0xe, 0, 0x25, 0, 0x9ebb8c, 0x3) = 17520
> 16:03:17.807356 splice(0x24, 0, 0x6, 0, 0x4470, 0x3) = 17520
> 16:03:17.809565 gettimeofday({1324652597, 809620}, NULL) = 0
> 16:03:17.809726 epoll_wait(0x3, 0x99250, 0x16, 0x3e8) = 1
> 16:03:17.809883 gettimeofday({1324652597, 809937}, NULL) = 0
> 16:03:17.810047 splice(0xe, 0, 0x25, 0, 0x9e771c, 0x3) = 36500
> 16:03:17.810399 splice(0x24, 0, 0x6, 0, 0x8e94, 0x3) = 23360
> 16:03:17.810629 epoll_ctl(0x3, 0x1, 0x6, 0x85378) = 0 ## note: epoll_ctl(ADD, fd=6, dir=OUT).
> 16:03:17.810792 gettimeofday({1324652597, 810848}, NULL) = 0
> 16:03:17.810954 epoll_wait(0x3, 0x99250, 0x16, 0x3e8) = 1
> 16:03:17.811188 gettimeofday({1324652597, 811246}, NULL) = 0
> 16:03:17.811356 splice(0xe, 0, 0x25, 0, 0x9de888, 0x3) = 21900
> 16:03:17.811651 splice(0x24, 0, 0x6, 0, 0x88e0, 0x3) = -1 EAGAIN (Resource temporarily unavailable)
>
Willy you say output to fd 6 hangs, but splice() returns EAGAIN here ?
(because socket buffer is full)
> So output fd 6 hangs here and will not appear anymore until
> here where I pressed Ctrl-C to stop the test :
>
I just want to make sure its not a userland error that triggers now much
faster than with prior kernels.
You drain bytes from fd 0xe to pipe buffers, but I dont see you check
write ability on destination socket prior the splice(pipe -> socket)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists