Message-ID: <1333488689.18626.331.camel@edumazet-glaptop>
Date: Tue, 03 Apr 2012 23:31:29 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: David Miller <davem@...emloft.net>
Cc: netdev@...r.kernel.org, ncardwell@...gle.com, therbert@...gle.com,
ycheng@...gle.com, hkchu@...gle.com, maze@...gle.com,
maheshb@...gle.com, ilpo.jarvinen@...sinki.fi, nanditad@...gle.com
Subject: Re: [PATCH] tcp: allow splice() to build full TSO packets
On Tue, 2012-04-03 at 17:21 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@...il.com>
> Date: Tue, 03 Apr 2012 21:37:01 +0200
>
> > vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
> > time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096)
> >
> > The call to tcp_push() at the end of do_tcp_sendpages() forces an
> > immediate xmit when the pipe is not already filled, and tso_fragment()
> > tries to split these skbs into MSS multiples.
> >
> > 4096 bytes are usually split into an skb with 2 MSS and a remaining
> > sub-MSS skb (assuming MTU=1500).
>
> Interesting.
>
> But why doesn't TCP_NAGLE_CORK save us? That gets passed down into
> the push pending frames logic when MSG_MORE is specified.
>
> As far as I can tell, the combination of TCP_NAGLE_CORK and the TSO
> deferral logic should do the right thing here.
>
> Obviously you see different behavior, but why?
>
> Also, by eliding the tcp_push() call you are introducing other side
> effects:
>
> 1) we won't do the tcp_mark_push logic
>
> 2) we don't set the URG seq
>
> I think #2 can never happen in the vmsplice/splice path, but #1 might
> matter.
>
> That's why I want to concentrate on why the tcp_push() path doesn't
> behave properly when MSG_MORE is set.

It behaves properly I think, but only from the tcp_sendmsg() perspective.

The code in tcp_sendmsg() and do_tcp_sendpages() is similar (actually
probably copy/pasted), but tcp_sendmsg() is called once per sendmsg()
call (and the push logic is OK at the end of it), while a single
splice() system call can call do_tcp_sendpages() 16 times (or even more
if the pipe buffer was extended by fcntl(F_SETPIPE_SZ)).

Maybe a real fix would be to call do_tcp_sendpages() exactly once, but I
tried this today and found the needed surgery was complex. Also, this
would lock the socket for a long period and could add latency because of
backlog processing.
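
To make the arithmetic concrete, here is a rough userspace model of the
two cases (pushing after every page versus pushing once per splice()).
It is not kernel code and ignores Nagle and TSO deferral entirely. The
MSS of 1448 is an assumption (MTU 1500, IPv4, TCP timestamps enabled),
and the helper names are made up for illustration.

```c
/* Assumed values for illustration only; not taken from kernel sources. */
#define PAGE_SIZE_BYTES 4096u  /* PAGE_SIZE=4096, as assumed in the thread */
#define PIPE_PAGES      16u    /* do_tcp_sendpages() calls per splice() */
#define MSS_BYTES       1448u  /* assumption: MTU 1500, IPv4, timestamps */

struct seg_count {
    unsigned int full_mss;  /* segments carrying exactly one MSS */
    unsigned int sub_mss;   /* trailing segments smaller than one MSS */
};

/* Split `bytes` of payload into full-MSS segments plus at most one remainder. */
static struct seg_count split_by_mss(unsigned int bytes, unsigned int mss)
{
    struct seg_count c;

    c.full_mss = bytes / mss;
    c.sub_mss = (bytes % mss) ? 1u : 0u;
    return c;
}

/* Model: each page is pushed on its own, so each page is split separately. */
static struct seg_count push_per_page(unsigned int pages)
{
    struct seg_count one = split_by_mss(PAGE_SIZE_BYTES, MSS_BYTES);
    struct seg_count total;

    total.full_mss = one.full_mss * pages;
    total.sub_mss = one.sub_mss * pages;
    return total;
}

/* Model: all pages are coalesced and pushed once at the end. */
static struct seg_count push_once(unsigned int pages)
{
    return split_by_mss(pages * PAGE_SIZE_BYTES, MSS_BYTES);
}
```

Under these assumptions, push_per_page(PIPE_PAGES) gives 32 full-MSS
segments and 16 sub-MSS ones, while push_once(PIPE_PAGES) gives 45
full-MSS segments and a single sub-MSS one.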