Date:	Tue, 03 Apr 2012 23:31:29 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	David Miller <davem@...emloft.net>
Cc:	netdev@...r.kernel.org, ncardwell@...gle.com, therbert@...gle.com,
	ycheng@...gle.com, hkchu@...gle.com, maze@...gle.com,
	maheshb@...gle.com, ilpo.jarvinen@...sinki.fi, nanditad@...gle.com
Subject: Re: [PATCH] tcp: allow splice() to build full TSO packets

On Tue, 2012-04-03 at 17:21 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@...il.com>
> Date: Tue, 03 Apr 2012 21:37:01 +0200
> 
> > vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
> > time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096)
> > 
> > The call to tcp_push() at the end of do_tcp_sendpages() forces an
> > immediate xmit when the pipe is not already filled, and tso_fragment() tries
> > to split these skbs into MSS multiples.
> > 
> > 4096 bytes are usually split in a skb with 2 MSS, and a remaining
> > sub-mss skb (assuming MTU=1500)
> 
> Interesting.
> 
> But why doesn't TCP_NAGLE_CORK save us?  That gets passed down into
> the push pending frames logic when MSG_MORE is specified.
> 
> As far as I can tell, the combination of TCP_NAGLE_CORK and the TSO
> deferral logic should do the right thing here.
> 
> Obviously you see different behavior, but why?
> 
> Also, by eliding the tcp_push() call you are introducing other side
> effects:
> 
> 1) we won't do the tcp_mark_push logic
> 
> 2) we don't set the URG seq
> 
> I think #2 can never happen in the vmsplice/splice path, but #1 might
> matter.
> 
> That's why I want to concentrate on why the tcp_push() path doesn't
> behave properly when MSG_MORE is set.

It behaves properly I think, but only from the tcp_sendmsg() perspective.

The code in tcp_sendmsg() and do_tcp_sendpages() is similar (actually
probably copy/pasted), but tcp_sendmsg() is called once per sendmsg()
call (and the push logic is OK at the end of it), while a single
splice() system call can call do_tcp_sendpages() 16 times (or even more
if the pipe buffer was extended by fcntl(F_SETPIPE_SZ)).
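
For reference, the "16 times" figure follows from the default pipe
capacity in pages. A minimal userland sketch, assuming Linux and that
each pipe page leads to one do_tcp_sendpages() call as described above;
the helper names pipe_capacity_pages() and pipe_resize_pages() are
hypothetical, only fcntl(F_GETPIPE_SZ/F_SETPIPE_SZ) and sysconf() are
real interfaces:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Fallback values from the Linux UAPI headers, in case _GNU_SOURCE
 * was not in effect when <fcntl.h> was first included. */
#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031
#endif
#ifndef F_GETPIPE_SZ
#define F_GETPIPE_SZ 1032
#endif

/* Hypothetical helper: number of page-sized buffers the pipe behind
 * fd can hold, or -1 on error. */
long pipe_capacity_pages(int fd)
{
    long page = sysconf(_SC_PAGESIZE);
    int bytes = fcntl(fd, F_GETPIPE_SZ);

    if (page <= 0 || bytes < 0)
        return -1;
    return bytes / page;
}

/* Hypothetical helper: ask the kernel to resize the pipe to at least
 * want_bytes, then report the resulting capacity in pages (-1 on error).
 * The kernel may round the requested size up. */
long pipe_resize_pages(int fd, int want_bytes)
{
    if (fcntl(fd, F_SETPIPE_SZ, want_bytes) < 0)
        return -1;
    return pipe_capacity_pages(fd);
}
```

On a freshly created pipe, pipe_capacity_pages() would normally report
the default capacity; growing the pipe with pipe_resize_pages() raises
the number of pages a single splice() can hand to the socket.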

Maybe a real fix would be to call do_tcp_sendpages() exactly once, but I
tried this today and found the needed surgery was complex. Also, this
would lock the socket for a long period and could add latencies because
of backlog processing.
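
To put rough numbers on the difference between pushing after every page
and pushing once, here is a small arithmetic sketch. It only counts
MSS-sized pieces and is not kernel code; the MSS of 1448 bytes (MTU=1500
with TCP timestamps) is an assumption, and split_by_mss() and
push_per_chunk() are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

struct seg_count {
    size_t full;    /* segments carrying exactly one MSS */
    size_t partial; /* sub-MSS segments */
};

/* Count the segments produced when `bytes` are pushed in one go. */
struct seg_count split_by_mss(size_t bytes, size_t mss)
{
    struct seg_count c;

    assert(mss > 0);
    c.full = bytes / mss;
    c.partial = (bytes % mss) ? 1 : 0;
    return c;
}

/* Count the segments produced when a push happens after each of
 * `nchunks` chunks of `chunk_bytes` (one chunk per pipe page). */
struct seg_count push_per_chunk(size_t nchunks, size_t chunk_bytes,
                                size_t mss)
{
    struct seg_count one = split_by_mss(chunk_bytes, mss);
    struct seg_count total;

    total.full = nchunks * one.full;
    total.partial = nchunks * one.partial;
    return total;
}
```

With 16 pages of 4096 bytes, push_per_chunk(16, 4096, 1448) gives 32
full and 16 sub-MSS segments, while split_by_mss(16 * 4096, 1448) gives
45 full segments and a single sub-MSS one.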


