lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 17 May 2012 17:56:21 +0200
From:	Willy Tarreau <w@....eu>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	netdev@...r.kernel.org
Subject: Re: Stable regression with 'tcp: allow splice() to build full TSO packets'

On Thu, May 17, 2012 at 05:43:00PM +0200, Eric Dumazet wrote:
> On Thu, 2012-05-17 at 17:01 +0200, Willy Tarreau wrote:
> > Hi Eric,
> > 
> > On Thu, May 17, 2012 at 02:18:00PM +0200, Willy Tarreau wrote:
> > > Hi Eric,
> > > 
> > > I'm facing a regression in stable 3.2.17 and 3.0.31 which is
> > > exhibited by your patch 'tcp: allow splice() to build full TSO
> > > packets' which unfortunately I am very interested in !
> > > 
> > > What I'm observing is that TCP transmits using splice() stall
> > > quite quickly if I'm using pipes larger than 64kB (even 65537
> > > is enough to reliably observe the stall).
> > (...)
> > 
> > I managed to fix the issue and I really think that the fix makes sense.
> > I'm appending the patch, please could you review it and if approved,
> > push it for inclusion ?
> > 
> > BTW, your patch significantly improves performance here. On this
> > machine I was reaching max 515 Mbps of proxied traffic, and with
> > the patch I reach 665 Mbps with the same test ! I think you managed
> > to fix what caused splice() to always be slightly slower than
> > recv/send() for a long time in my tests with a number of NICs !
> > 
> 
> What NIC do you use exactly ?

It's the NIC included in the system-on-chip (Marvell 88F6281), and the NIC
driver is mv643xx. It's the same hardware you find in sheevaplugs, guruplugs,
dreamplugs and almost all ARM-based cheap NAS boxes.

> With commit 1d0c0b328a6 in net-next
> (net: makes skb_splice_bits() aware of skb->head_frag)
> You'll be able to get even more speed, if NIC uses frag to hold frame.

I'm going to check this now, sounds interesting :-)

The NIC does not support TSO but I've seen an alternate driver for this
NIC which pretends to do TSO and in fact builds header frags so that the
NIC is able to send all frames at once. I think it's already what GSO is
doing but I'm wondering whether it would be possible to get more speed
by doing this than by relying on GSO to (possibly) split the frags earlier.

Note that I'm not quite skilled on this, so do not hesitate to correct me
if I'm saying stupid things.

(...)

> > Commit 2f533844242 (tcp: allow splice() to build full TSO packets)
> > significantly improved splice() performance for some workloads but
> > caused stalls when pipe buffers were larger than socket buffers.
> 
> But... This seems bug was already there ? Only having more data in pipe
> trigger it more often ?

I honnestly don't know. As I said in the initial mail, I suspect this is
the case because I don't see in your patch what could cause the bug, but
I suspect it makes it more frequent. Still, I could not manage to reproduce
the bug without your patch. So maybe we switched from just below a limit
to just over.

> > The issue seems to happen when no data can be copied at all due to
> > lack of buffers, which results in pending data never being pushed.
> > 
> > This change checks if all pending data has been pushed or not and
> > pushes them when waiting for send buffers.
> > ---
> >  net/ipv4/tcp.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 80b988f..f6d9e5f 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -850,7 +850,7 @@ new_segment:
> >  wait_for_sndbuf:
> >  		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
> >  wait_for_memory:
> > -		if (copied)
> > +		if (tcp_sk(sk)->pushed_seq != tcp_sk(sk)->write_seq)
> >  			tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
> >  
> >  		if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
> 
> Can you check you have commit 35f9c09fe9c72eb8ca2b8e89a593e1c151f28fc2 
> ( tcp: tcp_sendpages() should call tcp_push() once )

Yes I have it, in fact I'm on -stable (3.0.31) which includes your backport
of the two patches in a single one. It's just a few lines below, out of the
context of this patch.

Best regards,
Willy

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ