lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090109221744.GA4819@1wt.eu>
Date:	Fri, 9 Jan 2009 23:17:44 +0100
From:	Willy Tarreau <w@....eu>
To:	Eric Dumazet <dada1@...mosbay.com>
Cc:	David Miller <davem@...emloft.net>, ben@...s.com,
	jarkao2@...il.com, mingo@...e.hu, linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org, jens.axboe@...cle.com
Subject: Re: [PATCH] tcp: splice as many packets as possible at once

On Fri, Jan 09, 2009 at 11:12:09PM +0100, Eric Dumazet wrote:
> Willy Tarreau a écrit :
> > On Fri, Jan 09, 2009 at 10:24:00PM +0100, Willy Tarreau wrote:
> >> On Fri, Jan 09, 2009 at 09:51:17PM +0100, Eric Dumazet wrote:
> >> (...)
> >>>> Also, in your second mail, you're saying that your change
> >>>> might return more data than requested by the user. I can't
> >>>> find why, could you please explain to me, as I'm still quite
> >>>> ignorant in this area ?
> >>> Well, I just tested various user programs and indeed got this
> >>> strange result :
> >>>
> >>> Here I call splice() with len=1000 (0x3e8), and you can see
> >>> it gives a result of 1460 at the second call.
> > 
> > OK finally I could reproduce it and found why we have this. It's
> > expected in fact.
> > 
> > The problem when we loop in tcp_read_sock() is that tss->len is
> > not decremented by the amount of bytes read, this one is done
> > only in tcp_splice_read() which is outer.
> > 
> > The solution I found was to do just like other callers, which means
> > use desc->count to keep the remaining number of bytes we want to
> > read. In fact, tcp_read_sock() is designed to use that one as a stop
> > condition, which explains why you first had to hide it.
> > 
> > Now with the attached patch as a replacement for my previous one,
> > both issues are solved :
> >   - I splice 1000 bytes if I ask to do so
> >   - I splice as much as possible if available (typically 23 kB).
> > 
> > My observed performances are still at the top of earlier results
> > and IMHO that way of counting bytes makes sense for an actor called
> > from tcp_read_sock().
> > 
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 35bcddf..51ff3aa 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -522,8 +522,12 @@ static int tcp_splice_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
> >  				unsigned int offset, size_t len)
> >  {
> >  	struct tcp_splice_state *tss = rd_desc->arg.data;
> > +	int ret;
> >  
> > -	return skb_splice_bits(skb, offset, tss->pipe, tss->len, tss->flags);
> > +	ret = skb_splice_bits(skb, offset, tss->pipe, rd_desc->count, tss->flags);
> > +	if (ret > 0)
> > +		rd_desc->count -= ret;
> > +	return ret;
> >  }
> >  
> >  static int __tcp_splice_read(struct sock *sk, struct tcp_splice_state *tss)
> > @@ -531,6 +535,7 @@ static int __tcp_splice_read(struct sock *sk, struct tcp_splice_state *tss)
> >  	/* Store TCP splice context information in read_descriptor_t. */
> >  	read_descriptor_t rd_desc = {
> >  		.arg.data = tss,
> > +		.count = tss->len,
> >  	};
> >  
> >  	return tcp_read_sock(sk, &rd_desc, tcp_splice_data_recv);
> > 
> 
> OK, I came to a different patch. Please check other tcp_read_sock() callers in tree :)

I've seen the other callers, but they all use desc->count for their own
purpose. That's how I understood what it was used for :-)

I think it's better not to change the API here and use tcp_read_sock()
how it's supposed to be used. Also, the less parameters to the function,
the better.

However I'm OK for the !timeo before release_sock/lock_sock. I just
don't know if we can put the rest of the if above or not. I don't
know what changes we're supposed to collect by doing release_sock/
lock_sock before the if().

Regards,
Willy

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ