lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090109220737.GA4111@1wt.eu>
Date:	Fri, 9 Jan 2009 23:07:38 +0100
From:	Willy Tarreau <w@....eu>
To:	Eric Dumazet <dada1@...mosbay.com>
Cc:	David Miller <davem@...emloft.net>, ben@...s.com,
	jarkao2@...il.com, mingo@...e.hu, linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org, jens.axboe@...cle.com
Subject: Re: [PATCH] tcp: splice as many packets as possible at once

On Fri, Jan 09, 2009 at 10:24:00PM +0100, Willy Tarreau wrote:
> On Fri, Jan 09, 2009 at 09:51:17PM +0100, Eric Dumazet wrote:
> (...)
> > > Also, in your second mail, you're saying that your change
> > > might return more data than requested by the user. I can't
> > > find why, could you please explain to me, as I'm still quite
> > > ignorant in this area ?
> > 
> > Well, I just tested various user programs and indeed got this
> > strange result :
> > 
> > Here I call splice() with len=1000 (0x3e8), and you can see
> > it gives a result of 1460 at the second call.

OK finally I could reproduce it and found why we have this. It's
expected in fact.

The problem when we loop in tcp_read_sock() is that tss->len is
not decremented by the amount of bytes read, this one is done
only in tcp_splice_read() which is outer.

The solution I found was to do just like other callers, which means
use desc->count to keep the remaining number of bytes we want to
read. In fact, tcp_read_sock() is designed to use that one as a stop
condition, which explains why you first had to hide it.

Now with the attached patch as a replacement for my previous one,
both issues are solved :
  - I splice 1000 bytes if I ask to do so
  - I splice as much as possible if available (typically 23 kB).

My observed performances are still at the top of earlier results
and IMHO that way of counting bytes makes sense for an actor called
from tcp_read_sock().

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 35bcddf..51ff3aa 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -522,8 +522,12 @@ static int tcp_splice_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
 				unsigned int offset, size_t len)
 {
 	struct tcp_splice_state *tss = rd_desc->arg.data;
+	int ret;
 
-	return skb_splice_bits(skb, offset, tss->pipe, tss->len, tss->flags);
+	ret = skb_splice_bits(skb, offset, tss->pipe, rd_desc->count, tss->flags);
+	if (ret > 0)
+		rd_desc->count -= ret;
+	return ret;
 }
 
 static int __tcp_splice_read(struct sock *sk, struct tcp_splice_state *tss)
@@ -531,6 +535,7 @@ static int __tcp_splice_read(struct sock *sk, struct tcp_splice_state *tss)
 	/* Store TCP splice context information in read_descriptor_t. */
 	read_descriptor_t rd_desc = {
 		.arg.data = tss,
+		.count = tss->len,
 	};
 
 	return tcp_read_sock(sk, &rd_desc, tcp_splice_data_recv);

Regards,
Willy

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ