Message-ID: <4967D36E.6020207@cosmosbay.com>
Date: Fri, 09 Jan 2009 23:45:02 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: Willy Tarreau <w@....eu>
CC: David Miller <davem@...emloft.net>, ben@...s.com,
jarkao2@...il.com, mingo@...e.hu, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org, jens.axboe@...cle.com
Subject: Re: [PATCH] tcp: splice as many packets as possible at once
Willy Tarreau wrote:
> On Fri, Jan 09, 2009 at 11:12:09PM +0100, Eric Dumazet wrote:
>> Willy Tarreau wrote:
>>> On Fri, Jan 09, 2009 at 10:24:00PM +0100, Willy Tarreau wrote:
>>>> On Fri, Jan 09, 2009 at 09:51:17PM +0100, Eric Dumazet wrote:
>>>> (...)
>>>>>> Also, in your second mail, you're saying that your change
>>>>>> might return more data than requested by the user. I can't
>>>>>> see why; could you please explain it to me, as I'm still
>>>>>> quite ignorant in this area?
>>>>> Well, I just tested various user programs and indeed got this
>>>>> strange result:
>>>>>
>>>>> Here I call splice() with len=1000 (0x3e8), and you can see
>>>>> it gives a result of 1460 on the second call.
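(For reference, a minimal sketch of the kind of test program being
described, assuming sock_fd is an already-connected TCP socket; the
helper name and error handling are illustrative only:)

	#define _GNU_SOURCE
	#include <fcntl.h>      /* splice(), SPLICE_F_MOVE */
	#include <stdio.h>
	#include <unistd.h>     /* pipe(), close() */

	/* splice up to 1000 bytes from the socket into a pipe and
	 * report how many bytes splice() claims to have moved */
	static void test_splice(int sock_fd)
	{
		int pfd[2];
		ssize_t ret;

		if (pipe(pfd) < 0)
			return;
		ret = splice(sock_fd, NULL, pfd[1], NULL, 1000,
			     SPLICE_F_MOVE);
		/* with the behaviour discussed above, a second call
		 * could report 1460 even though len was only 1000 */
		printf("splice() returned %zd\n", ret);
		close(pfd[0]);
		close(pfd[1]);
	}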
>>> OK, I finally managed to reproduce it and found out why we get
>>> this. It's in fact expected.
>>>
>>> The problem when we loop in tcp_read_sock() is that tss->len is
>>> not decremented by the number of bytes read; that decrement is done
>>> only in the outer function, tcp_splice_read().
>>>
>>> The solution I found was to do as the other callers do, which means
>>> using desc->count to keep the remaining number of bytes we want to
>>> read. In fact, tcp_read_sock() is designed to use it as a stop
>>> condition, which explains why you first had to hide it.
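(For context, the relevant part of tcp_read_sock()'s loop looks
roughly like this; a simplified sketch, not the exact kernel source:)

	/* tcp_read_sock() feeds each queued sk_buff to the actor and
	 * stops once the actor has consumed desc->count bytes */
	while ((skb = tcp_recv_skb(sk, seq, &offset)) != NULL) {
		size_t len = skb->len - offset;
		int used = recv_actor(desc, skb, offset, len);

		if (used <= 0)
			break;
		seq += used;
		copied += used;
		if (!desc->count)	/* the stop condition */
			break;
	}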
>>>
>>> Now with the attached patch as a replacement for my previous one,
>>> both issues are solved:
>>> - I splice 1000 bytes if I ask to do so
>>> - I splice as much as possible if available (typically 23 kB).
>>>
>>> My observed performance is still at the top of earlier results
>>> and IMHO that way of counting bytes makes sense for an actor called
>>> from tcp_read_sock().
>>>
>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>> index 35bcddf..51ff3aa 100644
>>> --- a/net/ipv4/tcp.c
>>> +++ b/net/ipv4/tcp.c
>>> @@ -522,8 +522,12 @@ static int tcp_splice_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
>>>  				unsigned int offset, size_t len)
>>>  {
>>>  	struct tcp_splice_state *tss = rd_desc->arg.data;
>>> +	int ret;
>>>  
>>> -	return skb_splice_bits(skb, offset, tss->pipe, tss->len, tss->flags);
>>> +	ret = skb_splice_bits(skb, offset, tss->pipe, rd_desc->count, tss->flags);
>>> +	if (ret > 0)
>>> +		rd_desc->count -= ret;
>>> +	return ret;
>>>  }
>>>  
>>>  static int __tcp_splice_read(struct sock *sk, struct tcp_splice_state *tss)
>>> @@ -531,6 +535,7 @@ static int __tcp_splice_read(struct sock *sk, struct tcp_splice_state *tss)
>>>  	/* Store TCP splice context information in read_descriptor_t. */
>>>  	read_descriptor_t rd_desc = {
>>>  		.arg.data = tss,
>>> +		.count = tss->len,
>>>  	};
>>>  
>>>  	return tcp_read_sock(sk, &rd_desc, tcp_splice_data_recv);
>>>
>> OK, I came up with a different patch. Please check the other tcp_read_sock() callers in the tree :)
>
> I've seen the other callers, but they all use desc->count for their own
> purpose. That's how I understood what it was used for :-)
Ah yes, I reread your patch and you are right.
>
> I think it's better not to change the API here and to use tcp_read_sock()
> the way it's supposed to be used. Also, the fewer parameters to the
> function, the better.
>
> However, I'm OK with moving the !timeo test before release_sock/lock_sock.
> I just don't know whether we can move the rest of the if() above it as
> well. I don't know what changes we're supposed to pick up by doing
> release_sock/lock_sock before the if().
Only the (!timeo) test can be moved above. The other conditions must be
checked after the release/lock, since the socket state may change while
the lock is released.
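Something like this, to illustrate (only a sketch of the reordering
under discussion, not a tested patch; variable names follow the
existing tcp_splice_read() loop):

	tss.len -= ret;
	spliced += ret;

	/* if the timeout is already exhausted there is no point in
	 * dropping and re-taking the lock: bail out right away */
	if (!timeo)
		break;

	release_sock(sk);
	lock_sock(sk);

	/* these conditions may change while the socket lock is
	 * released, so they must be (re)checked after re-locking */
	if (sk->sk_err || sk->sk_state == TCP_CLOSE ||
	    (sk->sk_shutdown & RCV_SHUTDOWN) ||
	    signal_pending(current))
		break;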
Thank you