Message-ID: <4967D36E.6020207@cosmosbay.com>
Date: Fri, 09 Jan 2009 23:45:02 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: Willy Tarreau <w@....eu>
CC: David Miller <davem@...emloft.net>, ben@...s.com,
jarkao2@...il.com, mingo@...e.hu, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org, jens.axboe@...cle.com
Subject: Re: [PATCH] tcp: splice as many packets as possible at once
Willy Tarreau wrote:
> On Fri, Jan 09, 2009 at 11:12:09PM +0100, Eric Dumazet wrote:
>> Willy Tarreau wrote:
>>> On Fri, Jan 09, 2009 at 10:24:00PM +0100, Willy Tarreau wrote:
>>>> On Fri, Jan 09, 2009 at 09:51:17PM +0100, Eric Dumazet wrote:
>>>> (...)
>>>>>> Also, in your second mail, you're saying that your change
>>>>>> might return more data than requested by the user. I can't
>>>>>> see why; could you please explain it to me, as I'm still
>>>>>> quite ignorant in this area?
>>>>> Well, I just tested various user programs and indeed got this
>>>>> strange result:
>>>>>
>>>>> Here I call splice() with len=1000 (0x3e8), and you can see
>>>>> it gives a result of 1460 on the second call.
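(For reference, a minimal sketch of the kind of test program being
described, assuming sock_fd is an already-connected TCP socket; the
helper name and error handling are illustrative only:)

	#define _GNU_SOURCE
	#include <fcntl.h>      /* splice(), SPLICE_F_MOVE */
	#include <stdio.h>
	#include <unistd.h>     /* pipe(), close() */

	/* splice up to 1000 bytes from the socket into a pipe and
	 * report how many bytes splice() claims to have moved */
	static void test_splice(int sock_fd)
	{
		int pfd[2];
		ssize_t ret;

		if (pipe(pfd) < 0)
			return;
		ret = splice(sock_fd, NULL, pfd[1], NULL, 1000,
			     SPLICE_F_MOVE);
		/* with the behaviour discussed above, a second call
		 * could report 1460 even though len was only 1000 */
		printf("splice() returned %zd\n", ret);
		close(pfd[0]);
		close(pfd[1]);
	}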
>>> OK, I finally managed to reproduce it and found out why we get
>>> this. It's in fact expected.
>>>
>>> The problem when we loop in tcp_read_sock() is that tss->len is
>>> not decremented by the number of bytes read; that decrement is done
>>> only in the outer function, tcp_splice_read().
>>>
>>> The solution I found was to do as the other callers do, which means
>>> using desc->count to keep the remaining number of bytes we want to
>>> read. In fact, tcp_read_sock() is designed to use it as a stop
>>> condition, which explains why you first had to hide it.
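(For context, the relevant part of tcp_read_sock()'s loop looks
roughly like this; a simplified sketch, not the exact kernel source:)

	/* tcp_read_sock() feeds each queued sk_buff to the actor and
	 * stops once the actor has consumed desc->count bytes */
	while ((skb = tcp_recv_skb(sk, seq, &offset)) != NULL) {
		size_t len = skb->len - offset;
		int used = recv_actor(desc, skb, offset, len);

		if (used <= 0)
			break;
		seq += used;
		copied += used;
		if (!desc->count)	/* the stop condition */
			break;
	}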
>>>
>>> Now with the attached patch as a replacement for my previous one,
>>> both issues are solved:
>>> - I splice 1000 bytes if I ask to do so
>>> - I splice as much as possible if available (typically 23 kB).
>>>
>>> My observed performance is still at the top of earlier results
>>> and IMHO that way of counting bytes makes sense for an actor called
>>> from tcp_read_sock().
>>>
>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>> index 35bcddf..51ff3aa 100644
>>> --- a/net/ipv4/tcp.c
>>> +++ b/net/ipv4/tcp.c
>>> @@ -522,8 +522,12 @@ static int tcp_splice_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
>>>  				unsigned int offset, size_t len)
>>>  {
>>>  	struct tcp_splice_state *tss = rd_desc->arg.data;
>>> +	int ret;
>>>  
>>> -	return skb_splice_bits(skb, offset, tss->pipe, tss->len, tss->flags);
>>> +	ret = skb_splice_bits(skb, offset, tss->pipe, rd_desc->count, tss->flags);
>>> +	if (ret > 0)
>>> +		rd_desc->count -= ret;
>>> +	return ret;
>>>  }
>>>  
>>>  static int __tcp_splice_read(struct sock *sk, struct tcp_splice_state *tss)
>>> @@ -531,6 +535,7 @@ static int __tcp_splice_read(struct sock *sk, struct tcp_splice_state *tss)
>>>  	/* Store TCP splice context information in read_descriptor_t. */
>>>  	read_descriptor_t rd_desc = {
>>>  		.arg.data = tss,
>>> +		.count = tss->len,
>>>  	};
>>>  
>>>  	return tcp_read_sock(sk, &rd_desc, tcp_splice_data_recv);
>>>
>> OK, I came up with a different patch. Please check the other tcp_read_sock() callers in the tree :)
>
> I've seen the other callers, but they all use desc->count for their own
> purpose. That's how I understood what it was used for :-)
Ah yes, I reread your patch and you are right.
>
> I think it's better not to change the API here and to use tcp_read_sock()
> the way it's supposed to be used. Also, the fewer parameters to the
> function, the better.
>
> However, I'm OK with moving the !timeo test before release_sock/lock_sock.
> I just don't know whether we can move the rest of the if() above it as
> well. I don't know what changes we're supposed to pick up by doing
> release_sock/lock_sock before the if().
Only the (!timeo) test can be moved above. The other conditions must be
checked after the release/lock, since the socket state may change while
the lock is released.
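Something like this, to illustrate (only a sketch of the reordering
under discussion, not a tested patch; variable names follow the
existing tcp_splice_read() loop):

	tss.len -= ret;
	spliced += ret;

	/* if the timeout is already exhausted there is no point in
	 * dropping and re-taking the lock: bail out right away */
	if (!timeo)
		break;

	release_sock(sk);
	lock_sock(sk);

	/* these conditions may change while the socket lock is
	 * released, so they must be (re)checked after re-locking */
	if (sk->sk_err || sk->sk_state == TCP_CLOSE ||
	    (sk->sk_shutdown & RCV_SHUTDOWN) ||
	    signal_pending(current))
		break;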
Thank you