[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090109221744.GA4819@1wt.eu>
Date: Fri, 9 Jan 2009 23:17:44 +0100
From: Willy Tarreau <w@....eu>
To: Eric Dumazet <dada1@...mosbay.com>
Cc: David Miller <davem@...emloft.net>, ben@...s.com,
jarkao2@...il.com, mingo@...e.hu, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org, jens.axboe@...cle.com
Subject: Re: [PATCH] tcp: splice as many packets as possible at once
On Fri, Jan 09, 2009 at 11:12:09PM +0100, Eric Dumazet wrote:
> Willy Tarreau a écrit :
> > On Fri, Jan 09, 2009 at 10:24:00PM +0100, Willy Tarreau wrote:
> >> On Fri, Jan 09, 2009 at 09:51:17PM +0100, Eric Dumazet wrote:
> >> (...)
> >>>> Also, in your second mail, you're saying that your change
> >>>> might return more data than requested by the user. I can't
> >>>> find why, could you please explain to me, as I'm still quite
> >>>> ignorant in this area ?
> >>> Well, I just tested various user programs and indeed got this
> >>> strange result :
> >>>
> >>> Here I call splice() with len=1000 (0x3e8), and you can see
> >>> it gives a result of 1460 at the second call.
> >
> > OK finally I could reproduce it and found why we have this. It's
> > expected in fact.
> >
> > The problem when we loop in tcp_read_sock() is that tss->len is
> > not decremented by the amount of bytes read, this one is done
> > only in tcp_splice_read() which is outer.
> >
> > The solution I found was to do just like other callers, which means
> > use desc->count to keep the remaining number of bytes we want to
> > read. In fact, tcp_read_sock() is designed to use that one as a stop
> > condition, which explains why you first had to hide it.
> >
> > Now with the attached patch as a replacement for my previous one,
> > both issues are solved :
> > - I splice 1000 bytes if I ask to do so
> > - I splice as much as possible if available (typically 23 kB).
> >
> > My observed performances are still at the top of earlier results
> > and IMHO that way of counting bytes makes sense for an actor called
> > from tcp_read_sock().
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 35bcddf..51ff3aa 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -522,8 +522,12 @@ static int tcp_splice_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
> > unsigned int offset, size_t len)
> > {
> > struct tcp_splice_state *tss = rd_desc->arg.data;
> > + int ret;
> >
> > - return skb_splice_bits(skb, offset, tss->pipe, tss->len, tss->flags);
> > + ret = skb_splice_bits(skb, offset, tss->pipe, rd_desc->count, tss->flags);
> > + if (ret > 0)
> > + rd_desc->count -= ret;
> > + return ret;
> > }
> >
> > static int __tcp_splice_read(struct sock *sk, struct tcp_splice_state *tss)
> > @@ -531,6 +535,7 @@ static int __tcp_splice_read(struct sock *sk, struct tcp_splice_state *tss)
> > /* Store TCP splice context information in read_descriptor_t. */
> > read_descriptor_t rd_desc = {
> > .arg.data = tss,
> > + .count = tss->len,
> > };
> >
> > return tcp_read_sock(sk, &rd_desc, tcp_splice_data_recv);
> >
>
> OK, I came to a different patch. Please check other tcp_read_sock() callers in tree :)
I've seen the other callers, but they all use desc->count for their own
purpose. That's how I understood what it was used for :-)
I think it's better not to change the API here and use tcp_read_sock()
how it's supposed to be used. Also, the less parameters to the function,
the better.
However I'm OK for the !timeo before release_sock/lock_sock. I just
don't know if we can put the rest of the if above or not. I don't
know what changes we're supposed to collect by doing release_sock/
lock_sock before the if().
Regards,
Willy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists