[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.00.1009272155030.15680@melkinpaasi.cs.helsinki.fi>
Date: Mon, 27 Sep 2010 22:20:57 +0300 (EEST)
From: "Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To: David Miller <davem@...emloft.net>
cc: ycheng@...gle.com, Netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH] fix TSO FACK loss marking in tcp_mark_head_lost
On Mon, 27 Sep 2010, David Miller wrote:
> From: "Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
> Date: Mon, 27 Sep 2010 15:22:09 +0300 (EEST)
>
> > On Fri, 24 Sep 2010, Yuchung Cheng wrote:
> >
> >> When TCP uses FACK algorithm to mark lost packets in
> >> tcp_mark_head_lost(), if the number of packets in the (TSO) skb is
> >> greater than the number of packets that should be marked lost, TCP
> >> incorrectly exits the loop and marks no packets lost in the skb. This
> >> underestimates tp->lost_out and affects the recovery/retransmission.
> >> This patch fargments the skb and marks the correct amount of packets
> >> lost.
> >>
> >> Signed-off-by: Yuchung Cheng <ycheng@...gle.com>
> >> ---
> >> net/ipv4/tcp_input.c | 3 ++-
> >> 1 files changed, 2 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> >> index 1bc87a0..e4f472e 100644
> >> --- a/net/ipv4/tcp_input.c
> >> +++ b/net/ipv4/tcp_input.c
> >> @@ -2532,7 +2532,8 @@ static void tcp_mark_head_lost(struct sock *sk, int packets)
> >> cnt += tcp_skb_pcount(skb);
> >>
> >> if (cnt > packets) {
> >> - if (tcp_is_sack(tp) || (oldcnt >= packets))
> >> + if ((tcp_is_sack(tp) && !tcp_is_fack(tp)) ||
> >> + (oldcnt >= packets))
> >> break;
> >>
> >> mss = skb_shinfo(skb)->gso_size;
> >>
> >
> > Acked-by: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
>
> BTW, the history is that this code was added to fix a bug because
> we didn't handle GSO packets at all at one point.
>
> But, conservatively, we didn't do splits here for SACK, and it was
> stated in the commit that we would look into it "at some point in the
> future" :-)
>
> --------------------
> commit c137f3dda04b0aee1bc6889cdc69185f53df8a82
> Author: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
> Date: Mon Apr 7 22:32:38 2008 -0700
>
> [TCP]: Fix NewReno's fast rexmit/recovery problems with GSOed skb
>
> Fixes a long-standing bug which makes NewReno recovery crippled.
> With GSO the whole head skb was marked as LOST which is in
> violation of NewReno procedure that only wants to mark one packet
> and ended up breaking our TCP code by causing counter overflow
> because our code was built on top of assumption about valid
> NewReno procedure. This manifested as triggering a WARN_ON for
> the overflow in a number of places.
>
> It seems relatively safe alternative to just do nothing if
> tcp_fragment fails due to oom because another duplicate ACK is
> likely to be received soon and the fragmentation will be retried.
>
> Special thanks goes to Soeren Sonnenburg <kernel@....de> who was
> lucky enough to be able to reproduce this so that the warning
> for the overflow was hit. It's not as easy task as it seems even
> if this bug happens quite often because the amount of outstanding
> data is pretty significant for the mismarkings to lead to an
> overflow.
>
> Because it's very late in 2.6.25-rc cycle (if this even makes in
> time), I didn't want to touch anything with SACK enabled here.
> Fragmenting might be useful for it as well but it's more or less
> a policy decision rather than mandatory fix. Thus there's no need
> to rush and we can postpone considering tcp_fragment with SACK
> for 2.6.26.
>
> In 2.6.24 and earlier, this very same bug existed but the effect
> is slightly different because of a small changes in the if
> conditions that fit to the patch's context. With them nothing
> got lost marker and thus no retransmissions happened.
>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
> Signed-off-by: David S. Miller <davem@...emloft.net>
> --------------------
>
> To be honest, we should probably just remove the whole tcp_is_sack()
> test, rather than special case FACK.
>
> The cost isn't what it was when this code was added. Back then we
> didn't have Ilpo's restransmit queue coalescing code, so it would make
> retransmit queue packet freeing more expensive. But since the
> coalescing code is there now, splitting all the time to record
> accurate loss information should be fine.
>
> Ilpo what do you say?
Removing it would add a bug for non-FACK SACK as lost_cnt_hint is
calculated a bit differently with it (only SACKed skbs count so
lost_cnt_hint is not changing over the whole skb with pcount > 1 while
it does change per each seg for FACK, basically non-FACK is all or
nothing as only passing by an already SACKed segment increases
lost_cnt_hint). ...I also fell to this trap once while reviewing this
change and wondered in my mind why not to d it for all variants :-). So
Yuchung's patch is fine as is afaict.
I disagree now with myself btw, it's not just "a policy decision" as we
do not retransmit some segments at all before RTO in some scenarios.
--
i.
Powered by blists - more mailing lists