Message-id: <4B17C6C3.1060000@tvk.rwth-aachen.de>
Date: Thu, 03 Dec 2009 15:10:11 +0100
From: Damian Lukowski <damian@....rwth-aachen.de>
To: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
Cc: Frederic Leroy <fredo@...rox.org>, Netdev <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Eric Dumazet <eric.dumazet@...il.com>,
Herbert Xu <herbert@...dor.apana.org.au>,
Greg KH <gregkh@...e.de>
Subject: Re: scp stalls mysteriously
Ilpo Järvinen wrote:
> On Thu, 3 Dec 2009, Damian Lukowski wrote:
>
>>> On Thu, 3 Dec 2009, Frederic Leroy wrote:
>>>> On Wed, Dec 02, 2009 at 08:17:44PM +0100, Damian Lukowski wrote:
>>>>> could you please printk retrans_stamp just before the return in
>>>>> include/net/tcp.h:retransmits_timed_out()?
>>>>> If the value is not monotonically increasing but is reset to 0 at some
>>>>> point, this might lead to problems in tcp_write_timeout().
>>>>> It's the only idea I have now.
>>>> Your idea is good.
>>>> Only one out of 4 values is not null.
>>>>
>>>> The corresponding logs at http://wwW.starox.org/pub/scp_stall are the .10 ones.
>>>>
>>>> I made 2 attempts. The printks corresponding to .10 are those after the line
>>>> "wlan1 enter promiscuous mode".
>>> Nice thinking indeed Damian, thanks. ...But but, where exactly did you
>>> print? ...There are multiple returns and the return false branch is
>>> expected to have a zero retrans_stamp in a typical case but that is not
>>> a problem because we never use the value.
>> Yes, it's the retrans_stamp in the subtraction I suspected to be 0.
>> I also suspect this happens only in the ca_state < CA_Loss case,
>> so a first solution might be to return true whenever retrans_stamp == 0.
>
> I suppose adding || !tp->retrans_stamp to the false condition is fine
> as long as that cannot cause a connection to hang there forever
> for some reason (this needs to be understood well enough, not just
> test-driven in stables :-)).
>
>> Unfortunately, I still cannot reproduce the scp stalls here, so it would be nice
>> if Frederic printed retrans_stamp together with icsk_ca_state and
>> icsk_retransmits, please.
>
> It wouldn't hurt to know tp->packets_out and tp->retrans_out too; they
> might have some significance w.r.t. what happens because of FRTO.
I made a patch for Frederic against the codebase at
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
Thanks for testing.
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 03a49c7..c170948 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1268,7 +1268,8 @@ static inline bool retransmits_timed_out(const struct sock *sk,
 {
 	unsigned int timeout, linear_backoff_thresh;
 
-	if (!inet_csk(sk)->icsk_retransmits)
+	if (!inet_csk(sk)->icsk_retransmits ||
+	    !tcp_sk(sk)->retrans_stamp)
 		return false;
 
 	linear_backoff_thresh = ilog2(TCP_RTO_MAX/TCP_RTO_MIN);
@@ -1279,6 +1280,11 @@ static inline bool retransmits_timed_out(const struct sock *sk,
 		timeout = ((2 << linear_backoff_thresh) - 1) * TCP_RTO_MIN +
 			  (boundary - linear_backoff_thresh) * TCP_RTO_MAX;
 
+	printk("stamp, rstamp, retrans, ca, p_out, retr_out: "
+		"%u, %u, %u, %u, %u, %u\n", tcp_time_stamp,
+		tcp_sk(sk)->retrans_stamp, inet_csk(sk)->icsk_retransmits,
+		inet_csk(sk)->icsk_ca_state, tcp_sk(sk)->packets_out,
+		tcp_sk(sk)->retrans_out);
 	return (tcp_time_stamp - tcp_sk(sk)->retrans_stamp) >= timeout;
 }
 
--
1.6.4.4