Message-ID: <alpine.DEB.2.00.0912031121190.7024@wel-95.cs.helsinki.fi>
Date:	Thu, 3 Dec 2009 12:29:39 +0200 (EET)
From:	"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To:	Frederic Leroy <fredo@...rox.org>
cc:	Damian Lukowski <damian@....rwth-aachen.de>,
	Netdev <netdev@...r.kernel.org>, Asdo <asdo@...ftmail.org>,
	David Miller <davem@...emloft.net>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Herbert Xu <herbert@...dor.apana.org.au>,
	Greg KH <gregkh@...e.de>
Subject: Re: scp stalls mysteriously

I've added Greg as CC to make him aware of this issue early, as it now 
affects 2.6.32 too (it is rather important to get this dealt with quickly 
in stable once we have a tested solution, since TCP is pretty broken with 
the silent deaths this problem seems to cause). ...One possibility would 
be to just queue the tested revert to stable and sort this thing out for 
2.6.33 in net-2.6.

Opinions, Dave, Greg?

Now back to the issue...

You said in the other mail that "All further test are on linus-stable 
tree.", which is a contradiction, since Linus does not maintain stable 
trees. Which tree exactly was used for the .9. test, Linus' tree or the 
2.6.31 stable tree? I suppose the former, since the revert wouldn't apply 
to 2.6.31, but I just want to confirm.


On Thu, 3 Dec 2009, Frederic Leroy wrote:
> On Wed, Dec 02, 2009 at 08:17:44PM +0100, Damian Lukowski wrote:
> > could you please printk retrans_stamp just before the return in 
> > include/net/tcp.h:retransmits_timed_out()?
> > If the value is not monotonically increasing but is reset to 0 at some
> > point, this might lead to problems in tcp_write_timeout().
> > It's the only idea I have now.
> 
> Your idea is good.
> Only one out of 4 value is not null.
>
> Logs corresponding on http://wwW.starox.org/pub/scp_stall is .10
> 
> I make 2 attempts. Printk corresponding to .10 are those after the line 
> "wlan1 enter promiscuous mode"

Nice thinking indeed, Damian, thanks. ...But, but: where exactly did you 
print? ...There are multiple returns, and the return-false branch is 
expected to have a zero retrans_stamp in the typical case, but that is not 
a problem because the value is never used on that path.

...Anyway, if my suspicion is wrong and we really do have a zero 
retrans_stamp in the subtraction too, it could have something to do 
with this snippet:

static void tcp_try_to_open(struct sock *sk, int flag)
{
        struct tcp_sock *tp = tcp_sk(sk);

        tcp_verify_left_out(tp);

        if (!tp->frto_counter && tp->retrans_out == 0)
                tp->retrans_stamp = 0;

        ...
}

...It bit me last time, when FRTO was enabled, after a very small 
modification (made without running a full verification after the 
trivial-looking change). ...So I've worked around this clearing for FRTO, 
as you can see :-).


Also, we have another mystery to be solved: fast retransmission is not 
triggered for some reason (or alternatively is not captured into the 
log), even in the working .9. case. Once you have some transfers in a 
working configuration, it would be easy to see whether fast retransmit 
works at all from TCP's point of view by looking at the MIB counters:

grep -A1 TCP /proc/net/netstat

...Luckily this fast retransmit issue is less crucial, as almost everyone 
is pretty happy already if their RTO-based recovery works, even if fast 
recovery does not. So figuring it out can be postponed (if one has to 
prioritize) until the silent death issue is out of the way.


-- 
 i.