netdev - Re: scp stalls mysteriously


Message-ID: <alpine.DEB.2.00.0912080023590.27565@melkinpaasi.cs.helsinki.fi>
Date:	Tue, 8 Dec 2009 00:38:01 +0200 (EET)
From:	"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
To:	Frederic Leroy <fredo@...rox.org>
cc:	Damian Lukowski <damian@....rwth-aachen.de>,
	David Miller <davem@...emloft.net>,
	Netdev <netdev@...r.kernel.org>
Subject: Re: scp stalls mysteriously

Trimmed Ccs.

On Mon, 7 Dec 2009, Frederic Leroy wrote:

> On Mon, 7 Dec 2009 16:01:53 +0200 (EET),
> "Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi> wrote:
> 
> > On Sat, 5 Dec 2009, Damian Lukowski wrote:
> > 
> > > Could you please make another test and unplug the cable or drop
> > > [...]
> > After looking into this some more, this is partly a red herring.
> > It looks that way because the printk was still placed at the end
> > of the function. You can see the full trace of what happens in
> > .13., they come from independent incarnations of RTO
> > recovery (when finally no error happens in tcp_retransmit_skb).
> 
> Doh ! Sorry :( 

I think we have simply had too many test cases by now and are all confused :-).
I was referring to case .11. (the same case Damian looked at), not to
anything newer that you did. Sorry about that.

> > However, the problem itself could occur. Here's the patch which
> > should prevent that (I'm rather convinced that this really isn't
> > stable worthy but net-next or net-2.6 would be fine):
> > 
> > --
> > [PATCH] tcp: fix retrans_stamp advancing in error cases
> > [...]
> 
> Tonight I ran 2 more tests: .20 and .21.
> 
> The first used Damian's latest patch from today.
> I added the printk (this time I double-checked ;).
> Start the copy, wait 20s, disconnect the cable for 20s, reconnect.
> 
> The second try was identical, but I added your patch.
> The copy was slower compared to the first try.

The losses you are getting are a somewhat random process, so they are
usually the main explanation for differing transfer rates. One thing leads
to another, and therefore one run suffers more than another.

> I didn't take the time to understand TCP retransmission timeouts or to
> read the code, so I'm not sure the printk is in the right place or useful.

Thanks anyway for all the testing so far. I'll try to come up with another
debug patch tomorrow to get some information on that -EAGAIN, unless you
want to do it yourself and printk all the variables involved in this check
(in tcp_output.c):

        /* Do not sent more than we queued. 1/4 is reserved for possible
         * copying overhead: fragmentation, tunneling, mangling etc.
         */
        if (atomic_read(&sk->sk_wmem_alloc) >
            min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
                return -EAGAIN;

...It is better to print them before the check, regardless of the actual
result of the comparison.
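
To illustrate where such a print would go, here is a minimal userspace
sketch of the same comparison. It is not a kernel patch: struct wmem_state,
wmem_limit() and retransmit_wmem_check() are hypothetical names made up for
this example, plain ints stand in for atomic_read(&sk->sk_wmem_alloc),
sk->sk_wmem_queued and sk->sk_sndbuf, and fprintf() stands in for printk().

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for the three socket values in the check. */
struct wmem_state {
	int wmem_alloc;   /* models atomic_read(&sk->sk_wmem_alloc) */
	int wmem_queued;  /* models sk->sk_wmem_queued */
	int sndbuf;       /* models sk->sk_sndbuf */
};

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Queued bytes plus 1/4 headroom, capped by the send buffer size. */
static int wmem_limit(const struct wmem_state *s)
{
	return min_int(s->wmem_queued + (s->wmem_queued >> 2), s->sndbuf);
}

/* Same comparison as the quoted check, with the values logged first. */
static int retransmit_wmem_check(const struct wmem_state *s)
{
	int limit = wmem_limit(s);

	/* Printed before the comparison, so both outcomes leave a trace. */
	fprintf(stderr, "wmem_alloc=%d wmem_queued=%d sndbuf=%d limit=%d\n",
		s->wmem_alloc, s->wmem_queued, s->sndbuf, limit);

	if (s->wmem_alloc > limit)
		return -EAGAIN;
	return 0;
}
```

Because the print sits before the if, the log shows the values both for
calls that pass and for calls that return -EAGAIN, so the two can be compared.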

...So far only Damian's patch is clearly required for stable (I suppose
DaveM will handle the stable submissions as usual; hopefully it won't take
too long, though, since other people might start reporting this same issue
once some time has passed and they notice that something is wrong with TCP
on their new and shiny 2.6.32 :-)).

-- 
 i.