netdev - Re: panics in tcp

Message-ID: <1370219787.24311.113.camel@edumazet-glaptop>
Date:	Sun, 02 Jun 2013 17:36:27 -0700
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Rob Herring <robherring2@...il.com>
Cc:	netdev@...r.kernel.org
Subject: Re: panics in tcp_ack

On Sun, 2013-06-02 at 19:16 -0500, Rob Herring wrote:
> Sorry, this time with proper line wrapping...
> 
> I'm debugging a kernel panic in the networking stack that happens on a
> cluster (20-40 nodes) of Calxeda Highbank (ARM Cortex-A9) nodes,
> typically only after 10-24 hours. The nodes are transferring files
> between each other over TCP, with 20 clients and servers per node. The
> kernel is based on the Ubuntu 3.5 kernel, which is based on 3.5.7.11. So
> far, testing has shown that a 3.8.11-based (Ubuntu raring) kernel is
> fixed. Attempts to bisect have not yielded results, as it seems multiple
> problems mask the issue. Perhaps some new feature has indirectly fixed
> the problem in 3.8.
> 
> This commit appears to fix a similar panic and seems to reduce the
> frequency after picking it up in the latest 3.5 stable:
> 
> commit 16fad69cfe4adbbfa813de516757b87bcae36d93
> Author: Eric Dumazet <edumazet@...gle.com>
> Date:   Thu Mar 14 05:40:32 2013 +0000
> 
>     tcp: fix skb_availroom()
> 
>     Chrome OS team reported a crash on a Pixel ChromeBook in TCP stack :
> 
>     https://code.google.com/p/chromium/issues/detail?id=182056
> 
>     commit a21d45726acac (tcp: avoid order-1 allocations on wifi and tx
>     path) did a poor choice adding an 'avail_size' field to skb, while
>     what we really needed was a 'reserved_tailroom' one.
> 
>     It would have avoided commit 22b4a4f22da (tcp: fix retransmit of
>     partially acked frames) and this commit.
> 
>     Crash occurs because skb_split() is not aware of the 'avail_size'
>     management (and should not be aware)
> 
>     Signed-off-by: Eric Dumazet <edumazet@...gle.com>
>     Reported-by: Mukesh Agrawal <quiche@...omium.org>
>     Signed-off-by: David S. Miller <davem@...emloft.net>
> 
> I've searched through the 3.8 and 3.9 stable fixes looking for possibly
> relevant commits and applied the following ones, which are not in 3.5
> stable. However, they have not helped:
> 
> net: drop dst before queueing fragments
> tcp: call tcp_replace_ts_recent() from tcp_ack()
> tcp: Reallocate headroom if it would overflow csum_start
> tcp: incoming connections might use wrong route under synflood
> 

Try also:

commit 093162553c33e94 (tcp: force a dst refcount when prequeue packet)
commit 0d4f0608619de59 (tcp: dont handle MTU reduction on LISTEN socket)
commit 6731d2095bd4aef (tcp: fix for zero packets_in_flight was too broad)
commit 2e5f421211ff76c (tcp: frto should not set snd_cwnd to 0)


