netdev - Re: Question regarding kernel panic in net/ipv4/tcp


Message-ID: <1346761388.13121.21.camel@edumazet-glaptop>
Date:	Tue, 04 Sep 2012 14:23:08 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	smka2012@...il.de
Cc:	netdev@...r.kernel.org
Subject: Re: Question regarding kernel panic in net/ipv4/tcp_output.c

On Tue, 2012-09-04 at 13:55 +0200, smka2012@...il.de wrote:
> Hi,
>  
> I recently had a severe issue with multiple servers that stopped
> working at the same time. By using the stacktrace, I could locate the
> following line in the tcp_retransmit_skb function of the tcp_output.c
> kernel source that led to a kernel panic on all servers. I am using
> Debian 6.0 with the kernel version 2.6.32-45.
>  
> (gdb) list *(tcp_retransmit_skb+0x66)
> 0x1a93 is in tcp_retransmit_skb (net/ipv4/tcp_output.c:1905).
> 1900  if (atomic_read(&sk->sk_wmem_alloc) >
> 1901      min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
> 1902   return -EAGAIN;
> 1903
> 1904  if (before(TCP_SKB_CB(skb)->seq, tp->snd_una)) {
> 
> --->
> 1905   if (before(TCP_SKB_CB(skb)->end_seq, tp->snd_una))
> 1906    BUG();
> <---
>  
> 1907   if (tcp_trim_head(sk, skb, tp->snd_una - TCP_SKB_CB(skb)->seq))
> 1908    return -ENOMEM;
> 1909  }
>  
> Now I would like to know what can lead to a situation in which the
> condition in line 1905 evaluates to true, and why the kernel calls
> BUG() in that case instead of just returning an error. All servers
> reached this line of code. They were all connected to a switch that
> broke at the same time. However, I cannot say whether the switch broke
> first and then affected the servers, or whether the switch was itself
> affected by some external event.
>  
> Thank you very much for your help.
>  
> Kind Regards,
> Sascha
> --

You can see this BUG() as an early notification of a hard-to-debug/diagnose
bug.

We shouldn't take this path at all.
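The path in question is reached only when a segment still on the retransmit queue ends strictly before snd_una, i.e. it has already been fully acknowledged. The following is a minimal userspace sketch of the quoted lines 1900-1909, not the kernel code itself: `seq_before()` mimics the kernel's wraparound-safe `before()` (a signed 32-bit difference), while `rtx_should_defer()`, `classify_rtx()` and `enum rtx_state` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe comparison in the style of the kernel's before():
 * seq1 precedes seq2 if their signed 32-bit difference is negative. */
static bool seq_before(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

/* Lines 1900-1902 (hypothetical helper): defer the retransmit (-EAGAIN)
 * while allocated write memory exceeds min(queued + queued/4, sndbuf). */
static bool rtx_should_defer(uint32_t wmem_alloc, uint32_t wmem_queued,
                             uint32_t sndbuf)
{
    uint32_t limit = wmem_queued + (wmem_queued >> 2);

    if (limit > sndbuf)
        limit = sndbuf;
    return wmem_alloc > limit;
}

enum rtx_state {
    RTX_UNACKED,     /* seq >= snd_una: resend the whole segment */
    RTX_PARTIAL,     /* seq < snd_una <= end_seq: trim the acked head */
    RTX_FULLY_ACKED  /* end_seq < snd_una: the case that hits BUG() */
};

/* Lines 1904-1908 (hypothetical helper): classify segment [seq, end_seq)
 * against snd_una. On RTX_PARTIAL, *trim_len receives the number of
 * already acknowledged bytes, the amount tcp_trim_head() is asked to drop. */
static enum rtx_state classify_rtx(uint32_t seq, uint32_t end_seq,
                                   uint32_t snd_una, uint32_t *trim_len)
{
    assert(!seq_before(end_seq, seq));  /* a segment never ends before it starts */
    *trim_len = 0;
    if (!seq_before(seq, snd_una))
        return RTX_UNACKED;
    if (seq_before(end_seq, snd_una))
        return RTX_FULLY_ACKED;
    *trim_len = snd_una - seq;          /* unsigned subtraction wraps correctly */
    return RTX_PARTIAL;
}
```

For example, a segment with seq = 0xFFFFFF00, end_seq = 0x100 and snd_una = 0x10 straddles the 2^32 wrap; classify_rtx() reports it as partially acknowledged with 0x110 (272) bytes to trim, where a plain `<` comparison would wrongly treat it as unacknowledged.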

If we do, we have an earlier bug that we should fix anyway, because
the machine is going to crash.
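The "earlier bug" is a broken invariant: when an incoming ACK advances snd_una, ACK processing (tcp_clean_rtx_queue() in net/ipv4/tcp_input.c) is supposed to free every segment that is now fully acknowledged, so the retransmit path should never see one. Below is a deliberately simplified userspace model of that contract, under stated assumptions: the fixed-size array `struct rtx_queue`, `on_ack()` and `retransmit_head()` are hypothetical, and `assert()` stands in for BUG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wraparound-safe sequence comparisons, modeled on the kernel's before()/after(). */
static bool seq_before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static bool seq_after(uint32_t a, uint32_t b) { return seq_before(b, a); }

/* Hypothetical retransmit queue: segments [seq, end_seq) in sequence order. */
struct seg { uint32_t seq, end_seq; };
struct rtx_queue {
    struct seg segs[16];
    size_t head, len;   /* live segments are segs[head .. head + len) */
    uint32_t snd_una;   /* oldest unacknowledged sequence number */
};

/* ACK processing: advance snd_una and drop every segment whose end_seq is
 * at or before it (a simplified stand-in for tcp_clean_rtx_queue()). */
static void on_ack(struct rtx_queue *q, uint32_t ack)
{
    if (!seq_after(ack, q->snd_una))
        return;                         /* old or duplicate ACK */
    q->snd_una = ack;
    while (q->len > 0 && !seq_after(q->segs[q->head].end_seq, ack)) {
        q->head++;                      /* segment fully acked: free it */
        q->len--;
    }
}

/* Retransmit the head segment; returns the number of bytes to trim. */
static uint32_t retransmit_head(const struct rtx_queue *q)
{
    assert(q->len > 0);
    const struct seg *s = &q->segs[q->head];

    if (!seq_before(s->seq, q->snd_una))
        return 0;                       /* nothing acked: resend everything */
    /* Mirrors the BUG() at lines 1905-1906: if on_ack() did its job,
     * a fully acknowledged segment can never be at the head here. */
    assert(!seq_before(s->end_seq, q->snd_una));
    return q->snd_una - s->seq;         /* bytes to trim from the head */
}
```

If on_ack() ever failed to drop a segment (for instance because the sequence state had been corrupted elsewhere), the assert in retransmit_head() would fire later and far from the real cause, which is why the BUG() is best read as a symptom rather than as the bug itself.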

It would be nice if you sent the stack trace you had.


