Message-ID: <1346761388.13121.21.camel@edumazet-glaptop>
Date:	Tue, 04 Sep 2012 14:23:08 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	smka2012@...il.de
Cc:	netdev@...r.kernel.org
Subject: Re: Question regarding kernel panic in net/ipv4/tcp_output.c

On Tue, 2012-09-04 at 13:55 +0200, smka2012@...il.de wrote:
> Hi,
>  
> I recently had a severe issue with multiple servers that stopped
> working at the same time. By using the stacktrace, I could locate the
> following line in the tcp_retransmit_skb function of the tcp_output.c
> kernel source that led to a kernel panic on all servers. I am using
> Debian 6.0 with the kernel version 2.6.32-45.
>  
> (gdb) list *(tcp_retransmit_skb+0x66)
> 0x1a93 is in tcp_retransmit_skb (net/ipv4/tcp_output.c:1905).
> 1900  if (atomic_read(&sk->sk_wmem_alloc) >
> 1901      min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
> 1902   return -EAGAIN;
> 1903
> 1904  if (before(TCP_SKB_CB(skb)->seq, tp->snd_una)) {
> 
> --->
> 1905   if (before(TCP_SKB_CB(skb)->end_seq, tp->snd_una))
> 1906    BUG();
> <---
>  
> 1907   if (tcp_trim_head(sk, skb, tp->snd_una - TCP_SKB_CB(skb)->seq))
> 1908    return -ENOMEM;
> 1909  }
>  
> I would like to understand what can lead to a situation in which the
> condition in line 1905 evaluates to true, and why the kernel calls
> BUG() in that case instead of simply returning an error. All servers
> reached this line of code. They were all connected to a switch that
> broke at the same time. However, I cannot say whether the switch
> failed first and then affected the servers, or whether the switch
> was itself affected by some external event.
>  
> Thank you very much for your help.
>  
> Kind Regards,
> Sascha
> --

You can see this BUG() as an early notification of a hard to
debug/diagnose bug.

We shouldn't take this path at all.

If we do, we have an earlier bug that we should fix anyway, because
the machine is going to crash.

It would be nice if you sent the stack trace you had.
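
To illustrate the check under discussion, here is a minimal user-space
sketch, not kernel code. It assumes the usual definition of before() as a
signed 32-bit difference; the enum, classify_skb() and check_examples()
are hypothetical names, and the sequence numbers are made up. It mirrors
the nested conditions at lines 1904-1906 of the quoted listing.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-safe comparison modelled on the kernel's before(): true if seq1
 * precedes seq2 in 32-bit sequence space (signed difference negative). */
static int before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* Hypothetical names for illustration, not kernel identifiers. */
enum skb_state { SKB_UNACKED, SKB_PARTIALLY_ACKED, SKB_FULLY_ACKED };

/* Classify a queued segment [seq, end_seq) against snd_una. */
static enum skb_state classify_skb(uint32_t seq, uint32_t end_seq,
				   uint32_t snd_una)
{
	if (!before(seq, snd_una))
		return SKB_UNACKED;          /* nothing acked yet */
	if (before(end_seq, snd_una))
		return SKB_FULLY_ACKED;      /* quoted code calls BUG() here */
	return SKB_PARTIALLY_ACKED;          /* quoted code trims the head */
}

/* Self-check using made-up sequence numbers. */
void check_examples(void)
{
	/* Head of the segment is acked, tail is not: trim path. */
	assert(classify_skb(900, 1100, 1000) == SKB_PARTIALLY_ACKED);
	/* Whole segment lies below snd_una: the BUG() case. */
	assert(classify_skb(900, 950, 1000) == SKB_FULLY_ACKED);
	/* Segment starts at snd_una: nothing to trim. */
	assert(classify_skb(1000, 1100, 1000) == SKB_UNACKED);
	/* The comparison still holds across 32-bit wrap-around. */
	assert(before(0xFFFFFFF0u, 0x00000010u));
	assert(!before(0x00000010u, 0xFFFFFFF0u));
}
```

In this sketch the SKB_FULLY_ACKED case is a segment that lies entirely
below snd_una yet is still being considered for retransmission. That state
points to an earlier inconsistency rather than a recoverable runtime
error, which matches the reasoning above for why the code does not simply
return an error.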


