netdev - Aw: Re: Question regarding kernel panic in net/ipv4/tcp


Message-ID: <trinity-299e8894-40a0-46ec-bc19-1b74a0090b83-1346835703144@3capp-webde-bs46>
Date:	Wed, 5 Sep 2012 11:01:43 +0200 (CEST)
From:	"Sascha Mühlbach" <smka2012@...il.de>
To:	"Eric Dumazet" <eric.dumazet@...il.com>
Cc:	netdev@...r.kernel.org
Subject: Aw: Re: Question regarding kernel panic in net/ipv4/tcp_output.c

Hi,
 
This is the stack trace I was able to save by taking a screenshot of the remote console:
 
[<ffffffff8101193b>] ? invalid_op+0x1b/0x20
[<ffffffff81289e6a>] ? tcp_retransmit_skb+0x66/0x5aa
[<ffffffff8128a546>] ? tcp_xmit_retransmit_queue+0x198/0x223
[<ffffffff81286113>] ? tcp_ack+0x1744/0x1952
[<ffffffff81286871>] ? tcp_validate_incoming+0x1ba/0x2be
[<ffffffff81286f5e>] ? tcp_rcv_established+0x5e9/0x6d9
[<ffffffff8128e00f>] ? tcp_v4_do_rcv+0x1bb/0x376
[<ffffffff8128e639>] ? tcp_v4_rcv+0x46f/0x6f8
[<ffffffff81273afa>] ? ip_local_deliver_finish+0x0/0x1e9
[<ffffffff81273afa>] ? ip_local_deliver_finish+0x0/0x1e9
[<ffffffff81273c40>] ? ip_local_deliver_finish+0x146/0x1e9
[<ffffffff8127378f>] ? ip_rcv_finish+0x373/0x38d
[<ffffffffa004d95c>] ? bnx2_poll_work+0x954/0xa7e [bnx2]
[<ffffffffa004d95c>] ? bnx2_poll_work+0x954/0xa7e [bnx2]
[<ffffffff8105aeb6>] ? __mod_timer+0x141/0x153
[<ffffffff810964a1>] ? handle_edge_irq+0xdd/0x101
[<ffffffffa004daae>] ? bnx2_poll_msix+0x28/0xa6 [bnx2]
[<ffffffff8125039f>] ? net_rx_action+0xae/0x1c9
[<ffffffff81053d6f>] ? __do_softirq+0xdd/0x1a6
[<ffffffff81011cac>] ? call_softirq+0x1c/0x30
[<ffffffff8101322b>] ? do_softirq+0x3f/0x7c
[<ffffffff81053bdf>] ? irq_exit+0x36/0x76
[<ffffffff81012922>] ? do_IRQ+0xa0/0xb6
[<ffffffff810114d3>] ? ret_from_intr+0x0/0x11
<EOI> [<ffffffffa0135509>] ? acpi_idle_enter_bm+0x27d/0x2af [processor]
[<ffffffffa0135509>] ? acpi_idle_enter_bm+0x27d/0x2af [processor]
[<ffffffffa0135502>] ? acpi_idle_enter_bm+0x276/0x2af [processor]
[<ffffffff8123a2c6>] ? cpuidle_idle_call+0x94/0xee
[<ffffffff8100fe97>] ? cpu_idle+0xa2/0xda
[<ffffffff8151c140>] ? early_idt_handler+0x0/0x71
[<ffffffff8151ccdd>] ? start_kernel+0x3dc/0x3e8
[<ffffffff8151c3b7>] ? x86_64_start_kernel+0xf9/0x106

Kind Regards,
Sascha

>Sent: Tuesday, 04 September 2012 at 14:23
>From: "Eric Dumazet" <eric.dumazet@...il.com>
>To: smka2012@...il.de
>Cc: netdev@...r.kernel.org
>Subject: Re: Question regarding kernel panic in net/ipv4/tcp_output.c
>On Tue, 2012-09-04 at 13:55 +0200, smka2012@...il.de wrote:
>> Hi,
>>
>> I recently had a severe issue with multiple servers that stopped
>> working at the same time. Using the stack trace, I located the
>> following line in the tcp_retransmit_skb function of tcp_output.c
>> that led to a kernel panic on all servers. I am using Debian 6.0
>> with kernel version 2.6.32-45.
>>
>> (gdb) list *(tcp_retransmit_skb+0x66)
>> 0x1a93 is in tcp_retransmit_skb (net/ipv4/tcp_output.c:1905).
>> 1900         if (atomic_read(&sk->sk_wmem_alloc) >
>> 1901             min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
>> 1902                 return -EAGAIN;
>> 1903
>> 1904         if (before(TCP_SKB_CB(skb)->seq, tp->snd_una)) {
>>
>> --->
>> 1905                 if (before(TCP_SKB_CB(skb)->end_seq, tp->snd_una))
>> 1906                         BUG();
>> <---
>>
>> 1907                 if (tcp_trim_head(sk, skb, tp->snd_una - TCP_SKB_CB(skb)->seq))
>> 1908                         return -ENOMEM;
>> 1909         }
>>
>> I would now like to understand what can make the condition on line
>> 1905 evaluate to true, and why the kernel calls BUG() in that case
>> instead of simply returning an error. All servers reached this line
>> of code. They were all connected to a switch that failed at the same
>> time. However, I cannot say whether the switch failed first and then
>> affected the servers, or whether the switch itself was also affected
>> by some external event.
>>
>> Thank you very much for your help.
>>
>> Kind Regards,
>> Sascha
>> --
>
>You can see this BUG() as an early notification of a hard-to-debug,
>hard-to-diagnose bug.
>
>We shouldn't take this path at all.
>
>If we do, we have an earlier bug that we should fix anyway, because
>the machine is going to crash.
>
>It would be nice if you sent the stack trace you had.
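
For readers following the quoted listing, here is a minimal user-space sketch of the sequence-number test around lines 1904-1907. It assumes that the kernel's before() helper is the usual signed 32-bit difference comparison. The names seq_before, retx_action and classify_retransmit are hypothetical and exist only for this illustration; they are not kernel code, and the sketch does not reproduce the rest of tcp_retransmit_skb.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the kernel's before(): true if seq1 precedes seq2
 * in 32-bit wrapping sequence space. */
static inline bool seq_before(uint32_t seq1, uint32_t seq2)
{
    return (int32_t)(seq1 - seq2) < 0;
}

/* Hypothetical labels for the three outcomes of the quoted checks. */
enum retx_action {
    RETX_AS_IS,      /* seq is not before snd_una: nothing to trim      */
    RETX_TRIM_HEAD,  /* head already acked: tcp_trim_head() path        */
    RETX_BUG         /* end_seq is before snd_una: the BUG() on 1906    */
};

/* Illustrative only: classify a segment [seq, end_seq) against snd_una
 * in the same order as lines 1904-1907 of the listing. */
static inline enum retx_action classify_retransmit(uint32_t seq,
                                                   uint32_t end_seq,
                                                   uint32_t snd_una)
{
    if (!seq_before(seq, snd_una))
        return RETX_AS_IS;
    if (seq_before(end_seq, snd_una))
        return RETX_BUG;
    return RETX_TRIM_HEAD;
}
```

Under these assumptions, a segment covering 1000..2000 with snd_una at 1500 takes the trim path, while the same segment with snd_una at 2500 hits the BUG() case: the whole segment lies before snd_una, so it has already been acknowledged and should no longer be on the retransmit queue.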