linux-kernel - Re: Panic at tcp_xmit_retransmit

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20100218112826.39aabf85@pluto.restena.lu>
Date:	Thu, 18 Feb 2010 11:28:26 +0100
From:	Bruno Prémont <bonbons@...ux-vserver.org>
To:	"Ilpo Järvinen" <ilpo.jarvinen@...sinki.fi>
Cc:	sbs <gexlie@...il.com>, Netdev <netdev@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: Panic at tcp_xmit_retransmit_queue

On Mon, 15 Feb 2010 15:21:58 "Ilpo Järvinen" wrote:
> On Wed, 3 Feb 2010, Ilpo Järvinen wrote:
> 
> > On Mon, 1 Feb 2010, sbs wrote:
> > 
> > > actually removing netconsole from kernel didnt help.
> > > i found many guys with the same problem but with different
> > > hardware configurations here:
> > > 
> > > freez in TCP stack :
> > > http://bugzilla.kernel.org/show_bug.cgi?id=14470
> > > 
> > > is there someone who can investigate it?
> > > 
> > > 
> > > On Tue, Jan 19, 2010 at 7:13 PM, sbs <gexlie@...il.com> wrote:
> > > > We are hiting kernel panics on servers with nVidia MCP55 NICs
> > > > once a day; it appears usualy under a high network trafic
> > > > ( around 10000Mbit/s) but it is not a rule, it has happened
> > > > even on low trafic.
> > > >
> > > > Servers are used as nginx+static content
> > > > On 2 equal servers this panic happens aprox 2 times a day
> > > > depending on network load. Machine completly freezes till the
> > > > netconsole reboots.
> > > >
> > > > Kernel: 2.6.32.3
> > > >
> > > > what can it be? whats wrong with tcp_xmit_retransmit_queue()
> > > > function ? can anyone explain or fix?
> > 
> > You might want to try with to debug patch below. It might even make
> > the box to survive the event (if I got it coded right).
> 
> Here should be a better version of the debug patch, hopefully the
> infinite looping is now gone.

I can reproduce the freeze pretty easily, even on an idle server,
all I need is netconsole enabled, an ssh connection to server and
permission to write to /proc/sysrq-trigger.

The following command, executed via SSH triggers the frozen system:
  echo t > /proc/sysrq-trigger
when netconsole is enabled. Doing the same from local console has no
negative effect (idle system).
Unfortunately I can't get any useful information out of the system as
nothing reaches VGA console and interaction with the system is not
possible anymore (cursor is still blinking on VGA console).

Unfortunately I currently have no setup here to analyze dead system via
kexec crash kernel that would be run on watchdog.

System I'm using is HP Proliant DL360 G5 (4 logical CPUs, two sockets),
bnx2 NIC.
Eventually I will try with some other system to reproduce there as
well (to rule out NIC driver).

Any hints on how to get pertinent data out of that system would be
really nice!

Regards,
Bruno
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/