Message-ID: <1267493170.2762.45.camel@dhcp-10-12-137-104.broadcom.com>
Date: Mon, 1 Mar 2010 17:26:10 -0800
From: "Benjamin Li" <benli@...adcom.com>
To: "Bruno Prémont" <bonbons@...ux-vserver.org>
cc: NetDEV <netdev@...r.kernel.org>, "Michael Chan" <mchan@...adcom.com>, Linux-Kernel <linux-kernel@...r.kernel.org>
Subject: Re: BNX2: Kernel crashes with 2.6.31 and 2.6.31.9

Hi Bruno,

On Tue, 2010-02-23 at 04:15 -0800, Bruno Prémont wrote:
> Hi Benjamin,
>
> On Fri, 19 February 2010 "Benjamin Li" <benli@...adcom.com> wrote:
> > From your logs it looks like the device came up using MSI, but the
> > MSI-X poll routine was being called:
> >
> > [ 9.836673] bnx2: eth0: using MSI
> > ...
> >
> > [ 134.643459] [<ffffffffa004019e>] bnx2_poll_msix+0x3e/0xd0 [bnx2]
> > [ 134.643465] [<ffffffff8135bcd1>] netpoll_poll+0xe1/0x3c0
> >
> > which is incorrect. If we are in MSI mode, the bnx2_poll() routine
> > should be used.
> >
> > I think what is going on here is that during the bnx2 driver
> > initialization the current bnx2 driver adds all possible NAPI
> > structures that map to all the hardware vectors (BNX2_MAX_MSIX_VEC=9)
> > to the NAPI list in the net_device structure regardless of whether
> > they are used or not (seen in drivers/net/bnx2.c:bnx2_init_napi()).
> > This can cause uninitialized NAPI structures to be placed on the
> > napi_list. Because this device is in MSI mode, only 1 vector is
> > initialized. Now, the problem is triggered when
> > net/core/netpoll.c:poll_napi() is called. This is because this routine
> > will run through the entire napi_list calling all the poll routines.
> > In your particular case, it is calling the poll routine on an
> > uninitialized vector, causing the kernel panic.
> >
> > Please try the patch below to see if it solves your problem. Note,
> > this has only been compile tested and tested against basic traffic
> > runs. Unfortunately, I could not reproduce the kernel panic with the
> > instructions below to verify the patch.
> >
> > Thanks again for all your help in helping us track this down.
>
> I applied the patch today and tried to reproduce with my showcases.
>
> Seems that it's harder to trigger now but I still end up being able to
> crash the box. Don't know if it's the same cause or not (could also
> be the tcp-retransmit ghost)...
>
> This time I had to run a few parallel scp's (8Mb/s each) to the box and
> 'echo t > /proc/sysrq-trigger' multiple times via ssh session for it to
> happen. It didn't trigger with by netbomb though I will try some more
> and see)
>
> I don't know if it's the same reason or not (hopefully something
> reached disk as serial console is dead and pings are not
> answered anymore.
> It's probably some printk/bug/warn that triggers in network stack and
> deadlocks with netconsole.

Thanks for trying the patch. I still haven't been able to reproduce what
you are seeing here. I am able to run scp and
'echo t > /proc/sysrq-trigger' multiple times. I was wondering if you
had any success reproducing the problem with a stack trace?

Thanks again.

-Ben

>
> Regards,
> Bruno
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
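
For readers following the thread, the failure mode described in the quoted explanation (every possible vector placed on the NAPI list, then a netpoll-style walk calling every poll routine) can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not the bnx2 driver code and not the patch discussed in this thread: struct napi_sim, struct dev_sim, dev_setup(), init_napi_all(), init_napi_used() and poll_all() are hypothetical names invented for the illustration, and registering only the vectors in use is one possible way to avoid the problem, not necessarily what the patch does.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MSIX_VEC 9 /* same value as BNX2_MAX_MSIX_VEC in the message */

struct napi_sim;
typedef int (*poll_fn)(struct napi_sim *napi);

/* Hypothetical stand-in for a per-vector NAPI structure. */
struct napi_sim {
    poll_fn poll;          /* poll routine registered for this vector */
    int initialized;       /* 1 once the vector's state has been set up */
    struct napi_sim *next; /* link in the device's NAPI list */
};

/* Hypothetical stand-in for the device: all vectors plus the NAPI list. */
struct dev_sim {
    struct napi_sim vec[MAX_MSIX_VEC];
    struct napi_sim *napi_list;
    int irq_nvecs; /* vectors actually in use (1 in MSI mode) */
};

/* A real poll routine would crash on an uninitialized vector;
 * the simulation reports it with -1 instead. */
static int poll_msi(struct napi_sim *napi)  { return napi->initialized ? 0 : -1; }
static int poll_msix(struct napi_sim *napi) { return napi->initialized ? 0 : -1; }

/* Zero the device and mark only the first nvecs vectors as initialized. */
static void dev_setup(struct dev_sim *dev, int nvecs)
{
    int i;

    *dev = (struct dev_sim){0};
    dev->irq_nvecs = nvecs;
    for (i = 0; i < nvecs; i++)
        dev->vec[i].initialized = 1;
}

/* Put one vector at the head of the device's NAPI list. */
static void napi_add(struct dev_sim *dev, struct napi_sim *napi, poll_fn poll)
{
    assert(poll != NULL);
    napi->poll = poll;
    napi->next = dev->napi_list;
    dev->napi_list = napi;
}

/* Behaviour described in the message: register every possible vector. */
static void init_napi_all(struct dev_sim *dev)
{
    int i;

    for (i = 0; i < MAX_MSIX_VEC; i++)
        napi_add(dev, &dev->vec[i], i == 0 ? poll_msi : poll_msix);
}

/* Assumed alternative: register only the vectors that are in use. */
static void init_napi_used(struct dev_sim *dev)
{
    int i;

    for (i = 0; i < dev->irq_nvecs; i++)
        napi_add(dev, &dev->vec[i], i == 0 ? poll_msi : poll_msix);
}

/* Netpoll-style walk: call every poll routine on the list and count
 * how many calls landed on an uninitialized vector. */
static int poll_all(struct dev_sim *dev)
{
    struct napi_sim *n;
    int bad = 0;

    for (n = dev->napi_list; n != NULL; n = n->next)
        if (n->poll(n) < 0)
            bad++;
    return bad;
}
```

With a single vector in use (the MSI case), dev_setup(&dev, 1) followed by init_napi_all(&dev) makes poll_all(&dev) report 8 calls into uninitialized vectors, while init_napi_used(&dev) leaves nothing uninitialized on the list.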