Date:	Tue, 29 Dec 2009 10:33:10 +0100
From:	Bruno Prémont <bonbons@...ux-vserver.org>
To:	"Benjamin Li" <benli@...adcom.com>
Cc:	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"Michael Chan" <mchan@...adcom.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: BNX2: Kernel crashes with 2.6.31 and 2.6.31.9

Hi Benjamin,

On Tue, 29 Dec 2009 01:05:40 "Benjamin Li" <benli@...adcom.com> wrote:
> Hi Bruno,
> 
> It looks like the NULL dereference is happening at a0fc.
> 
> a0f8:       48 8b 42 70             mov 0x70(%rdx),%rax 
> a0fc:       0f b7 10                movzwl (%rax),%edx
> a0ff:       31 c0                   xor    %eax,%eax

Thanks for confirming my guess.
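
For the record, here is how I read the instructions around a0fc, as a rough C sketch and not the actual driver source. The function name mock_get_hw_tx_cons is made up for illustration, the 0xff constant is taken from the `cmp $0xff,%dl` in the objdump output quoted at the end of this mail, and the NULL check exists only in the sketch (the disassembly has none, which is where the oops comes from).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the objdump output and the oops line. */
enum {
    FUNC_START  = 0xa0d0,  /* <bnx2_poll_work> in bnx2.ko */
    OOPS_OFFSET = 0x2c,    /* bnx2_poll_work+0x2c from the oops */
    FAULT_ADDR  = FUNC_START + OOPS_OFFSET
};

/* The oops offset lands on the movzwl (%rax),%edx instruction. */
_Static_assert(FAULT_ADDR == 0xa0fc, "oops offset should map to a0fc");

/*
 * Hypothetical mirror of a0fc..a10c:
 *   movzwl (%rax),%edx   load a 16-bit value through the pointer
 *   cmp    $0xff,%dl     is the low byte 0xff?
 *   sete   %al
 *   add    %eax,%edx     if so, add one
 * Returns 0 and stores the result in *out, or -1 at the point where
 * the real code would fault on a NULL pointer.
 */
int mock_get_hw_tx_cons(const uint16_t *hw_tx_cons_ptr, uint16_t *out)
{
    uint16_t cons;

    if (hw_tx_cons_ptr == NULL)
        return -1;              /* the kernel code dereferences here */

    cons = *hw_tx_cons_ptr;
    if ((cons & 0xff) == 0xff)
        cons++;                 /* 16-bit wraparound, as with %dx */

    *out = cons;
    return 0;
}
```

With a NULL pointer the sketch returns -1 at exactly the load that faults in the trace.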

> The offset of 0x70 (112) is the hw_tx_cons_ptr field in the bnx2_napi
> structure (see the bnx2_napi structure dump below).  These lines
> belong to the routine bnx2_get_hw_tx_cons(), which looks like it was
> inlined by the compiler.  More specifically, it looks like the
> dereference of hw_tx_cons_ptr failed.
> 
> cons = *bnapi->hw_tx_cons_ptr;
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/net/bnx2.c;h=06b901152d4487fa04164437cc179661b44657fe;hb=74fca6a42863ffacaf7ba6f1936a9f228950f657#l2761
> 
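
To double-check the 0x70 offset against the pahole dump quoted below, a small stand-alone sketch like this can be compiled on any 64-bit box. It is only an assumption-laden mock: struct mock_bnx2_napi and mock_field_at() are invented names, napi_struct is replaced by an opaque 96-byte array, and only the first few fields from the dump are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the head of struct bnx2_napi (LP64 only). */
struct mock_bnx2_napi {
    unsigned char napi[96];          /* struct napi_struct, 96 bytes */
    void *bp;                        /* struct bnx2 *, offset 96     */
    union {
        void *msi;
        void *msix;
    } status_blk;                    /* offset 104 */
    uint16_t *hw_tx_cons_ptr;        /* offset 112 */
    uint16_t *hw_rx_cons_ptr;        /* offset 120 */
};

_Static_assert(sizeof(void *) == 8, "sketch assumes 64-bit pointers");
_Static_assert(offsetof(struct mock_bnx2_napi, bp) == 0x60,
               "bp sits at 96 (0x60)");
_Static_assert(offsetof(struct mock_bnx2_napi, hw_tx_cons_ptr) == 0x70,
               "hw_tx_cons_ptr sits at 112 (0x70)");

/* Map a byte offset to the modelled field that starts there. */
const char *mock_field_at(size_t off)
{
    if (off == offsetof(struct mock_bnx2_napi, bp))
        return "bp";
    if (off == offsetof(struct mock_bnx2_napi, status_blk))
        return "status_blk";
    if (off == offsetof(struct mock_bnx2_napi, hw_tx_cons_ptr))
        return "hw_tx_cons_ptr";
    if (off == offsetof(struct mock_bnx2_napi, hw_rx_cons_ptr))
        return "hw_rx_cons_ptr";
    return "?";
}
```

So the `mov 0x70(%rdx),%rax` at a0f8 loads hw_tx_cons_ptr, and the following movzwl dereferences it.
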
> To be sure this is the case, could you send the .config file you are
> using, or the bnx2 kernel module built with the CFLAG '-g'?  Then we
> can verify exactly where in the code it is crashing.

See the attached .config. If needed I can recompile the module with
'-g', but the original instance does not contain debugging info.

> Did you see anything suspicious in the system kernel logs?  If you
> could isolate the logs from when the machine booted to when it
> crashed and send them to us, it would be very helpful.

Unfortunately there is nothing suspicious in there; all I have is the
attached dmesg (with IP and MAC addresses replaced by '*'s).

I've not appended the crash dump gathered via netconsole which didn't
make it to the affected system's disk (see previous mail for it).


Regards,
Bruno



> Thanks again for your time.
> 
> -Ben
> 
> 
> <--snip snip structure dump from pahole-->
> struct bnx2_napi {
>         struct napi_struct         napi;                 /*     0    96 */
>         /* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
>         struct bnx2 *              bp;                   /*    96     8 */
>         union {
>                 struct status_block * msi;               /*           8 */
>                 struct status_block_msix * msix;         /*           8 */
>         } status_blk;                                    /*   104     8 */
>         u16 *                      hw_tx_cons_ptr;       /*   112     8 */
>         u16 *                      hw_rx_cons_ptr;       /*   120     8 */
>         /* --- cacheline 2 boundary (128 bytes) --- */
>         u32                        last_status_idx;      /*   128     4 */
>         u32                        int_num;              /*   132     4 */
>         struct bnx2_rx_ring_info   rx_ring;              /*   136   360 */
>         /* --- cacheline 7 boundary (448 bytes) was 48 bytes ago --- */
>         struct bnx2_tx_ring_info   tx_ring;              /*   496    48 */
>         /* --- cacheline 8 boundary (512 bytes) was 32 bytes ago --- */
> 
>         /* size: 576, cachelines: 9 */
>         /* padding: 32 */
> };
> <--snip snip-->
> 
> On Mon, 2009-12-28 at 23:49 -0800, Bruno Prémont wrote: 
> > On a system that has been running 2.6.31 since last September I got
> > two crashes this December at night (cause unknown). Yesterday, after
> > the second crash, I updated the kernel to 2.6.31.9 and enabled
> > netconsole in the hope of getting some information about the cause.
> > 
> > Today system crashed once again and all I got is the following
> > incomplete trace on the receiving side of netconsole:
> > 
> > [24701.841185] BUG: unable to handle kernel NULL pointer dereference at (null)
> > [24701.841188] IP: [<ffffffffa00610fc>] bnx2_poll_work+0x2c/0x12d0 [bnx2]
> > [24701.841197] PGD 16509067 PUD 4e776067 PMD 0
> > [24701.841199] Oops: 0000 [#1] SMP
> > [24701.841202] last sysfs file: /sys/kernel/uevent_seqnum
> > [24701.841204] CPU 0
> > [24701.841205] Modules linked in: ipmi_devintf squashfs ext2 zlib_inflate netconsole configfs loop dm_round_robin scsi_dh_rdac dm_multipath scsi_dh dm_mod sg sr_mod cdrom ata_piix ipmi_si ipmi_msghandler qla2xxx ahci bnx2 hpwdt uhci_hcd ehci_hcd libata
> > [24701.841218] Pid: 11273, comm: php-cgi Not tainted 2.6.31.9-x86_64 #1 ProLiant DL360 G5
> > [24701.841220] RIP: 0010:[<ffffffffa00610fc>]  [<ffffffffa00610fc>] bnx2_poll_work+0x2c/0x12d0 [bnx2]
> > 
> > 
> > Running objdump on the bnx2.ko module I get the following:
> > 000000000000a0d0 <bnx2_poll_work>:
> >     a0d0:       41 57                   push   %r15
> >     a0d2:       41 56                   push   %r14
> >     a0d4:       41 55                   push   %r13
> >     a0d6:       41 54                   push   %r12
> >     a0d8:       55                      push   %rbp
> >     a0d9:       53                      push   %rbx
> >     a0da:       48 81 ec 28 01 00 00    sub    $0x128,%rsp
> >     a0e1:       48 89 7c 24 18          mov    %rdi,0x18(%rsp)
> >     a0e6:       48 89 74 24 10          mov    %rsi,0x10(%rsp)
> >     a0eb:       89 54 24 0c             mov    %edx,0xc(%rsp)
> >     a0ef:       89 4c 24 08             mov    %ecx,0x8(%rsp)
> >     a0f3:       48 8b 54 24 10          mov    0x10(%rsp),%rdx
> >     a0f8:       48 8b 42 70             mov    0x70(%rdx),%rax
> >     a0fc:       0f b7 10                movzwl (%rax),%edx
> >     a0ff:       31 c0                   xor    %eax,%eax
> >     a101:       48 8b 4c 24 10          mov    0x10(%rsp),%rcx
> >     a106:       80 fa ff                cmp    $0xff,%dl
> >     a109:       0f 94 c0                sete   %al
> >     a10c:       01 c2                   add    %eax,%edx
> >     a10e:       66 39 91 1a 02 00 00    cmp    %dx,0x21a(%rcx)
> >     a115:       0f 84 78 01 00 00       je     a293 <bnx2_poll_work+0x1c3>
> >     a11b:       48 8b 57 08             mov    0x8(%rdi),%rdx
> >     a11f:       48 89 f8                mov    %rdi,%rax
> >     a122:       48 8b 9a 00 03 00 00    mov    0x300(%rdx),%rbx
> >     a129:       48 83 c0 40             add    $0x40,%rax
> >     a12d:       48 29 c1                sub    %rax,%rcx
> >     a130:       48 89 c8                mov    %rcx,%rax
> >     a133:       48 c1 f8 06             sar    $0x6,%rax
> >     a137:       69 c0 39 8e e3 38       imul   $0x38e38e39,%eax,%eax
> >     a13d:       48 c1 e0 07             shl    $0x7,%rax
> >     a141:       48 01 d8                add    %rbx,%rax
> >     a144:       48 89 44 24 20          mov    %rax,0x20(%rsp)
> >     a149:       48 8b 7c 24 10          mov    0x10(%rsp),%rdi
> >     a14e:       48 8b 47 70             mov    0x70(%rdi),%rax
> >     a152:       44 0f b7 30             movzwl (%rax),%r14d
> >     a156:       31 c0                   xor    %eax,%eax
> >     a158:       0f b7 9f 18 02 00 00    movzwl 0x218(%rdi),%ebx
> >     a15f:       41 80 fe ff             cmp    $0xff,%r14b
> >     a163:       0f 94 c0                sete   %al
> >     a166:       45 31 ff                xor    %r15d,%r15d
> >     a169:       41 01 c6                add    %eax,%r14d
> >     a16c:       66 44 39 f3             cmp    %r14w,%bx
> >     a170:       0f 84 ee 00 00 00       je     a264 <bnx2_poll_work+0x194>
> >     a176:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
> >     a17d:       00 00 00 
> >     a180:       0f b6 cb                movzbl %bl,%ecx
> >     a183:       48 8b 44 24 10          mov    0x10(%rsp),%rax
> >     a188:       44 0f b7 e1             movzwl %cx,%r12d
> >     a18c:       49 c1 e4 04             shl    $0x4,%r12
> >     a190:       4c 03 a0 10 02 00 00    add    0x210(%rax),%r12
> >     a197:       4d 8b 2c 24             mov    (%r12),%r13
> >     a19b:       66 41 83 7c 24 08 00    cmpw   $0x0,0x8(%r12)
> >     a1a2:       41 0f 18 8d bc 00 00    prefetcht0 0xbc(%r13)
> >     a1a9:       00 
> >                 ...
> > 
> > 
> > Kernel is compiled on Gentoo (64bit):
> >   Linux version 2.6.31.9-x86_64 () (gcc version 4.3.4 (Gentoo 4.3.4 p1.0, pie-10.1.5) ) #1 SMP Mon Dec 28 15:49:16 CET 2009
> > The affected server (HP DL360 G5) is running OpenSuSE-11.1, 32bit
> > userspace
> > 
> > Any idea if there is a recent patch that could fix this issue? At
> > the time of the crash the server was not particularly loaded and
> > had around 200 packets/s of network traffic.
> > 
> > Regards,
> > Bruno


View attachment "dmesg" of type "text/plain" (50098 bytes)

View attachment ".config" of type "text/plain" (51368 bytes)
