Message-ID: <216b770e-fc08-68a6-c1bf-be96d52e325e@deltatee.com>
Date:   Fri, 28 Jul 2017 11:38:13 -0600
From:   Logan Gunthorpe <logang@...tatee.com>
To:     Matan Barak <matanb@...lanox.com>,
        Yishai Hadas <yishaih@...lanox.com>,
        Doug Ledford <dledford@...hat.com>,
        "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>
Cc:     Sean Hefty <sean.hefty@...el.com>,
        Hal Rosenstock <hal.rosenstock@...il.com>,
        Jason Gunthorpe <jgunthorpe@...idianresearch.com>,
        Stephen Bates <sbates@...thlin.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: BUG: NULL pointer dereference at ib_uverbs_comp_handler+0x20

Hi,

My system has been failing on recent kernels (4.12.x and 4.13-rc2) with
a NULL pointer dereference; the stack trace is at the end of this email.
It happens when simply running 'ib_write_bw -R <server>' on a Chelsio T6
(cxgb4). I've bisected (log attached) and found the offending commit to
be:

commit 1e7710f3f6563940bb6bbc94aa8eadfd344a86af
Author: Matan Barak <matanb@...lanox.com>
  IB/core: Change completion channel to use the reworked objects schema

Reverting this commit from v4.12.3, along with the dependent commits
db1b5ddd53365 and e0fcc61113c (which fix other bugs in this commit),
fixes the issue.

I did the bisect with the userspace libraries in Debian Stretch but I
also had this bug with rdma-core v14. I was pretty sure v4.12 kernels
worked for me in the past but likely only before I upgraded from Jessie
to Stretch.

Thanks,

Logan


PS. As a side rant, this bug was found after a very *frustrating* day of
what was supposed to be the 20 minute task of getting my RDMA cards
plugged in again. I tried with both CX4s and the T6s (and I'm still not
sure if my CX4s work yet). Instead, it turns out there's a whole mess of
bugs in the kernel I had to go up against. I went back and forth between
different versions of the userspace libraries because I was sure 4.11
worked -- but it turned out 4.11.10+, 4.12.x and who knows what other
stable kernels are currently broken by the bug fixed in [1]. There was
also a separate bug, fixed during the 4.12-rc series, that I had to
carefully bisect around to find the one reported above. So
frustrating!!

[1] 5a7a88f1b488e4ee49eb3d5b82612d4d9ffdf2c3

--

[   53.320439] iwpm_register_pid: Unable to send a nlmsg (client = 2)
[   54.738579] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
[   54.747439] IP: _raw_spin_lock_irqsave+0x10/0x30
[   54.752719] PGD 0
[   54.752721] P4D 0
[   54.755049]
[   54.759109] Oops: 0002 [#1] SMP
[   54.762699] Modules linked in:
[   54.766195] CPU: 0 PID: 5 Comm: kworker/u16:0 Not tainted 4.13.0-rc2.direct #708
[   54.774536] Hardware name: Supermicro SYS-7047GR-TRF/X9DRG-QF, BIOS 3.0a 12/05/2013
[   54.783182] Workqueue: iw_cxgb4 process_work
[   54.788036] task: ffff880276a5ee80 task.stack: ffffc900000c4000
[   54.794728] RIP: 0010:_raw_spin_lock_irqsave+0x10/0x30
[   54.800552] RSP: 0018:ffffc900000c7c70 EFLAGS: 00010046
[   54.806473] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
[   54.814524] RDX: 0000000000000001 RSI: 0000000000000058 RDI: 0000000000000058
[   54.822583] RBP: ffff880470484600 R08: 0000000000000001 R09: 0000000000000001
[   54.830663] R10: 0000000000000040 R11: ffff88047420b400 R12: 0000000000000282
[   54.838744] R13: ffffc900000c7dc0 R14: 0000000000000001 R15: ffff880470484600
[   54.846825] FS:  0000000000000000(0000) GS:ffff880277c00000(0000) knlGS:0000000000000000
[   54.855997] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   54.862522] CR2: 0000000000000058 CR3: 0000000001e0a000 CR4: 00000000000406f0
[   54.870602] Call Trace:
[   54.873442]  ? ib_uverbs_comp_handler+0x20/0xe0
[   54.878610]  ? flush_qp+0x6e/0x2b0
[   54.882514]  ? c4iw_modify_qp+0x11c2/0x1870
[   54.887295]  ? close_con_rpl+0xe7/0x170
[   54.891686]  ? kfree_skb+0x33/0x90
[   54.895592]  ? skb_dequeue+0x52/0x60
[   54.899690]  ? process_work+0x4a/0x60
[   54.903887]  ? process_one_work+0x1c2/0x3e0
[   54.908664]  ? worker_thread+0x47/0x3d0
[   54.913056]  ? kthread+0xfc/0x130
[   54.916864]  ? create_worker+0x180/0x180
[   54.921353]  ? kthread_create_on_node+0x40/0x40
[   54.926521]  ? ret_from_fork+0x22/0x30
[   54.930811] Code: c0 74 05 e8 b3 1c 73 ff 48 89 d8 5b c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 53 9c 5b fa 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 05 48 89 d8 5b c3 89 c6 e8 9c 09 73 ff 48
[   54.952099] RIP: _raw_spin_lock_irqsave+0x10/0x30 RSP: ffffc900000c7c70
[   54.959598] CR2: 0000000000000058
[   54.963405] ---[ end trace 896cfe0234c949d2 ]---
[  102.633421] random: crng init done
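
For anyone reading the trace above, here is a minimal sketch (not part of the original report) of pulling the key fields out of such an oops: the faulting address, the symbol at the instruction pointer, and which registers hold the faulting address. The function name summarize_oops and the regular expressions are hypothetical helpers written for this illustration, and the input is an excerpt of the log above with the mail line wraps joined.

```python
import re

# Excerpt of the oops above, with the mail client's line wraps joined.
OOPS = """\
[   54.738579] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
[   54.747439] IP: _raw_spin_lock_irqsave+0x10/0x30
[   54.814524] RDX: 0000000000000001 RSI: 0000000000000058 RDI: 0000000000000058
[   54.862522] CR2: 0000000000000058 CR3: 0000000001e0a000 CR4: 00000000000406f0
"""


def summarize_oops(text):
    """Extract fault address, IP symbol, and registers equal to the fault address.

    Hypothetical helper: the patterns only cover the x86-64 oops layout
    seen in this report, not every kernel version or architecture.
    """
    fault = re.search(r"NULL pointer dereference at\s+([0-9a-f]+)", text)
    ip = re.search(r"\bIP: (\S+)\+0x([0-9a-f]+)/0x[0-9a-f]+", text)
    regs = re.findall(r"\b(R[A-Z0-9]{2}|CR\d): ([0-9a-f]{16})\b", text)

    fault_addr = int(fault.group(1), 16)
    matching = sorted(name for name, val in regs if int(val, 16) == fault_addr)
    return {
        "fault_addr": fault_addr,
        "symbol": ip.group(1),
        "ip_offset": int(ip.group(2), 16),
        "matching_regs": matching,
    }


summary = summarize_oops(OOPS)
print(hex(summary["fault_addr"]), summary["symbol"], summary["matching_regs"])
```

In this trace the marked instruction bytes, <f0> 0f b1 17, decode to lock cmpxchg %edx,(%rdi), so RDI is the address of the lock being taken. A lock address of 0x58 is consistent with a spinlock member at offset 0x58 of a structure whose base pointer was NULL, though which structure that is cannot be read from the trace alone.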


View attachment "bisect.log" of type "text/x-log" (2792 bytes)
