Message-Id: <20181122142407.18304-1-poros@redhat.com>
Date: Thu, 22 Nov 2018 15:24:07 +0100
From: Petr Oros <poros@...hat.com>
To: netdev@...r.kernel.org
Cc: ivecera@...hat.com, davem@...emloft.net
Subject: [PATCH net] be2net: Fix NULL pointer dereference in be_tx_timeout()
The driver enumerates Tx queues in its ndo_tx_timeout() handler, so there is
a possible race with be_update_queues(). To avoid it, we set the carrier off
before the queues are destroyed. This prevents the netdev watchdog from firing
after be_clear_queues(). A watchdog timeout makes no sense here anyway, since
we are re-creating the queues.
Reproducer:
The bug can be reproduced with ethtool by changing the queue count:
ethtool -L $netif combined 1
ethtool -L $netif combined 32
If the oops is not triggered immediately, run it again or in a loop.
Oops:
[ 865.768648] NETDEV WATCHDOG: enp4s0f0 (be2net): transmit queue 0 timed out
[ 865.775539] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x20d/0x220
[ 865.783796] Modules linked in: be2net intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel mei_me intel_cstate intel_uncore ipmi_ssif mei ipmi_si pcspkr sg i2c_i801 joydev lpc_ich intel_rapl_perf ipmi_devintf ioatdma ipmi_msghandler xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci libahci crc32c_intel drm serio_raw libata igb dca i2c_algo_bit wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: be2net]
[ 865.834289] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.20.0-rc3+ #2
[ 865.840640] Hardware name: Supermicro X9DBU/X9DBU, BIOS 3.2 01/15/2015
[ 865.847168] RIP: 0010:dev_watchdog+0x20d/0x220
[ 865.851612] Code: 00 49 63 4e e0 eb 92 4c 89 e7 c6 05 a5 de c9 00 01 e8 f7 b2 fc ff 89 d9 4c 89 e6 48 c7 c7 a0 d1 b2 99 48 89 c2 e8 7d b0 98 ff <0f> 0b eb c0 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66
[ 865.870358] RSP: 0018:ffff9bee73ac3e88 EFLAGS: 00010282
[ 865.875583] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000083f
[ 865.882707] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
[ 865.889832] RBP: ffff9bee5fa0045c R08: 0000000000000824 R09: 0000000000000007
[ 865.896956] R10: 0000000000000000 R11: ffffffff9a3f162d R12: ffff9bee5fa00000
[ 865.904088] R13: 0000000000000003 R14: ffff9bee5fa00480 R15: 0000000000000020
[ 865.911214] FS: 0000000000000000(0000) GS:ffff9bee73ac0000(0000) knlGS:0000000000000000
[ 865.919298] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 865.925037] CR2: 00005580497ce040 CR3: 00000002cf60a004 CR4: 00000000000606e0
[ 865.932170] Call Trace:
[ 865.934626] <IRQ>
[ 865.936645] ? pfifo_fast_dequeue+0x160/0x160
[ 865.941005] call_timer_fn+0x2b/0x130
[ 865.944670] run_timer_softirq+0x3b9/0x3f0
[ 865.948768] ? tick_sched_timer+0x37/0x70
[ 865.952779] ? __hrtimer_run_queues+0x110/0x280
[ 865.957314] __do_softirq+0xdd/0x2fe
[ 865.960896] irq_exit+0xfa/0x100
[ 865.964125] smp_apic_timer_interrupt+0x74/0x140
[ 865.968745] apic_timer_interrupt+0xf/0x20
[ 865.972844] </IRQ>
[ 865.974953] RIP: 0010:cpuidle_enter_state+0xb0/0x320
[ 865.979915] Code: 89 c3 66 66 66 66 90 31 ff e8 0c 07 a6 ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 46 02 00 00 31 ff e8 33 e0 ab ff fb 85 ed <0f> 88 1a 02 00 00 48 b8 ff ff ff ff f3 01 00 00 48 2b 1c 24 48 39
[ 865.998661] RSP: 0018:ffffbc9ac19e7ea0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
[ 866.006225] RAX: ffff9bee73ae1dc0 RBX: 000000c9938e11ae RCX: 000000000000001f
[ 866.013350] RDX: 000000c9938e11ae RSI: 00000000435e532a RDI: 0000000000000000
[ 866.020474] RBP: 0000000000000005 R08: 0000000000000002 R09: 0000000000021640
[ 866.027598] R10: 00009c434b946fde R11: ffff9bee73ae0e44 R12: ffffffff99d27538
[ 866.034723] R13: ffff9bee73aec628 R14: 0000000000000005 R15: 0000000000000000
[ 866.041860] do_idle+0x1f1/0x230
[ 866.045091] cpu_startup_entry+0x19/0x20
[ 866.049016] start_secondary+0x195/0x1e0
[ 866.052943] secondary_startup_64+0xb6/0xc0
[ 866.057129] ---[ end trace dead88c26bcd8261 ]---
[ 866.061750] be2net 0000:04:00.0: TXQ Dump: 0 H: 0 T: 0 used: 0, qid: 0x2
[ 866.068452] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 866.076273] PGD 0 P4D 0
[ 866.078810] Oops: 0000 [#1] SMP PTI
[ 866.082305] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G W 4.20.0-rc3+ #2
[ 866.090041] Hardware name: Supermicro X9DBU/X9DBU, BIOS 3.2 01/15/2015
[ 866.096566] RIP: 0010:be_tx_timeout+0x7c/0x300 [be2net]
[ 866.101786] Code: 8b 45 1c 41 8b 4d 14 48 89 df 31 ed 45 8b 4d 18 48 c7 c6 80 51 2c c0 50 45 8b 45 10 8b 54 24 14 e8 09 a7 cb d8 4d 8b 7d 20 59 <41> 8b 0c af 45 8b 44 af 04 41 8b 74 af 0c 45 8b 4c af 08 89 ca 44
[ 866.120532] RSP: 0018:ffff9bee73ac3e38 EFLAGS: 00010246
[ 866.125758] RAX: 0000000000000000 RBX: ffff9bee72d6b0b0 RCX: 0000000000000002
[ 866.132882] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
[ 866.140014] RBP: 0000000000000000 R08: 000000000000084d R09: 0000000000000007
[ 866.147138] R10: 0000000000000000 R11: ffffffff9a3f162d R12: ffffffffc02c60ab
[ 866.154263] R13: ffff9bee5fa04b40 R14: ffffffffc02c613a R15: 0000000000000000
[ 866.161388] FS: 0000000000000000(0000) GS:ffff9bee73ac0000(0000) knlGS:0000000000000000
[ 866.169472] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 866.175210] CR2: 0000000000000000 CR3: 00000002cf60a004 CR4: 00000000000606e0
[ 866.182334] Call Trace:
[ 866.184781] <IRQ>
[ 866.186799] dev_watchdog+0x1e4/0x220
[ 866.190466] ? pfifo_fast_dequeue+0x160/0x160
[ 866.194825] call_timer_fn+0x2b/0x130
[ 866.198491] run_timer_softirq+0x3b9/0x3f0
[ 866.202590] ? tick_sched_timer+0x37/0x70
[ 866.206604] ? __hrtimer_run_queues+0x110/0x280
[ 866.211134] __do_softirq+0xdd/0x2fe
[ 866.214715] irq_exit+0xfa/0x100
[ 866.217948] smp_apic_timer_interrupt+0x74/0x140
[ 866.222566] apic_timer_interrupt+0xf/0x20
[ 866.226664] </IRQ>
[ 866.228762] RIP: 0010:cpuidle_enter_state+0xb0/0x320
[ 866.233727] Code: 89 c3 66 66 66 66 90 31 ff e8 0c 07 a6 ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 46 02 00 00 31 ff e8 33 e0 ab ff fb 85 ed <0f> 88 1a 02 00 00 48 b8 ff ff ff ff f3 01 00 00 48 2b 1c 24 48 39
[ 866.252465] RSP: 0018:ffffbc9ac19e7ea0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
[ 866.260031] RAX: ffff9bee73ae1dc0 RBX: 000000c9938e11ae RCX: 000000000000001f
[ 866.267164] RDX: 000000c9938e11ae RSI: 00000000435e532a RDI: 0000000000000000
[ 866.274288] RBP: 0000000000000005 R08: 0000000000000002 R09: 0000000000021640
[ 866.281421] R10: 00009c434b946fde R11: ffff9bee73ae0e44 R12: ffffffff99d27538
[ 866.288545] R13: ffff9bee73aec628 R14: 0000000000000005 R15: 0000000000000000
[ 866.295680] do_idle+0x1f1/0x230
[ 866.298913] cpu_startup_entry+0x19/0x20
[ 866.302839] start_secondary+0x195/0x1e0
[ 866.306764] secondary_startup_64+0xb6/0xc0
[ 866.310948] Modules linked in: be2net intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel mei_me intel_cstate intel_uncore ipmi_ssif mei ipmi_si pcspkr sg i2c_i801 joydev lpc_ich intel_rapl_perf ipmi_devintf ioatdma ipmi_msghandler xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci libahci crc32c_intel drm serio_raw libata igb dca i2c_algo_bit wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: be2net]
[ 866.361432] CR2: 0000000000000000
[ 866.364748] ---[ end trace dead88c26bcd8262 ]---
[ 866.507013] RIP: 0010:be_tx_timeout+0x7c/0x300 [be2net]
[ 866.512234] Code: 8b 45 1c 41 8b 4d 14 48 89 df 31 ed 45 8b 4d 18 48 c7 c6 80 51 2c c0 50 45 8b 45 10 8b 54 24 14 e8 09 a7 cb d8 4d 8b 7d 20 59 <41> 8b 0c af 45 8b 44 af 04 41 8b 74 af 0c 45 8b 4c af 08 89 ca 44
[ 866.530980] RSP: 0018:ffff9bee73ac3e38 EFLAGS: 00010246
[ 866.536206] RAX: 0000000000000000 RBX: ffff9bee72d6b0b0 RCX: 0000000000000002
[ 866.543330] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
[ 866.550454] RBP: 0000000000000000 R08: 000000000000084d R09: 0000000000000007
[ 866.557578] R10: 0000000000000000 R11: ffffffff9a3f162d R12: ffffffffc02c60ab
[ 866.564710] R13: ffff9bee5fa04b40 R14: ffffffffc02c613a R15: 0000000000000000
[ 866.571835] FS: 0000000000000000(0000) GS:ffff9bee73ac0000(0000) knlGS:0000000000000000
[ 866.579920] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 866.585658] CR2: 0000000000000000 CR3: 00000002cf60a004 CR4: 00000000000606e0
[ 866.592784] Kernel panic - not syncing: Fatal exception in interrupt
[ 866.599179] Kernel Offset: 0x17a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Fixes: c1b3bdb2ffa9 ("be2net: gather debug info and reset adapter (only for Lancer) on a tx-timeout")
Signed-off-by: Petr Oros <poros@...hat.com>
---
drivers/net/ethernet/emulex/benet/be_main.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index c5ad7a4f4d83..02202c5e6794 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4700,8 +4700,11 @@ int be_update_queues(struct be_adapter *adapter)
 	struct net_device *netdev = adapter->netdev;
 	int status;
 
-	if (netif_running(netdev))
+	if (netif_running(netdev)) {
+		/* prevent netdev watchdog during tx queue destroy */
+		netif_carrier_off(netdev);
 		be_close(netdev);
+	}
 
 	be_cancel_worker(adapter);
 
--
2.18.1