Date:   Tue, 12 Dec 2017 16:08:48 +0530
From:   Abdul Haleem <abdhalee@...ux.vnet.ibm.com>
To:     linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>
Cc:     linux-next <linux-next@...r.kernel.org>,
        linux-scsi <linux-scsi@...r.kernel.org>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        suganath-prabu.subramani@...adcom.com,
        chaitra.basappa@...adcom.com, mpe <mpe@...erman.id.au>,
        Stephen Rothwell <sfr@...b.auug.org.au>,
        sachinp <sachinp@...ux.vnet.ibm.com>, sim@...ux.vnet.ibm.com
Subject: [mainline] rcu stalls on CPU when unbinding mpt3sas driver

Hi,

Of late we have been seeing RCU CPU stall messages while unbinding the
mpt3sas driver on a powerpc machine, with both mainline and linux-next kernels.

Machine Type: Power 8 Bare-metal
Kernel version: 4.15.0-rc2
config: attached.
test: driver unbind

$ echo -n 0001:03:00.0 > /sys/bus/pci/drivers/mpt3sas/unbind
mpt3sas_cm0: removing handle(0x000a), sas_addr(0x500304801f080d00)
mpt3sas_cm0: removing : enclosure logical id(0x500304801f080d3f), slot(0)
mpt3sas_cm0: removing enclosure level(0x0000), connector name(     )
mpt3sas_cm0: removing handle(0x000b), sas_addr(0x500304801f080d01)
mpt3sas_cm0: removing : enclosure logical id(0x500304801f080d3f), slot(1)
mpt3sas_cm0: removing enclosure level(0x0000), connector name(     )
mpt3sas_cm0: removing handle(0x000c), sas_addr(0x500304801f080d02)
mpt3sas_cm0: removing : enclosure logical id(0x500304801f080d3f), slot(2)
mpt3sas_cm0: removing enclosure level(0x0000), connector name(     )
mpt3sas_cm0: removing handle(0x000d), sas_addr(0x500304801f080d03)
mpt3sas_cm0: removing : enclosure logical id(0x500304801f080d3f), slot(3)
mpt3sas_cm0: removing enclosure level(0x0000), connector name(     )
mpt3sas_cm0: removing handle(0x000e), sas_addr(0x500304801f080d04)
mpt3sas_cm0: removing : enclosure logical id(0x500304801f080d3f), slot(4)
mpt3sas_cm0: removing enclosure level(0x0000), connector name(     )
mpt3sas_cm0: removing handle(0x000f), sas_addr(0x500304801f080d3d)
mpt3sas_cm0: removing : enclosure logical id(0x500304801f080d3f), slot(12)
mpt3sas_cm0: removing enclosure level(0x0000), connector name(     )
sd 16:0:0:0: [sdb] Synchronizing SCSI cache
sd 16:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
sd 16:0:1:0: [sdc] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
sd 16:0:1:0: [sdc] tag#0 CDB: ATA command pass through(16) 85 06 2c 00 00 00 00 00 00 00 00 00 00 00 e5 00
sd 16:0:2:0: [sdd] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
sd 16:0:2:0: [sdd] tag#0 CDB: ATA command pass through(16) 85 06 2c 00 00 00 00 00 00 00 00 00 00 00 e5 00
sd 16:0:3:0: [sde] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
sd 16:0:3:0: [sde] tag#0 CDB: ATA command pass through(16) 85 06 2c 00 00 00 00 00 00 00 00 00 00 00 e5 00
sd 16:0:4:0: [sdf] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
sd 16:0:4:0: [sdf] tag#0 CDB: ATA command pass through(16) 85 06 2c 00 00 00 00 00 00 00 00 00 00 00 e5 00

A few minutes after the above command was executed, the machine was flooded with RCU stall messages.

INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 86-... } 44191221 jiffies s: 3445 root: 0x20/.
blocking rcu_node structures: l=1:80-95:0x40/.
Task dump for CPU 86:
sh              R  running task    10384 18136      1 0x00042086
Call Trace:
[c000007792d47370] [c000007933667200] 0xc000007933667200 (unreliable)
INFO: rcu_sched self-detected stall on CPU
	86-....: (50420459 ticks this GP) idle=0ae/140000000000001/0 softirq=11962/11962 fqs=24724293 
	 (t=50420460 jiffies g=80217 c=80216 q=36817447)
NMI backtrace for cpu 86
CPU: 86 PID: 18136 Comm: sh Not tainted 4.15.0-rc2-autotest #1
Call Trace:
[c000007792d46f20] [c00000000099b83c] dump_stack+0xb0/0xf4 (unreliable)
[c000007792d46f60] [c0000000009a43e4] nmi_cpu_backtrace+0x1a4/0x210
[c000007792d46ff0] [c0000000009a462c] nmi_trigger_cpumask_backtrace+0x1dc/0x220
[c000007792d47090] [c00000000002c7d0] arch_trigger_cpumask_backtrace+0x20/0x40
[c000007792d470b0] [c00000000017496c] rcu_dump_cpu_stacks+0xf4/0x158
[c000007792d47100] [c000000000173cb0] rcu_check_callbacks+0x8f0/0xb00
[c000007792d47230] [c00000000017c25c] update_process_times+0x3c/0x90
[c000007792d47260] [c0000000001921e4] tick_sched_handle.isra.13+0x44/0x80
[c000007792d47280] [c000000000192278] tick_sched_timer+0x58/0xb0
[c000007792d472c0] [c00000000017cd58] __hrtimer_run_queues+0xf8/0x330
[c000007792d47340] [c00000000017da74] hrtimer_interrupt+0xe4/0x280
[c000007792d47400] [c000000000022660] __timer_interrupt+0x90/0x270
[c000007792d47450] [c000000000022d30] timer_interrupt+0xa0/0xe0
[c000007792d47480] [c000000000009238] decrementer_common+0x158/0x160
--- interrupt: 901 at replay_interrupt_return+0x0/0x4
    LR = arch_local_irq_restore+0x74/0x90
[c000007792d47770] [c000003fb3185000] 0xc000003fb3185000 (unreliable)
[c000007792d47790] [c0000000009bb658] _raw_spin_unlock_irqrestore+0x38/0x60
[c000007792d477b0] [c00000000066f274] scsi_remove_target+0x204/0x270
[c000007792d47820] [d00000000fc72604] sas_rphy_remove+0x94/0xa0 [scsi_transport_sas]
[c000007792d47850] [d00000000fc745bc] sas_port_delete+0x4c/0x238 [scsi_transport_sas]
[c000007792d478b0] [d000000010e82990] mpt3sas_transport_port_remove+0x2d0/0x310 [mpt3sas]
[c000007792d47950] [d000000010e71ba0] _scsih_remove_device+0x100/0x2a0 [mpt3sas]
[c000007792d47a10] [d000000010e774d4] mpt3sas_device_remove_by_sas_address.part.44+0xb4/0x160 [mpt3sas]
[c000007792d47a70] [d000000010e77614] _scsih_expander_node_remove+0x94/0x170 [mpt3sas]
[c000007792d47af0] [d000000010e77a88] mpt3sas_expander_remove.part.46+0x398/0xe70 [mpt3sas]
[c000007792d47b90] [c00000000056a9c4] pci_device_remove+0x64/0x110
[c000007792d47bd0] [c00000000060fa74] device_release_driver_internal+0x1e4/0x2c0
[c000007792d47c20] [c00000000060d260] unbind_store+0x110/0x140
[c000007792d47c70] [c00000000060c2fc] drv_attr_store+0x3c/0x60
[c000007792d47c90] [c0000000003a03c4] sysfs_kf_write+0x64/0xa0
[c000007792d47cb0] [c00000000039f1b0] kernfs_fop_write+0x170/0x250
[c000007792d47d00] [c0000000002fd370] __vfs_write+0x40/0x200
[c000007792d47d90] [c0000000002fd748] vfs_write+0xc8/0x240
[c000007792d47de0] [c0000000002fda80] SyS_write+0x60/0x110
[c000007792d47e30] [c00000000000b8e0] system_call+0x58/0x6c
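
For triage, the CPUs named in the expedited-stall lines of a log like the one above can be extracted with a small filter. This is only a minimal sketch: `stalled_cpus` is a hypothetical helper name, it assumes the "detected expedited stalls on CPUs/tasks: { ... }" line format shown in this report, and it ignores the "self-detected stall" lines.

```sh
#!/bin/sh
# stalled_cpus (hypothetical helper): read a kernel log on stdin and print
# the unique CPU numbers listed in RCU expedited-stall reports, one per line.
stalled_cpus() {
    # Keep only the text between "{" and "}" on expedited-stall lines,
    # e.g. " 86-... " from "... CPUs/tasks: { 86-... } 44191221 jiffies ...".
    sed -n 's/.*detected expedited stalls on CPUs\/tasks: {\(.*\)}.*/\1/p' |
        # One entry per line.
        tr ' ' '\n' |
        # Each entry looks like "86-..."; keep the leading CPU number.
        sed -n 's/^\([0-9][0-9]*\)-.*/\1/p' |
        # Numeric sort, duplicates removed.
        sort -nu
}
```

Typical use would be `dmesg | stalled_cpus`, or feeding it a saved console log.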

-- 
Regards,

Abdul Haleem
IBM Linux Technology Centre



View attachment "Hab-NV-config_with_NVMe" of type "text/plain" (88351 bytes)
