Date:   Sun, 11 Jun 2017 15:51:42 -0400
From:   Jérôme Carretero <cJ-ko@...gloub.eu>
To:     黃清隆 <ching2048@...ca.com.tw>
Cc:     linux-scsi@...r.kernel.org, linux-kernel@...r.kernel.org,
        billion.wu@...ca.com.tw
Subject: arcmsr: during abort device command, can't access other drives

Hi Ching,


When a drive finally failed in my JBOD array, I discovered that the
whole ARC1880X controller would time out, blocking access to every drive
behind it, which is kind of sad.
I then upgraded the firmware and put the failing drive back to see
whether the newer firmware behaves differently; it does not.

Kernel: 4.12.0-rc4-00310-g6b7ed4588ce6.

Test scenario:

- 8-drive array configured in JBOD, TLER disabled
- one shell with dd if=/dev/${FAILING_DRIVE} of=/dev/null
- one shell with dd if=/dev/${ANOTHER_DRIVE} of=/dev/null
- observe kernel logs and disk activity
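
For convenience, the two readers above can also be started from a single
shell. This is only a sketch of the same scenario: the function names
read_all and run_scenario are made up for illustration, and the device
paths are placeholders supplied by the caller.

```shell
#!/bin/sh
# Sketch of the two-reader test; nothing here is specific to arcmsr.

# Read a device (or file) sequentially and discard the data,
# exactly like the dd invocations in the scenario.
read_all() {
    dd if="$1" of=/dev/null
}

# Start both readers concurrently, wait for both, and report exit codes.
# $1 = failing drive, $2 = another drive on the same controller.
run_scenario() {
    read_all "$1" &
    failing_pid=$!
    read_all "$2" &
    other_pid=$!
    wait "$failing_pid"; failing_rc=$?
    wait "$other_pid"; other_rc=$?
    echo "failing=$failing_rc other=$other_rc"
}
```

Typical use would be run_scenario /dev/${FAILING_DRIVE} /dev/${ANOTHER_DRIVE}
in one terminal while following the kernel log with dmesg -w in another.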

Expected result: while the failing drive is timing out, access to other
disks is maintained.

Actual result: access to the other disks is suspended during
the error handling sequence.


[ 1818.969326] sd 0:0:0:4: [sdac] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 1818.977700] sd 0:0:0:4: [sdac] tag#0 Sense Key : Medium Error [current] 
[ 1818.984411] sd 0:0:0:4: [sdac] tag#0 Add. Sense: Unrecovered read error
[ 1818.991045] sd 0:0:0:4: [sdac] tag#0 CDB: Read(10) 28 00 00 04 36 00 00 02 00 00
[ 1818.998445] blk_update_request: I/O error, dev sdac, sector 275968
[ 1899.118465] arcmsr0: abort device command of scsi id = 0 lun = 4
[ 1901.858516] arcmsr0: abort device command of scsi id = 0 lun = 4
[ 1904.591622] arcmsr: executing bus reset eh.....num_resets = 2, num_aborts = 6 
[ 1928.608091] arcmsr0: wait 'abort all outstanding command' timeout
[ 1928.614241] arcmsr0: executing hw bus reset .....
[ 1942.137500] arcmsr0: wait 'get adapter firmware                      miscellaneous data' timeout 
[ 1966.216936] arcmsr0: wait 'start adapter background                          rebulid' timeout 
[ 1966.244943] arcmsr: scsi bus reset eh returns with success
[ 2008.028613] arcmsr: executing bus reset eh.....num_resets = 3, num_aborts = 6 
[ 2029.344279] sd 0:0:0:4: [sdac] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 2029.352660] sd 0:0:0:4: [sdac] tag#3 Sense Key : Medium Error [current] 
[ 2029.359368] sd 0:0:0:4: [sdac] tag#3 Add. Sense: Unrecovered read error
[ 2029.366059] sd 0:0:0:4: [sdac] tag#3 CDB: Read(10) 28 00 00 04 30 00 00 08 00 00
[ 2029.373483] blk_update_request: I/O error, dev sdac, sector 274432
[ 2033.094134] sd 0:0:0:4: [sdac] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 2033.102539] sd 0:0:0:4: [sdac] tag#2 Sense Key : Medium Error [current] 
[ 2033.109257] sd 0:0:0:4: [sdac] tag#2 Add. Sense: Unrecovered read error
[ 2033.115887] sd 0:0:0:4: [sdac] tag#2 CDB: Read(10) 28 00 00 04 38 00 00 08 00 00
[ 2033.123326] blk_update_request: I/O error, dev sdac, sector 276480
[ 2037.435775] sd 0:0:0:4: [sdac] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 2037.444157] sd 0:0:0:4: [sdac] tag#0 Sense Key : Medium Error [current] 
[ 2037.450876] sd 0:0:0:4: [sdac] tag#0 Add. Sense: Unrecovered read error
[ 2037.457508] sd 0:0:0:4: [sdac] tag#0 CDB: Read(10) 28 00 00 04 36 20 00 00 08 00
[ 2037.464917] blk_update_request: I/O error, dev sdac, sector 276000
[ 2037.471106] Buffer I/O error on dev sdac, logical block 34500, async page read
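
To see how long each step of the error handling takes, the arcmsr lines
can be pulled out of a log like the one above together with the time
elapsed since the previous arcmsr line. A minimal sketch, assuming
dmesg-style "[ seconds.micros] message" input on stdin; the function
name arcmsr_gaps is made up for illustration.

```shell
#!/bin/sh
# Print every arcmsr line, prefixed with the seconds elapsed since the
# previous arcmsr line (the first one gets a blank prefix).
arcmsr_gaps() {
    awk '
        /arcmsr/ {
            line = $0
            # Extract the timestamp from the leading "[ seconds.micros]" field.
            ts = line
            sub(/^\[ */, "", ts)
            sub(/\].*$/, "", ts)
            if (seen) printf "+%.1fs  %s\n", ts - prev, line
            else      printf "        %s\n", line
            prev = ts
            seen = 1
        }'
}
```

For example: dmesg | arcmsr_gaps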


Regards,

-- 
Jérôme
