Message-ID: <20170611154002.02c0dabb@Vantage.cJ>
Date: Sun, 11 Jun 2017 15:51:42 -0400
From: Jérôme Carretero <cJ-ko@...gloub.eu>
To: 黃清隆 <ching2048@...ca.com.tw>
Cc: linux-scsi@...r.kernel.org, linux-kernel@...r.kernel.org,
billion.wu@...ca.com.tw
Subject: arcmsr: during abort device command, can't access other drives
Hi Ching,
When a drive finally failed in my JBOD array, I discovered that the
whole ARC1880X controller would time out, cutting off access to every
drive, which is kind of sad.
I've upgraded the controller firmware and added the failing drive back to
see what happens with the newer firmware, and the behavior is the same.
Kernel: 4.12.0-rc4-00310-g6b7ed4588ce6.
Test scenario:
- 8-drive array configured in JBOD, TLER disabled
- one shell with dd if=/dev/${FAILING_DRIVE} of=/dev/null
- one shell with dd if=/dev/${ANOTHER_DRIVE} of=/dev/null
- observe kernel logs and disk activity
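The steps above can be sketched as a small script. This is only a rough
illustration, not the exact commands used for the report: the function
names read_device and run_scenario are made up for the example, and the
two device paths are expected as arguments.

```shell
#!/bin/sh
# Sketch of the test scenario: read two drives concurrently and report
# how long each reader ran. read_device and run_scenario are
# hypothetical helper names, not part of any existing tool.

read_device() {
    # $1: device (or file) to read sequentially until EOF or error
    start=$(date +%s)
    dd if="$1" of=/dev/null 2>/dev/null
    status=$?
    end=$(date +%s)
    echo "$1: dd exit status $status after $((end - start)) s"
}

run_scenario() {
    # $1: failing drive, $2: another drive on the same controller
    read_device "$1" &
    read_device "$2" &
    wait   # block until both readers have finished
}

# Run only when both device paths are given on the command line.
if [ "$#" -eq 2 ]; then
    run_scenario "$1" "$2"
fi
```

The kernel log can be followed from a third shell, for example with
dmesg -w, while both readers are running.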
Expected result: while the failing drive is timing out, access to other
disks is maintained.
Actual result: access to the other disks is suspended during
the error handling sequence.
[ 1818.969326] sd 0:0:0:4: [sdac] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 1818.977700] sd 0:0:0:4: [sdac] tag#0 Sense Key : Medium Error [current]
[ 1818.984411] sd 0:0:0:4: [sdac] tag#0 Add. Sense: Unrecovered read error
[ 1818.991045] sd 0:0:0:4: [sdac] tag#0 CDB: Read(10) 28 00 00 04 36 00 00 02 00 00
[ 1818.998445] blk_update_request: I/O error, dev sdac, sector 275968
[ 1899.118465] arcmsr0: abort device command of scsi id = 0 lun = 4
[ 1901.858516] arcmsr0: abort device command of scsi id = 0 lun = 4
[ 1904.591622] arcmsr: executing bus reset eh.....num_resets = 2, num_aborts = 6
[ 1928.608091] arcmsr0: wait 'abort all outstanding command' timeout
[ 1928.614241] arcmsr0: executing hw bus reset .....
[ 1942.137500] arcmsr0: wait 'get adapter firmware miscellaneous data' timeout
[ 1966.216936] arcmsr0: wait 'start adapter background rebulid' timeout
[ 1966.244943] arcmsr: scsi bus reset eh returns with success
[ 2008.028613] arcmsr: executing bus reset eh.....num_resets = 3, num_aborts = 6
[ 2029.344279] sd 0:0:0:4: [sdac] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 2029.352660] sd 0:0:0:4: [sdac] tag#3 Sense Key : Medium Error [current]
[ 2029.359368] sd 0:0:0:4: [sdac] tag#3 Add. Sense: Unrecovered read error
[ 2029.366059] sd 0:0:0:4: [sdac] tag#3 CDB: Read(10) 28 00 00 04 30 00 00 08 00 00
[ 2029.373483] blk_update_request: I/O error, dev sdac, sector 274432
[ 2033.094134] sd 0:0:0:4: [sdac] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 2033.102539] sd 0:0:0:4: [sdac] tag#2 Sense Key : Medium Error [current]
[ 2033.109257] sd 0:0:0:4: [sdac] tag#2 Add. Sense: Unrecovered read error
[ 2033.115887] sd 0:0:0:4: [sdac] tag#2 CDB: Read(10) 28 00 00 04 38 00 00 08 00 00
[ 2033.123326] blk_update_request: I/O error, dev sdac, sector 276480
[ 2037.435775] sd 0:0:0:4: [sdac] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 2037.444157] sd 0:0:0:4: [sdac] tag#0 Sense Key : Medium Error [current]
[ 2037.450876] sd 0:0:0:4: [sdac] tag#0 Add. Sense: Unrecovered read error
[ 2037.457508] sd 0:0:0:4: [sdac] tag#0 CDB: Read(10) 28 00 00 04 36 20 00 00 08 00
[ 2037.464917] blk_update_request: I/O error, dev sdac, sector 276000
[ 2037.471106] Buffer I/O error on dev sdac, logical block 34500, async page read
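To put a number on the stall, the log timestamps can be subtracted. The
sketch below is an assumption-laden helper (eh_window is a made-up name):
it takes the first "abort device command" line and the last "bus reset eh
returns with success" line on stdin and prints the seconds between them.
That interval is only a rough proxy for how long the other drives were
unreachable.

```shell
#!/bin/sh
# eh_window (hypothetical helper): print the seconds between the first
# "abort device command" line and the last "scsi bus reset eh returns
# with success" line of a dmesg-style log read from stdin.
# Exits with status 1 if either line is missing.
eh_window() {
    awk '
        # Extract the timestamp from a line like "[ 1899.118465] arcmsr0: ..."
        function stamp(line,   t) {
            t = line
            sub(/^\[ */, "", t)
            sub(/\].*$/, "", t)
            return t + 0
        }
        /arcmsr[0-9]*: abort device command/ && !seen {
            first = stamp($0)
            seen = 1
        }
        /arcmsr: scsi bus reset eh returns with success/ {
            last = stamp($0)
            done = 1
        }
        END {
            if (seen && done) {
                printf "%.3f\n", last - first
            } else {
                exit 1
            }
        }
    '
}
```

For example, dmesg | eh_window. On the excerpt above (first abort at
1899.118465, reset reported successful at 1966.244943) this gives roughly
67 seconds.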
Regards,
--
Jérôme