linux-kernel - Kernel 2.6.20.4: Software RAID 5: ata13.00: (irq_stat 0x00020002, failed to transmit command FIS)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.64.0704051243080.3810@p34.internal.lan>
Date:	Thu, 5 Apr 2007 12:47:40 -0400 (EDT)
From:	Justin Piszcz <jpiszcz@...idpixels.com>
To:	linux-kernel@...r.kernel.org, linux-ide@...r.kernel.org,
	linux-scsi@...r.kernel.org, linux-raid@...r.kernel.org
Subject: Kernel 2.6.20.4: Software RAID 5: ata13.00: (irq_stat 0x00020002,
 failed to transmit command FIS)

Had a quick question, this is the first time I have seen this happen, and 
it was not even under during heavy I/O, hardly anything was going on with 
the box at the time.

Any idea what could have caused this?  I am running a badblocks test right 
now, but so far the disk looks OK.

[369143.916093] ata13.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[369143.916100] ata13.00: (irq_stat 0x00020002, failed to transmit command FIS)
[369143.916107] ata13.00: cmd ca/00:00:97:1a:d5/00:00:00:00:00/e9 tag 0 cdb 0x0 data 131072 out
[369143.916109]          res 93/37:00:00:00:00/00:00:40:00:93/00 Emask 0x12 (ATA bus error)
[369143.916116] ata13: hard resetting port
[369146.145915] ata13: softreset failed (port not ready)
[369146.145922] ata13: follow-up softreset failed, retrying in 5 secs
[369151.146035] ata13: hard resetting port
[369153.376736] ata13: softreset failed (port not ready)
[369153.376743] ata13: follow-up softreset failed, retrying in 5 secs
[369158.376664] ata13: hard resetting port
[369160.608025] ata13: softreset failed (port not ready)
[369160.608033] ata13: reset failed, giving up
[369160.608036] ata13.00: disabled
[369160.608043] ata13: EH pending after completion, repeating EH (cnt=4)
[369160.718365] ata13: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0x6 frozen
[369160.718370] ata13: (irq_stat 0x00060002, failed to transmit command FIS)
[369161.238432] ata13: waiting for device to spin up (8 secs)
[369168.715610] ata13: hard resetting port
[369170.946658] ata13: softreset failed (port not ready)
[369170.946666] ata13: follow-up softreset failed, retrying in 5 secs
[369175.946249] ata13: hard resetting port
[369178.167644] ata13: softreset failed (port not ready)
[369178.167651] ata13: follow-up softreset failed, retrying in 5 secs
[369183.167742] ata13: hard resetting port
[369185.398497] ata13: softreset failed (port not ready)
[369185.398504] ata13: reset failed, giving up
[369185.398522] sd 12:0:0:0: SCSI error: return code = 0x08000002
[369185.398526] sdl: Current [descriptor]: sense key: Aborted Command
[369185.398532]     Additional sense: Scsi parity error
[369185.398539] Descriptor sense data with sense descriptors (in hex):
[369185.398544]         72 0b 47 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
[369185.398572]         00 00 00 00 
[369185.398581] end_request: I/O error, dev sdl, sector 164960919
[369185.398586] raid5: Disk failure on sdl1, disabling device. Operation continuing on 3 devices
[369185.398617] sd 12:0:0:0: rejecting I/O to offline device
[369185.398625] ata13: EH complete
[369185.398635] ata13.00: detaching (SCSI 12:0:0:0)
[369185.398676] sd 12:0:0:0: SCSI error: return code = 0x00010000
[369185.398680] end_request: I/O error, dev sdl, sector 164961175
[369185.398702] raid5:md3: read error not correctable (sector 164962304 on sdl1).
[369185.398707] raid5:md3: read error not correctable (sector 164962312 on sdl1).
[369185.398711] raid5:md3: read error not correctable (sector 164962320 on sdl1).
[369185.398716] raid5:md3: read error not correctable (sector 164962328 on sdl1).
[369185.398760] Synchronizing SCSI cache for disk sdl: 
[369185.398784] FAILED
[369185.398785]   status = 0, message = 00, host = 4, driver = 00
[369185.398786]   <3>scsi 12:0:0:0: rejecting I/O to dead device
[369185.404619] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404641] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404662] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404682] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404686] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404691] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404712] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404732] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404753] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404774] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404794] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404815] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404844] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404863] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404882] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404900] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404918] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404937] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404956] scsi 12:0:0:0: rejecting I/O to dead device
[369185.413938] RAID5 conf printout:
[369185.413944]  --- rd:4 wd:3
[369185.413948]  disk 0, o:1, dev:sdi1
[369185.413950]  disk 1, o:1, dev:sdj1
[369185.413953]  disk 2, o:0, dev:sdl1
[369185.413956]  disk 3, o:1, dev:sdg1
[369185.418873] RAID5 conf printout:
[369185.418878]  --- rd:4 wd:3
[369185.418881]  disk 0, o:1, dev:sdi1
[369185.418884]  disk 1, o:1, dev:sdj1
[369185.418887]  disk 3, o:1, dev:sdg1

Relevant SMART data:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED 
WHEN_FAILED RAW_VALUE
   1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always 
-       0
   3 Spin_Up_Time            0x0007   192   165   021    Pre-fail  Always 
-       3441
   4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always 
-       34
   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always 
-       0
   7 Seek_Error_Rate         0x000a   200   200   051    Old_age   Always 
-       0
   9 Power_On_Hours          0x0032   098   098   000    Old_age   Always 
-       1988
  10 Spin_Retry_Count        0x0012   100   253   051    Old_age   Always 
-       0
  11 Calibration_Retry_Count 0x0012   100   253   051    Old_age   Always 
-       0
  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always 
-       34
194 Temperature_Celsius     0x0022   120   107   000    Old_age   Always 
-       27
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always 
-       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always 
-       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always 
-       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always 
-       1
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline 
-       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining 
LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1977 
-
# 2  Short offline       Completed without error       00%      1954 
-
# 3  Short offline       Completed without error       00%      1931 
-
# 4  Short offline       Completed without error       00%      1908 
-
# 5  Short offline       Completed without error       00%      1861 
-
# 6  Short offline       Completed without error       00%      1814 
-
# 7  Short offline       Completed without error       00%      1791 
-
# 8  Short offline       Completed without error       00%      1768 
-
# 9  Short offline       Completed without error       00%      1745 
-
#10  Short offline       Completed without error       00%      1698 
-
#11  Short offline       Completed without error       00%      1651 
-
#12  Short offline       Completed without error       00%      1628 
-
#13  Short offline       Completed without error       00%      1605 
-
#14  Short offline       Completed without error       00%      1581 
-
#15  Short offline       Completed without error       00%      1535 
-
#16  Short offline       Completed without error       00%      1488 
-
#17  Short offline       Completed without error       00%      1464 
-
#18  Short offline       Completed without error       00%      1441 
-
#19  Short offline       Completed without error       00%      1418 
-
#20  Short offline       Completed without error       00%      1372 
-
#21  Short offline       Completed without error       00%      1326 
-

SMART Selective self-test log data structure revision number 1
  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
     1        0        0  Not_testing
     2        0        0  Not_testing
     3        0        0  Not_testing
     4        0        0  Not_testing
     5        0        0  Not_testing
Selective self-test flags (0x0):
   After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute 
delay.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/