Message-ID: <48313BF3.3080605@caiway.nl>
Date: Mon, 19 May 2008 10:36:03 +0200
From: Jan Evert van Grootheest <j.e.van.grootheest@...way.nl>
To: linux-ide@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: SATA disk dies and revives after boot
Hi,
Yesterday the errors below appeared on my home Xen server. Dom0 is Debian
stable with the Ubuntu kernel 2.6.24-16-xen. Since yesterday was a
Sunday, the machine was mostly idle (I guess we were somewhere between
church and home at the time).
After this, the disk does not respond to anything and needs a reboot to
return to sanity. After that it may work for some period of time (days
or weeks).
I recently ran a long SMART self-test, which reported no errors. After a
reboot the disk also seems to be just fine (except that I need to re-add
it to the RAID1 arrays). I have also had this disk connected to a Promise
controller, and the same thing happened there.
Previously, with 2.6.18, it did this as well.
May 18 13:06:15 quark kernel: [174871.044304] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
May 18 13:06:15 quark kernel: [174871.044353] ata5.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
May 18 13:06:15 quark kernel: [174871.044355] res 40/00:00:01:01:80/00:00:00:00:00/00 Emask 0x4 (timeout)
May 18 13:06:15 quark kernel: [174871.044412] ata5.00: status: { DRDY }
May 18 13:06:20 quark kernel: [174876.082713] ata5: port is slow to respond, please be patient (Status 0xd0)
May 18 13:06:25 quark kernel: [174881.065279] ata5: soft resetting link
May 18 13:06:55 quark kernel: [174911.291301] ata5.00: qc timeout (cmd 0xec)
May 18 13:06:55 quark kernel: [174911.291337] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
May 18 13:06:55 quark kernel: [174911.291361] ata5.00: revalidation failed (errno=-5)
May 18 13:06:55 quark kernel: [174911.291384] ata5: failed to recover some devices, retrying in 5 secs
May 18 13:07:05 quark kernel: [174921.328542] ata5: port is slow to respond, please be patient (Status 0xd0)
May 18 13:07:10 quark kernel: [174926.312085] ata5: soft resetting link
May 18 13:07:40 quark kernel: [174956.537601] ata5.00: qc timeout (cmd 0xec)
May 18 13:07:40 quark kernel: [174956.537638] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
May 18 13:07:40 quark kernel: [174956.537662] ata5.00: revalidation failed (errno=-5)
May 18 13:07:40 quark kernel: [174956.537685] ata5: failed to recover some devices, retrying in 5 secs
May 18 13:07:50 quark kernel: [174966.580807] ata5: port is slow to respond, please be patient (Status 0xd0)
May 18 13:07:55 quark kernel: [174971.564289] ata5: soft resetting link
May 18 13:08:26 quark kernel: [175001.790832] ata5.00: qc timeout (cmd 0xec)
May 18 13:08:26 quark kernel: [175001.790867] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
May 18 13:08:26 quark kernel: [175001.790891] ata5.00: revalidation failed (errno=-5)
May 18 13:08:26 quark kernel: [175001.790914] ata5.00: disabled
May 18 13:08:31 quark kernel: [175007.327614] ata5: port is slow to respond, please be patient (Status 0xd0)
May 18 13:08:36 quark kernel: [175012.311144] ata5: soft resetting link
May 18 13:08:36 quark kernel: [175012.478592] ata5: EH complete
May 18 13:08:36 quark kernel: [175012.478684] sd 4:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
May 18 13:08:36 quark kernel: [175012.478726] end_request: I/O error, dev sdb, sector 62412332
May 18 13:08:36 quark kernel: [175012.478751] md: super_written gets error=-5, uptodate=0
May 18 13:08:36 quark kernel: [175012.478777] raid1: Disk failure on sdb5, disabling device.
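For what it is worth, the raw bytes in the log can at least be split into their parts. Below is a minimal sketch in Python, assuming the classic ATA status register bit layout and the standard opcodes for the two commands that show up above (e7 and ec). The function and table names are made up for illustration, and the sketch only names the bits; it says nothing about why the drive stops answering.

```python
# Bit names of the ATA status register (classic task-file layout).
# Assumption: the "Status 0xd0" and the first byte of the "res" line
# are plain status register values.
ATA_STATUS_BITS = {
    0x80: "BSY",   # device busy
    0x40: "DRDY",  # device ready
    0x20: "DF",    # device fault
    0x10: "DSC",   # seek complete (obsolete in newer ATA revisions)
    0x08: "DRQ",   # data request
    0x04: "CORR",  # corrected data (obsolete)
    0x02: "IDX",   # index (obsolete)
    0x01: "ERR",   # error
}

# Only the opcodes that appear in the log above.
ATA_COMMANDS = {
    0xE7: "FLUSH CACHE",
    0xEC: "IDENTIFY DEVICE",
}


def decode_status(value):
    """Return the names of the status bits set in value, highest bit first."""
    return [name for bit, name in ATA_STATUS_BITS.items() if value & bit]


def decode_command(opcode):
    """Return a readable name for an opcode, or a placeholder if not listed."""
    return ATA_COMMANDS.get(opcode, "unknown (0x%02x)" % opcode)


if __name__ == "__main__":
    print("Status 0xd0:", decode_status(0xD0))
    print("res status 0x40:", decode_status(0x40))
    print("cmd 0xe7:", decode_command(0xE7))
    print("cmd 0xec:", decode_command(0xEC))
```

Read this way, the first timeout is on a cache flush, the later ones are on the IDENTIFY that the kernel sends after each reset, and the port keeps reporting the busy bit while the kernel waits.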
The ata/disk info from dmesg:
[ 4.716187] sata_via 0000:00:0f.0: version 2.3
[ 4.716429] sata_via 0000:00:0f.0: routed to hard irq line 10
[ 4.720203] scsi3 : sata_via
[ 4.721459] scsi4 : sata_via
[ 4.721675] ata5: SATA max UDMA/133 cmd 0xd400 ctl 0xd000 bmdma 0xcc08 irq 20
[ 5.135540] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 5.299951] ata5.00: ATA-7: Maxtor 6Y080M0, YAR511W0, max UDMA/133
[ 5.300033] ata5.00: 160086528 sectors, multi 16: LBA
[ 5.315957] ata5.00: configured for UDMA/133
[ 5.316356] sd 4:0:0:0: [sdb] 160086528 512-byte hardware sectors (81964 MB)
[ 5.316448] sd 4:0:0:0: [sdb] Write Protect is off
[ 5.316526] sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 5.316543] sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 5.316684] sd 4:0:0:0: [sdb] 160086528 512-byte hardware sectors (81964 MB)
[ 5.316772] sd 4:0:0:0: [sdb] Write Protect is off
[ 5.316850] sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 5.316865] sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 5.316960] sdb: sdb2 < sdb5 sdb6 sdb7 sdb8 >
[ 5.404184] sd 4:0:0:0: [sdb] Attached SCSI disk
This is the SATA controller:
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
Subsystem: Micro-Star International Co., Ltd. K8T Neo 2 Motherboard
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 128
Interrupt: pin B routed to IRQ 20
Region 0: I/O ports at dc00 [size=8]
Region 1: I/O ports at d800 [size=4]
Region 2: I/O ports at d400 [size=8]
Region 3: I/O ports at d000 [size=4]
Region 4: I/O ports at cc00 [size=16]
Region 5: I/O ports at c800 [size=256]
Capabilities: [c0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
This is the disk:
quark:~# hdparm -i /dev/sdb
/dev/sdb:
Model=Maxtor 6Y080M0 , FwRev=YAR511W0, SerialNo=Y236DHAC
Config={ Fixed }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=DualPortCache, BuffSize=7936kB, MaxMultSect=16, MultSect=?16?
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=160086528
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5
AdvancedPM=yes: disabled (255) WriteCache=enabled
Drive conforms to: ATA/ATAPI-7 T13 1532D revision 0: ATA/ATAPI-1 ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7
* signifies the current active mode
Is there any other info that can help? Please ask.
I don't understand the error codes, so I have no clue what fails or why.
I would welcome suggestions on how to get this disk back online the next
time this happens. The other SATA port on this controller is unused, but
the PATA function at 00:0f.1 is in use, so I am looking for something I
can do to the controller without disturbing the PATA side (I'm thinking
of powering down the disk and/or controller from the command line).
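The kind of thing I have in mind is a remove-and-rescan through sysfs rather than a real power-down. A rough, untested sketch in Python follows, assuming the generic SCSI sysfs files (`device/delete` and the host `scan` file) are present on this kernel and that sata_via reacts to them; I do not know whether it would revive a drive in this state. The disk name and host number come from the "sd 4:0:0:0: [sdb]" lines above, the function names are made up, and by default nothing is written.

```python
import os


def rescan_plan(disk="sdb", host=4):
    """Return the sysfs writes as (path, value) pairs, in the order to apply."""
    return [
        # Detach the stale SCSI device from the kernel's view.
        ("/sys/block/%s/device/delete" % disk, "1"),
        # Ask the SCSI host to probe all channels, targets and LUNs again.
        ("/sys/class/scsi_host/host%d/scan" % host, "- - -"),
    ]


def apply_plan(plan, dry_run=True):
    """Print each step; only write to sysfs when dry_run is False (needs root).

    Returns the list of paths that were actually written.
    """
    written = []
    for path, value in plan:
        if dry_run:
            print("would write %r to %s" % (value, path))
        elif os.path.exists(path):
            with open(path, "w") as handle:
                handle.write(value)
            written.append(path)
        else:
            print("skipping %s (not present)" % path)
    return written


if __name__ == "__main__":
    # Dry run only: shows the two writes without touching the system.
    apply_plan(rescan_plan())
```

Only the SCSI host that belongs to ata5 is named here, so the PATA side should not be touched, but that is an assumption on my part too.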
I'm not really keen on testing patches, because this is my home server
and the rest of the family will not thank me for experimenting.
Thanks,
Jan Evert