Message-ID: <48313BF3.3080605@caiway.nl>
Date:	Mon, 19 May 2008 10:36:03 +0200
From:	Jan Evert van Grootheest <j.e.van.grootheest@...way.nl>
To:	linux-ide@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: SATA disk dies and revives after boot

Hi,

Yesterday the errors below appeared on my home Xen server. Dom0 runs
Debian stable with the Ubuntu kernel 2.6.24-16-xen. Yesterday was
Sunday, so the machine was mostly idle (I guess we were somewhere
between church and home at the time).
After this, the disk does not respond to anything and needs a reboot to
return to sanity. It may then work for some period of time (days or
weeks).
I recently ran a long SMART test, which returned no errors. After a
reboot the disk also seems to be just fine (except that I need to re-add
it to the RAID1 arrays). I've also had this disk connected to a Promise
controller, and the same thing happened there.
It did this under 2.6.18 as well.

May 18 13:06:15 quark kernel: [174871.044304] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
May 18 13:06:15 quark kernel: [174871.044353] ata5.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0
May 18 13:06:15 quark kernel: [174871.044355]          res 40/00:00:01:01:80/00:00:00:00:00/00 Emask 0x4 (timeout)
May 18 13:06:15 quark kernel: [174871.044412] ata5.00: status: { DRDY }
May 18 13:06:20 quark kernel: [174876.082713] ata5: port is slow to respond, please be patient (Status 0xd0)
May 18 13:06:25 quark kernel: [174881.065279] ata5: soft resetting link
May 18 13:06:55 quark kernel: [174911.291301] ata5.00: qc timeout (cmd 0xec)
May 18 13:06:55 quark kernel: [174911.291337] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
May 18 13:06:55 quark kernel: [174911.291361] ata5.00: revalidation failed (errno=-5)
May 18 13:06:55 quark kernel: [174911.291384] ata5: failed to recover some devices, retrying in 5 secs
May 18 13:07:05 quark kernel: [174921.328542] ata5: port is slow to respond, please be patient (Status 0xd0)
May 18 13:07:10 quark kernel: [174926.312085] ata5: soft resetting link
May 18 13:07:40 quark kernel: [174956.537601] ata5.00: qc timeout (cmd 0xec)
May 18 13:07:40 quark kernel: [174956.537638] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
May 18 13:07:40 quark kernel: [174956.537662] ata5.00: revalidation failed (errno=-5)
May 18 13:07:40 quark kernel: [174956.537685] ata5: failed to recover some devices, retrying in 5 secs
May 18 13:07:50 quark kernel: [174966.580807] ata5: port is slow to respond, please be patient (Status 0xd0)
May 18 13:07:55 quark kernel: [174971.564289] ata5: soft resetting link
May 18 13:08:26 quark kernel: [175001.790832] ata5.00: qc timeout (cmd 0xec)
May 18 13:08:26 quark kernel: [175001.790867] ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
May 18 13:08:26 quark kernel: [175001.790891] ata5.00: revalidation failed (errno=-5)
May 18 13:08:26 quark kernel: [175001.790914] ata5.00: disabled
May 18 13:08:31 quark kernel: [175007.327614] ata5: port is slow to respond, please be patient (Status 0xd0)
May 18 13:08:36 quark kernel: [175012.311144] ata5: soft resetting link
May 18 13:08:36 quark kernel: [175012.478592] ata5: EH complete
May 18 13:08:36 quark kernel: [175012.478684] sd 4:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
May 18 13:08:36 quark kernel: [175012.478726] end_request: I/O error, dev sdb, sector 62412332
May 18 13:08:36 quark kernel: [175012.478751] md: super_written gets error=-5, uptodate=0
May 18 13:08:36 quark kernel: [175012.478777] raid1: Disk failure on sdb5, disabling device.
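
For reading the log above, the hex fields can be decoded roughly as
follows. This is only a sketch under stated assumptions: the bit names
follow the standard ATA status register layout, the two opcodes listed
are just the ones that occur in this log (0xe7 and 0xec), and the
err_mask descriptions paraphrase libata's AC_ERR_* flags. The helper
names decode_bits and decode_taskfile are hypothetical, made up for
illustration.

```python
import errno

# ATA status register bits, most significant first.
STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete (legacy meaning)
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data (obsolete)
    (0x02, "IDX"),   # index (obsolete)
    (0x01, "ERR"),   # error
]

# Only the opcodes that appear in the log above.
COMMANDS = {0xE7: "FLUSH CACHE", 0xEC: "IDENTIFY DEVICE"}

# Low bits of libata's err_mask (paraphrased descriptions, an assumption).
ERR_MASK_BITS = [
    (0x01, "device error"),
    (0x02, "host state machine violation"),
    (0x04, "timeout"),
    (0x08, "media error"),
    (0x10, "ATA bus error"),
]


def decode_bits(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in table if value & bit]


def decode_taskfile(field):
    """Return the first two bytes of a 'cmd' or 'res' field as integers.

    For 'cmd' these are (command, feature); for 'res' they are
    (status, error).
    """
    groups = field.split("/")
    first = int(groups[0], 16)
    second = int(groups[1].split(":")[0], 16)
    return first, second


if __name__ == "__main__":
    cmd, _ = decode_taskfile("e7/00:00:00:00:00/00:00:00:00:00/a0")
    status, error = decode_taskfile("40/00:00:01:01:80/00:00:00:00:00/00")
    print(COMMANDS.get(cmd, hex(cmd)))        # command that timed out
    print(decode_bits(status, STATUS_BITS))   # status in the 'res' line
    print(decode_bits(0xD0, STATUS_BITS))     # "slow to respond" status
    print(decode_bits(0x4, ERR_MASK_BITS))    # Emask / err_mask value
    print(errno.errorcode[5])                 # errno=-5
```

Read this way, the log says that a FLUSH CACHE command (0xe7) timed
out, the port then kept reporting status 0xd0 (BSY, DRDY and DSC set),
each IDENTIFY DEVICE (0xec) after a soft reset also timed out, and
after the third attempt the device was disabled. errno -5 is EIO.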

The ATA/disk info from dmesg:
[    4.716187] sata_via 0000:00:0f.0: version 2.3
[    4.716429] sata_via 0000:00:0f.0: routed to hard irq line 10
[    4.720203] scsi3 : sata_via
[    4.721459] scsi4 : sata_via
[    4.721675] ata5: SATA max UDMA/133 cmd 0xd400 ctl 0xd000 bmdma 0xcc08 irq 20
[    5.135540] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    5.299951] ata5.00: ATA-7: Maxtor 6Y080M0, YAR511W0, max UDMA/133
[    5.300033] ata5.00: 160086528 sectors, multi 16: LBA
[    5.315957] ata5.00: configured for UDMA/133
[    5.316356] sd 4:0:0:0: [sdb] 160086528 512-byte hardware sectors (81964 MB)
[    5.316448] sd 4:0:0:0: [sdb] Write Protect is off
[    5.316526] sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    5.316543] sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    5.316684] sd 4:0:0:0: [sdb] 160086528 512-byte hardware sectors (81964 MB)
[    5.316772] sd 4:0:0:0: [sdb] Write Protect is off
[    5.316850] sd 4:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    5.316865] sd 4:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    5.316960]  sdb: sdb2 < sdb5 sdb6 sdb7 sdb8 >
[    5.404184] sd 4:0:0:0: [sdb] Attached SCSI disk

It is this SATA controller:
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
    Subsystem: Micro-Star International Co., Ltd. K8T Neo 2 Motherboard
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 128
    Interrupt: pin B routed to IRQ 20
    Region 0: I/O ports at dc00 [size=8]
    Region 1: I/O ports at d800 [size=4]
    Region 2: I/O ports at d400 [size=8]
    Region 3: I/O ports at d000 [size=4]
    Region 4: I/O ports at cc00 [size=16]
    Region 5: I/O ports at c800 [size=256]
    Capabilities: [c0] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-

This is the disk:
quark:~# hdparm -i /dev/sdb

/dev/sdb:

 Model=Maxtor 6Y080M0, FwRev=YAR511W0, SerialNo=Y236DHAC
 Config={ Fixed }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=DualPortCache, BuffSize=7936kB, MaxMultSect=16, MultSect=?16?
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=160086528
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5
 AdvancedPM=yes: disabled (255) WriteCache=enabled
 Drive conforms to: ATA/ATAPI-7 T13 1532D revision 0:  ATA/ATAPI-1 ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7

 * signifies the current active mode

Is there any other info that would help? Please ask.
I don't understand the error codes, so I have no clue what fails or why.
I would welcome suggestions on how to get this disk back online the next
time this happens. The other SATA port on this controller is unused,
but the PATA function at 00:0f.1 is in use, so I'd like to know whether
there's something I can do to the controller without disturbing the PATA
side (I'm thinking of powering down the disk and/or controller from the
command line).
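
One thing that could be tried from the command line without touching
the PATA function is to drop the dead SCSI device and ask the SATA
host to rescan, using the generic sysfs files
/sys/block/<disk>/device/delete and /sys/class/scsi_host/host<N>/scan.
The sketch below only builds and prints those steps by default. It
assumes the disk is still sdb on SCSI host 4 (as in "sd 4:0:0:0: [sdb]"
above); the function names are hypothetical. Whether a rescan is enough
to revive this particular drive is not known: the log shows it failing
IDENTIFY even after a soft reset, so it may really need a power cycle.

```python
from pathlib import Path


def rescan_steps(disk="sdb", host=4, sysfs_root="/sys"):
    """Return (path, value) pairs: remove the stale device, then rescan.

    disk and host default to the values seen in the log above; adjust
    them if the device names differ on the next failure.
    """
    root = Path(sysfs_root)
    return [
        # Writing "1" here detaches the SCSI device from the kernel.
        (root / "block" / disk / "device" / "delete", "1"),
        # Writing "- - -" rescans all channels, targets and LUNs.
        (root / "class" / "scsi_host" / f"host{host}" / "scan", "- - -"),
    ]


def apply_steps(steps, dry_run=True):
    """Perform the writes, or with dry_run just return equivalent commands."""
    commands = []
    for path, value in steps:
        commands.append(f"echo '{value}' > {path}")
        if not dry_run:
            path.write_text(value)  # needs root on a real system
    return commands


if __name__ == "__main__":
    for line in apply_steps(rescan_steps()):
        print(line)
```

If the disk does come back, the partitions still have to be re-added to
their arrays, for example with "mdadm /dev/md0 --add /dev/sdb5", where
md0 is only a placeholder for the actual array name.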

I'm not really keen on testing patches, because this is my home server 
and the rest of the family will not thank me for experimenting.

Thanks,
Jan Evert


