linux-kernel - [BUG REPORT, 2.6.22] sata controler failure on nforce 2 chipset

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <396712385.20080423011459@3d-io.com>
Date:	Wed, 23 Apr 2008 01:14:59 +0200
From:	speedy <speedy@...io.com>
To:	linux-kernel@...r.kernel.org
Subject: [BUG REPORT, 2.6.22] sata controler failure on nforce 2 chipset

Hello Linux kernel crew,

       [Consider this more as a datapoint then a bug report, as after
       one network and one sata/southbridge issues showing up
       interminnently, the ASRock motherboard involved will be
       scrapped for a different one]

       The integrated NVidia sata controller and/or the hard-drive has failed
       during operation with the following output:

Apr 22 23:36:54 backupserver kernel: [91202.294632]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 22 23:36:59 backupserver kernel: [91207.657630] ata2: port is slow to respond, please be patient (Status 0xd0)
Apr 22 23:37:04 backupserver kernel: [91212.331576] ata2: device not ready (errno=-16), forcing hardreset
Apr 22 23:37:04 backupserver kernel: [91212.331583] ata2: hard resetting port
Apr 22 23:37:09 backupserver kernel: [91217.874396] ata2: port is slow to respond, please be patient (Status 0x80)
Apr 22 23:37:14 backupserver kernel: [91222.368598] ata2: hard resetting port
Apr 22 23:37:19 backupserver kernel: [91227.911395] ata2: port is slow to respond, please be patient (Status 0x80)
Apr 22 23:37:24 backupserver kernel: [91232.405597] ata2: hard resetting port
Apr 22 23:37:29 backupserver kernel: [91237.948395] ata2: port is slow to respond, please be patient (Status 0x80)
Apr 22 23:37:59 backupserver kernel: [91267.370311] ata2: hard resetting port
Apr 22 23:38:04 backupserver kernel: [91272.373843] ata2.00: disabled
Apr 22 23:38:04 backupserver kernel: [91272.373858] ata2: EH complete
Apr 22 23:38:04 backupserver kernel: [91272.374653] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Apr 22 23:38:04 backupserver kernel: [91272.374659] end_request: I/O error, dev sdb, sector 35277535
Apr 22 23:38:04 backupserver kernel: [91272.374682] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374706] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374726] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374745] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374765] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374785] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374805] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374825] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374844] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.374864] lost page write due to I/O error on md0
Apr 22 23:38:04 backupserver kernel: [91272.375058] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Apr 22 23:38:04 backupserver kernel: [91272.375062] end_request: I/O error, dev sdb, sector 35278559
Apr 22 23:38:04 backupserver kernel: [91272.375096] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Apr 22 23:38:04 backupserver kernel: [91272.375099] end_request: I/O error, dev sdb, sector 407240943
.
.
.

       Full /var/log/messages can be found on: http://87.230.23.147/messages_sata_crash.txt

       The two 500GB Samsung HD501LJ hard-drives were making resetting
       sounds in regular intervals, trying to recover from the error,
       unsucessfuly. The system was accessed via network/SSH and was
       shutdown "gracefully" via shutdown -h now.

       After restarting, the system seemingly continued to operate
       normaly without any apparent data loss.

       One thing of note is that the south-bridge was alarmingly hot
       to the touch (you could "burn your finger" on it) so I would
       attribute the problems to improper cooling of hardware.
       Previously the system had uptimes of 100+ days as a render farm
       master using Windows 2000 (mostly CPU/memory load, though).

       I won't be able to test the same system further as it's
       motherboard will be (promptly:p) exchanged.

       ps. Keep me in CC:, not following the list.

-- 
Best regards,
 speedy                          mailto:speedy@...io.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/