linux-kernel - Re: sata_mv and Highpoint RocketRAID 230x, corruption?

Message-ID: <4CC303FD.1000802@teksavvy.com>
Date:	Sat, 23 Oct 2010 11:49:17 -0400
From:	Mark Lord <kernel@...savvy.com>
To:	Mathias Burén <mathias.buren@...il.com>
CC:	linux-kernel@...r.kernel.org
Subject: Re: sata_mv and Highpoint RocketRAID 230x, corruption?

On 10-10-23 11:20 AM, Mathias Burén wrote:
> Hi,
>
> Interesting, as the badblocks program doesn't think these sectors are
> bad. Can I test them any other way?
..
> On 23 October 2010 16:19, Mark Lord<kernel@...savvy.com>  wrote:
>> On 10-10-23 08:57 AM, Mathias Burén wrote:
..
>>> ata2.00: status: { DRDY }
>>> ata2: hard resetting link
>>> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>> ata2.00: configured for UDMA/133
>>> ata2.00: device reported invalid CHS sector 0
>>> sd 1:0:0:0: [sdb]  Result: hostbyte=0x00 driverbyte=0x08
>>> sd 1:0:0:0: [sdb]  Sense Key : 0xb [current] [descriptor]
>>> Descriptor sense data with sense descriptors (in hex):
>>>          72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
>>>          00 00 00 00
>>> sd 1:0:0:0: [sdb]  ASC=0x0 ASCQ=0x0
>>> sd 1:0:0:0: [sdb] CDB: cdb[0]=0x28: 28 00 e7 70 c8 e8 00 05 40 00
>>> end_request: I/O error, dev sdb, sector 3882928360
>>> md/raid:md0: read error not correctable (sector 3882926312 on sdb1).
>>> md/raid:md0: Disk failure on sdb1, disabling device.
>>
>>
>> No, that error looks like a real disk media error -- bad sector(s) on the drive.
>>
>> The BIOS issue merely gives corrupted data, not read errors.
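
The quoted log can be cross-checked by hand. Below is a minimal Python sketch, assuming the CDB is a standard READ(10) and the sense buffer uses descriptor format (response code 0x72), as the log suggests. The helper names `decode_read10` and `sense_key` are made up for illustration and are not part of any kernel or library API.

```python
# Decode the READ(10) CDB and the descriptor-format sense data from the log.
# Helper names are hypothetical; only the byte layouts come from the SCSI specs.

SENSE_KEYS = {0x3: "MEDIUM ERROR", 0xB: "ABORTED COMMAND"}  # only the two relevant here


def decode_read10(cdb_hex: str):
    """Return (lba, block_count) from a 10-byte READ(10) CDB given as hex."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # bytes 7..8: transfer length
    return lba, blocks


def sense_key(sense_hex: str) -> int:
    """Return the sense key from descriptor-format sense data (byte 1, low nibble)."""
    sense = bytes.fromhex(sense_hex)
    if sense[0] & 0x7F not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    return sense[1] & 0x0F


lba, blocks = decode_read10("28 00 e7 70 c8 e8 00 05 40 00")
key = sense_key("72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 00 00 00 00")
print(lba, blocks)                        # 3882928360 1344
print(hex(key), SENSE_KEYS.get(key))      # 0xb ABORTED COMMAND
```

The decoded LBA matches the sector in the end_request line, and sense key 0xb is ABORTED COMMAND rather than MEDIUM ERROR (0x3).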

Mmm.. you're right.
I just now looked at the full dmesg you posted,
and those are NOT media errors.

It looks like NCQ commands are behaving strangely for some reason
in your 2.6.36 kernel.

Can you retest with, say, 2.6.34 ?
There were a number of sata_mv updates in between,
and I'm wondering if perhaps one of them broke something?

Or if you just want to stabilize things, then turn off NCQ.
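
One common way to do that at runtime is to drop the drive's queue depth to 1 through sysfs, which leaves no room for queued commands. A minimal sketch, assuming the disk is sdb as in the log and that this runs as root. The function name `set_queue_depth` and its `sysfs_root` parameter are hypothetical, the latter added only so the code can be exercised against a scratch directory.

```python
# Set a disk's queue depth via sysfs; a depth of 1 effectively disables NCQ.
# set_queue_depth and sysfs_root are illustrative names, not an existing API.
from pathlib import Path


def set_queue_depth(device: str, depth: int = 1, sysfs_root: str = "/sys") -> int:
    """Write `depth` to <sysfs_root>/block/<device>/device/queue_depth.

    Returns the previous depth so it can be restored later.
    """
    attr = Path(sysfs_root) / "block" / device / "device" / "queue_depth"
    previous = int(attr.read_text().strip())
    attr.write_text(f"{depth}\n")
    return previous


# Example (needs root on a real system):
#   old = set_queue_depth("sdb")        # NCQ off
#   set_queue_depth("sdb", depth=old)   # restore afterwards
```

The setting does not survive a reboot. To make it permanent, the libata.force=noncq kernel boot parameter is another option.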

Cheers