lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.1.10.0901260708570.7084@p34.internal.lan>
Date:	Mon, 26 Jan 2009 07:15:36 -0500 (EST)
From:	Justin Piszcz <jpiszcz@...idpixels.com>
To:	Anton Petrusevich <casus@...us.us>
cc:	linux-ide@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: WD IDNF error



On Mon, 26 Jan 2009, Anton Petrusevich wrote:

> Hello,
>
>
> Google shows me you had a similar problem. I have a WDC WD3000HLFS-01G6U0 in
> one my server and this drive is used for oracle database, and it's already
> second time when I get error like this:
>
> Jan 26 11:13:31 rc03 kernel: [5131931.511985] ata2.00: status: { DRDY ERR }
> Jan 26 11:13:31 rc03 kernel: [5131931.511985] ata2.00: error: { IDNF }
>
> Several months ago when it happend first time I extensively checked the drive
> after reboot and it worked flawlessly, but database was corrupted. This time
> I just rebooted the server and overwrote one control file and everything goes
> normal. This error is actually bothers me as I don't actually like to restore
> database from backup which can be sometimes a little bit outdated.
>
> So, have you find a solution? My kernel is 2.6.26-1-amd64,  I read about NCQ,
> I see:
> Jan 26 14:24:16 rc03 kernel: [    3.759922] ata2.00: ATA-8: WDC
> WD3000HLFS-01G6U0, 04.04V01, max UDMA/133
> Jan 26 14:24:16 rc03 kernel: [    3.759995] ata2.00: 586072368 sectors, multi
> 16: LBA48 NCQ (depth 0/32)
>
> So, NCQ is already disabled.
> -- 
> Anton Petrusevich
>

Please see:
http://forums.storagereview.net/index.php?showtopic=27303&hl=premature

Other users are also reporting this, the fix for me was to replace my 
drives with WD RE3s.  After 9-10 RMAs and no fix, it is some bug in the 
kernel or drive firmware, after 6-12mos of RMA and corrupted files, broken 
raids and weeks of lost time dealing with this problem, I removed all my 
WD Velociraptors from production, replaced with WD RE3 and have not 
had a problem since.

Turning off NCQ fixes the problem with Raptor's and Velociraptors too, 
where NCQ doesn't work well with Linux for these particular drives; 
however, the DRDY ERR and IDNF errors-- there's something wrong with the 
Velociraptors in Linux, I'd suggest subscribing to that forum and adding 
your notes there, that is where I have been tracking the issue primarily.

I have cc'd linux-kernel and linux-ide here for further suggestions, 
unfortunately however, I do not know of anyone who works at WD who 
participates on these lists so I am not sure how to get the problem 
escalated. When I tried to open cases with WD about the problem, their 
solution was (again) to replace all the HDDs, etc.  As far as I am 
concerned the WD Velociraptor is defective, at least in Linux (both 
GLFS/HLFS) versions that I used and the 9-10 RMAs as well, I think there 
needs to be more bug reports etc and tracking on this issue, otherwise, it 
will probably never be solved.  However, if you have important data, I'd 
recommend not storing it on a VR unless you have it backed up elsewhere, 
and verify the backups regularly.

Justin.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ