Message-ID: <49D11BDD.70702@redhat.com>
Date: Mon, 30 Mar 2009 15:22:05 -0400
From: Rik van Riel <riel@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: Ric Wheeler <rwheeler@...hat.com>,
"Andreas T.Auer" <andreas.t.auer_lkml_73537@...us.ath.cx>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
Theodore Tso <tytso@....edu>, Mark Lord <lkml@....ca>,
Stefan Richter <stefanr@...6.in-berlin.de>,
Jeff Garzik <jeff@...zik.org>,
Matthew Garrett <mjg59@...f.ucam.org>,
Andrew Morton <akpm@...ux-foundation.org>,
David Rees <drees76@...il.com>, Jesper Krogh <jesper@...gh.cc>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 2.6.29
Linus Torvalds wrote:
> On Mon, 30 Mar 2009, Ric Wheeler wrote:
>> Heat is a major killer of spinning drives (as is severe cold). A lot of times,
>> drives that have read errors only (not failed writes) might be fully
>> recoverable if you can re-write that injured sector.
>
> It's not worked for me, and yes, I've tried.
It's worked here. It would be nice to have a device mapper module
that can just insert itself between the disk and the higher device
mapper layer and "scrub" the disk, fetching unreadable sectors from
the other RAID copy where required.
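
To make the scrub idea concrete, here is a rough user-space sketch in
Python. It is a toy simulation under stated assumptions, not device
mapper code: SimDisk, ReadError and scrub are hypothetical names made
up for illustration, and it assumes that rewriting an unreadable
sector is enough for the drive to repair or remap it.

```python
class ReadError(Exception):
    """Raised when a simulated sector cannot be read."""


class SimDisk:
    """Toy in-memory disk; `bad` holds sectors that fail on read until rewritten."""

    def __init__(self, sectors, bad=()):
        self.sectors = list(sectors)
        self.bad = set(bad)

    def read(self, n):
        if n in self.bad:
            raise ReadError(n)
        return self.sectors[n]

    def write(self, n, data):
        # Assumption: a rewrite lets the drive repair or remap the sector.
        self.sectors[n] = data
        self.bad.discard(n)


def scrub(disk, mirror):
    """Read every sector of `disk`; rewrite unreadable ones from `mirror`.

    Returns (repaired, lost): sector numbers that were fixed, and sector
    numbers that were unreadable on both copies.
    """
    repaired, lost = [], []
    for n in range(len(disk.sectors)):
        try:
            disk.read(n)
        except ReadError:
            try:
                data = mirror.read(n)
            except ReadError:
                lost.append(n)  # both copies bad, nothing to rewrite from
                continue
            disk.write(n, data)
            repaired.append(n)
    return repaired, lost
```

A real module would work on block ranges rather than single sectors
and would have to cope with writes that fail as well.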
> I'm sure it works for some "ok, the write just failed to take, and the CRC
> was bad" case, but that's apparently not what I've had. I suspect either
> the track markers got overwritten (and maybe a disk-specific low-level
> reformat would have helped, but at that point I was not going to trust the
> drive anyway, so I didn't care), or there was actual major physical damage
> due to heat and/or head crash and remapping was just not able to cope.
Maybe a stupid question, but aren't tracks so small compared to
the disk head that a physical head crash would take out multiple
tracks at once? (the last one I experienced here took out a major
part of the disk)
Another case I saw years ago was me writing data to a disk
while it was still cold (I brought it home, plugged it in and
started using it). Once the drive came up to temperature, it
could no longer read the tracks it just wrote - maybe the disk
expanded by more than it is willing to seek around for tracks
due to thermal correction? Low-level formatting the drive
made it work perfectly and I kept using it until it was just
too small to be useful :)
> And my point is, IT MAKES SENSE to just do the elevator barrier, _without_
> the drive command.
No argument there. I have seen NCQ starvation on SATA disks,
with some requests sitting in the drive for seconds, while
the drive was busy handling hundreds of requests/second
elsewhere...
--
All rights reversed.