Message-ID: <499CFC63.2070608@kernel.org>
Date: Thu, 19 Feb 2009 15:29:55 +0900
From: Tejun Heo <tj@...nel.org>
To: Serguei Miridonov <mirsev@...ese.mx>
CC: Robert Hancock <hancockrwd@...il.com>,
linux-kernel@...r.kernel.org, Jeff Garzik <jeff@...zik.org>
Subject: Re: Intel ICH9M/M-E SATA error-handling/reset problems
Hello, Serguei.
Serguei Miridonov wrote:
>>>> I agree with you completely. Nevertheless, something like 10
>>>> errors per 2GB transfer cannot be a reason to give up. Vista,
>>>> at least, recovers and continues the data transfer. Linux simply
>>>> cannot return the interface or connected device to operating
>>>> mode. Do you think that is normal?
>> Well, there isn't much point in retrying if the same
>> command fails consecutively.
>
> I'm not talking about the _same_ transfer command. I mean intermittent
> errors, on average 10 parity errors per 2GB file. Let me repeat myself
> from another post:
>
> ... my very strong opinion, based just on general physics, is that the
> error rate on SATA can be (and will be) much higher than on
> PATA. PATA operates at lower frequencies and its cables are much shorter.
> eSATA cables are longer and work at up to 3Gb/s. Moreover, consider
> all these consumer-grade connectors, cables, etc. So, CRC errors could
> be quite common, and software needs to handle them properly to keep
> transfers fast and maintain communication with the device.
The kernel doesn't give up after intermittent errors.
> And remember USB bulk transfers? Who takes care of CRC checks and
> retries there?
What you're describing is already handled. No need to worry about it.
>> The problem was the broken speed-down
>> logic, so all the retries failed and the filesystem eventually received
>> an IO failure. It should have been fixed by recent changes.
>
> Slowing down may help reduce the number of errors, but it may turn out
> that they cannot be avoided completely.
>
>> In the log, ata2.00 went down after a timeout. The reset per se
>> isn't the problem; it's the RTTD (right thing to do) after a timeout,
>> as the controller and device states are unknown. Situations like the
>> one in your log often happen because an ATAPI device shuts down
>> completely after certain transmission problems. When this happens,
>> there's not much the driver can do, and a soft reboot wouldn't recover
>> the device either.
>
> So, it is the kernel's job to keep things working, not break them :-)
Yeah, and other than the hardware quirkiness on your machine, it
already works fine.
>> But seeing that you're on a dv5, I think you might be experiencing
>> something else. Please take a look at the following bugzilla entry.
>>
>> http://bugzilla.kernel.org/show_bug.cgi?id=12276
>
> Yes, I tried to suspend to RAM, and when the laptop woke up it failed
> to communicate with the hard drive. So, I use hibernate instead.
Can you please take a look at the kernel log after the system
resumes and see whether you're actually seeing the same problem?
Thanks.
--
tejun