Message-ID: <499877C5.6090205@gmail.com>
Date:	Sun, 15 Feb 2009 14:15:01 -0600
From:	Robert Hancock <hancockrwd@...il.com>
To:	Serguei Miridonov <mirsev@...ese.mx>
CC:	linux-kernel@...r.kernel.org, Jeff Garzik <jeff@...zik.org>,
	Tejun Heo <tj@...nel.org>
Subject: Re: Intel ICH9M/M-E SATA error-handling/reset problems
Serguei Miridonov wrote:
> On Sunday 15 February 2009, Robert Hancock wrote:
>> Serguei Miridonov wrote:
>>> On Saturday 14 February 2009, Robert Hancock wrote:
>>>> Serguei Miridonov wrote:
>>> ... something like 10
>>> errors per 2GB transfer cannot be a reason to give up. Vista,
>>> at least, recovers and continues the data transfer. Linux simply
>>> cannot return the interface or the connected device to operating
>>> mode. Do you think that is normal?
>> It could be that Linux is being a bit more aggressive in error
>> handling. In your case, it looks like an error occurred, triggering
>> a hard reset of the device, and the controller seemed unable to
>> talk to the device afterwards. If the command had just been
>> retried, maybe it would have worked better. However, doing that in
>> general can cause issues, since you don't know what state the
>> link may be in.
> 
> Hmm... I was sure there are general recommendations from chipset 
> vendors regarding recovery procedures.
> 
> What behavior is expected from a SATA-connected device if it 
> detects a parity error in received data? I'm not familiar with the PATA/SATA 
> protocols, but I suppose that it just doesn't send the data to the physical 
> disk for recording, asserts the error line, and waits for the next command 
> from the controller. If the data block was too big to keep in the drive's 
> cache memory, it may also report the number of bytes successfully 
> (physically) written, to prevent the software from sending them again.
In the case of a CRC error the error flag gets set and the transfer is 
aborted by whichever side detects it. In this case the entire transfer 
gets retried.
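A minimal sketch of "the entire transfer gets retried", assuming a hypothetical transfer hook that returns false when either side aborts on a CRC error. The names (`xfer_fn`, `submit_whole_transfer`, `fake_link`, `fake_xfer`) are illustrative only and are not libata's API; the point is that an aborted transfer earns no credit for the bytes sent before the error, so every retry resubmits the whole buffer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transfer hook: returns true if the transfer completed,
 * false if either side detected a CRC error and aborted it. */
typedef bool (*xfer_fn)(void *ctx, const unsigned char *buf, size_t len);

/* Resubmit the *whole* buffer after each abort: nothing sent before the
 * CRC error counts. Returns the attempt number that succeeded, or -1 if
 * every attempt failed. */
int submit_whole_transfer(xfer_fn xfer, void *ctx,
                          const unsigned char *buf, size_t len,
                          int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (xfer(ctx, buf, len))   /* always buf[0..len), never a tail */
            return attempt;
    }
    return -1;
}

/* Hypothetical test double: a link that aborts the first `crc_errors`
 * transfers and counts the total bytes handed to it. */
struct fake_link {
    int crc_errors;
    size_t bytes_offered;
};

bool fake_xfer(void *ctx, const unsigned char *buf, size_t len)
{
    struct fake_link *link = ctx;
    (void)buf;                     /* contents irrelevant to the model */
    link->bytes_offered += len;
    if (link->crc_errors > 0) {
        link->crc_errors--;
        return false;              /* simulated CRC error: aborted */
    }
    return true;
}
```

With two simulated CRC errors and a 512-byte buffer, the third attempt succeeds and the link has been offered 1536 bytes in total, i.e. the full buffer three times.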
> 
> If the above is correct, then the kernel should only log the error, do 
> some housekeeping work for the controller and attempt to send the data 
> again. There is no need for a hard reset right after the first error.
Right now an interface CRC error is treated as an ATA bus error, which 
always triggers a reset. It's possible this could be relaxed in some 
cases, but the issue is that if CRC errors are occurring, the link may be 
in an invalid state that simply retrying the command will not clear.
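To make the trade-off concrete, here is a hedged sketch of the two policies: resetting on every interface CRC error (the current behavior described above) versus retrying the command a bounded number of times before escalating to a reset. The ICRC (0x80) and ABRT (0x04) values are the standard ATA Error register bits; `choose_action`, `max_crc_retries` and the `eh_action` values are hypothetical names for illustration, not libata code, and every non-CRC error is lumped into one branch for simplicity.

```c
#include <stdint.h>

/* Standard ATA Error register bits. */
#define ATA_ERR_ICRC 0x80u  /* interface CRC error */
#define ATA_ERR_ABRT 0x04u  /* command aborted */

enum eh_action { EH_NONE, EH_RETRY, EH_RESET, EH_FAIL };

/* Hypothetical policy: after an interface CRC error, resubmit the whole
 * command up to max_crc_retries times, then fall back to a hard reset.
 * max_crc_retries == 0 reproduces "reset on every CRC error". */
enum eh_action choose_action(uint8_t err_reg, unsigned crc_retries_done,
                             unsigned max_crc_retries)
{
    if (err_reg & ATA_ERR_ICRC) {
        /* A retry may succeed, but repeated CRC errors suggest the link
         * is in a state that only a reset can clear. */
        if (crc_retries_done < max_crc_retries)
            return EH_RETRY;
        return EH_RESET;
    }
    if (err_reg != 0)
        return EH_FAIL;   /* other device errors: out of scope here */
    return EH_NONE;       /* no error bits reported */
}
```

Note that a device may set ABRT alongside ICRC (error register 0x84); because ICRC is checked first, such a command is still treated as a link problem rather than a rejected command.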
Tejun, any thoughts?
> 
> Another question is how the drive reacts to a hard reset... My error log 
> shows that neither drive likes it for some reason: they sometimes stop 
> responding, so maybe some additional programming of the drives 
> is necessary after a hard reset... Something that is done by the BIOS after 
> power-on... I don't know...
The same hard reset is done (and generally has to be done) on driver 
initialization and when a drive is hot-plugged, so it should work. 
However, if the link is having problems (and it obviously is, given the 
CRC errors), the drive may not receive the reset either.
> 
> Well, this is getting interesting... I've got the datasheet for ICH9 but 
> don't have the kernel driver source to check what the messages in the log 
> file really mean. Could you point me to a link to the uncompressed kernel 
> tree where I can see the source files?
> 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git is 
likely the easiest place to view it.