linux-kernel - Re: Intel ICH9M/M-E SATA error-handling/reset problems

Date:	Sun, 15 Feb 2009 13:55:32 -0800
From:	Serguei Miridonov <mirsev@...ese.mx>
To:	Robert Hancock <hancockrwd@...il.com>
Cc:	linux-kernel@...r.kernel.org, Jeff Garzik <jeff@...zik.org>,
	Tejun Heo <tj@...nel.org>
Subject: Re: Intel ICH9M/M-E SATA error-handling/reset problems

On Sunday 15 February 2009, Robert Hancock wrote:
> Right now interface CRC error is considered an ATA bus error which
> always triggers a reset.

Well, my very strong opinion, based just on general physics, is that the
error rate on SATA can be (and will be) much higher than on PATA. PATA
operates at lower frequencies and its cables are much shorter. eSATA
cables are longer and work at up to 3 Gb/s. Moreover, consider all the
consumer-grade connectors, cables, etc. So CRC errors could be quite
common, and software needs to handle them properly to keep transfers
fast and maintain communication with the device.
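
To put a rough number on the point about error rates, here is a minimal back-of-the-envelope sketch. The bit error rate of 1e-12 is purely an illustrative assumption (it is not a measurement from this thread), and the function name is hypothetical. It also assumes the link is kept busy at the full 3 Gb/s line rate for the whole interval.

```python
def expected_bit_errors(line_rate_bps, bit_error_rate, seconds):
    """Expected number of bit errors on a fully busy link.

    line_rate_bps  -- raw line rate in bits per second (3e9 for 3 Gb/s)
    bit_error_rate -- assumed probability that any single bit is corrupted
    seconds        -- observation interval
    """
    return line_rate_bps * bit_error_rate * seconds


# Illustrative assumption: BER of 1e-12 on a saturated 3 Gb/s link for one hour.
per_hour = expected_bit_errors(3e9, 1e-12, 3600)
print(f"expected bit errors per hour: {per_hour:.1f}")
```

Under these assumed numbers a busy link would see on the order of ten corrupted bits per hour, so a marginal cable that is a few orders of magnitude worse could plausibly produce CRC errors often enough that a full reset on each one becomes expensive.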

> It's possible this could be relaxed in
> some cases, but the issue is that if CRC errors are occurring the
> link may be in an invalid state which simply retrying the command
> will not clear.

Let's think positively ;-). If a CRC error occurs (in a data or command
sequence), the device simply doesn't accept what it received in the
last transfer. So it should wait for whatever the host says next. I think
that before doing a hard reset, or whatever is necessary to completely
restart the interface together with the connected device, the kernel
should check whether the link is up and the device is listening. Why not
try a short request that lets the device send something short in
response?
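
The escalation proposed above can be sketched as follows. This is only a model of the idea in plain Python, not how libata works: `probe`, `retry`, and `hard_reset` are hypothetical callbacks standing in for "send a short request", "reissue the failed command", and "reset the link", and the retry limit of 2 is an arbitrary assumption.

```python
def handle_crc_error(probe, retry, hard_reset, max_retries=2):
    """Try the least disruptive recovery first; return the step that was used.

    probe      -- callable, True if the device answers a short request
    retry      -- callable, True if reissuing the failed command succeeds
    hard_reset -- callable that resets the link (last resort)
    """
    if probe():
        # The device is still listening, so the link is probably usable.
        for _ in range(max_retries):
            if retry():
                return "retry"
    # Either the device did not answer or the retries kept failing.
    hard_reset()
    return "hard_reset"


# Example: the device answers the probe and the first retry succeeds,
# so no reset is issued.
resets = []
step = handle_crc_error(lambda: True, lambda: True, lambda: resets.append(1))
print(step, len(resets))
```

The reset remains the fallback in both failure paths, which addresses the concern that a simple retry may not clear an invalid link state.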

> Tejun, any thoughts?
>
> > Another question is how the drive reacts to hard reset... My
> > error log shows that both drives do not like it for some reason -
> > they stop responding sometimes, so may be some additional
> > programming of drives is necessary after hard reset... Something
> > which is done in BIOS after power on... I don't know...
>
> The same hard reset is done (and generally has to be done) on
> driver initialization and when a drive is hot plugged, so it should
> work.

It depends... If a hard reset is like a reboot for the drive firmware,
it may take more than 30 seconds for a Seagate external drive, though
I'm not sure... Trying to push the interface before the device is
ready to receive commands may be treated by the drive as a link
problem, and it may refuse to communicate. Well, again, I'm not
familiar with this, just speculating...
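
If the speculation above is right, the fix would amount to waiting for the drive to report ready before sending it anything. A minimal sketch of such a wait, assuming a hypothetical `is_ready` callback that reports whether the device accepts commands; the timeout and poll interval are placeholders, not values taken from the kernel:

```python
import time


def wait_until_ready(is_ready, timeout_s, poll_interval_s=0.1,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or timeout_s elapses.

    Returns True if the device became ready in time, False otherwise.
    clock and sleep are injectable so the loop can be tested without
    real delays.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


# Example with a simulated drive that needs 1 second after reset.
start = time.monotonic()
ok = wait_until_ready(lambda: time.monotonic() - start >= 1.0, timeout_s=35)
print("drive ready:", ok)
```

Commands would be issued only after this returns True, so a slow drive is never pushed while its firmware is still starting up.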

> > ... Could you point me a link to the uncompressed
> > kernel tree where I can see source files?
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git is
> likely the easiest place to view..

Thank you, I'll take a look.
