linux-kernel - Re: Intel ICH9M/M-E SATA error-handling/reset problems

Message-Id: <200902151141.44367.mirsev@cicese.mx>
Date:	Sun, 15 Feb 2009 11:41:44 -0800
From:	Serguei Miridonov <mirsev@...ese.mx>
To:	Robert Hancock <hancockrwd@...il.com>
Cc:	linux-kernel@...r.kernel.org, Jeff Garzik <jeff@...zik.org>
Subject: Re: Intel ICH9M/M-E SATA error-handling/reset problems

On Sunday 15 February 2009, Robert Hancock wrote:
> Serguei Miridonov wrote:
> > On Saturday 14 February 2009, Robert Hancock wrote:
> >> Serguei Miridonov wrote:
> > ... something like 10
> > errors per 2GB transfer can not be the reason to give up. Vista,
> > at least, recovers and continues the data transfer. Linux simply
> > can not return the interface or connected device into operating
> > mode. Do you think it is normal?
>
> Could be that Linux is being a bit more aggressive on error
> handling. In your case, it looks like an error occurred, triggering
> a hard reset of the device, and the controller seemed unable to
> talk to the device afterwards. If the command had just been
> retried, maybe it would have worked better. However, doing that in
> general can cause issues since you don't know what the state of the
> link may be..

Hmm... I was sure there would be general recommendations from chipset
vendors regarding recovery procedures.

What behavior is expected from a SATA-connected device if it detects
a parity error in received data? I'm not familiar with the PATA/SATA
protocols, but I suppose that it just doesn't write the data to the
physical disk, asserts the error line and waits for the next command
from the controller. If the data block was too big to keep in the
drive's cache memory, it may also report the number of bytes
successfully (physically) written, so that the software does not send
them again.

If the above is correct, then the kernel should only log the error, do
some housekeeping work for the controller and attempt to send the data
again. There is no need for a hard reset right after the first error.
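
To make the idea concrete, here is a rough sketch of the escalation
order I have in mind: retry first, and reach for resets only when
retries keep failing. This is NOT how libata actually works. All
names and thresholds below (recovery_state, next_action, MAX_RETRIES
and so on) are made up for illustration.

```c
/* Hypothetical recovery steps, ordered from least to most disruptive. */
enum recovery_action {
    ACT_RETRY,      /* log the error and reissue the command */
    ACT_SOFT_RESET, /* software reset of the device */
    ACT_HARD_RESET, /* reset the link itself */
    ACT_GIVE_UP     /* report the failure to the upper layers */
};

/* Assumed thresholds; not taken from libata or the ICH9 datasheet. */
#define MAX_RETRIES     3
#define MAX_SOFT_RESETS 1
#define MAX_HARD_RESETS 1

/* Counts what has already been tried for the failing command. */
struct recovery_state {
    int retries;
    int soft_resets;
    int hard_resets;
};

/* Pick the mildest step that has not been exhausted yet. */
enum recovery_action next_action(struct recovery_state *st)
{
    if (st->retries < MAX_RETRIES) {
        st->retries++;
        return ACT_RETRY;
    }
    if (st->soft_resets < MAX_SOFT_RESETS) {
        st->soft_resets++;
        return ACT_SOFT_RESET;
    }
    if (st->hard_resets < MAX_HARD_RESETS) {
        st->hard_resets++;
        return ACT_HARD_RESET;
    }
    return ACT_GIVE_UP;
}

/* Call after a command finally succeeds, so the next error starts mild. */
void recovery_clear(struct recovery_state *st)
{
    st->retries = 0;
    st->soft_resets = 0;
    st->hard_resets = 0;
}
```

With these thresholds, a persistently failing command would see three
retries, then one soft reset, then one hard reset, and only then a
failure reported upward. Whether plain retries are safe at all depends
on the state of the link, as you said, so treat this only as a way to
state the question precisely.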

Another question is how the drive reacts to a hard reset... My error
log shows that both drives do not like it for some reason: they
sometimes stop responding, so maybe some additional programming of the
drives is necessary after a hard reset... Something which is done by
the BIOS after power-on... I don't know...

Well, this is getting interesting... I've got the datasheet for the
ICH9 but don't have the kernel driver source to check what the
messages in the log file really mean. Could you point me to a link to
the uncompressed kernel tree where I can browse the source files?
