linux-kernel - Re: ATA device reset, shoud I be concerned?

Message-ID: <4794D020.4060204@gmail.com>
Date:	Tue, 22 Jan 2008 02:02:24 +0900
From:	Tejun Heo <htejun@...il.com>
To:	Alan Cox <alan@...rguk.ukuu.org.uk>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	Georgi Chulkov <g.chulkov@...obs-university.de>,
	linux-kernel@...r.kernel.org, linux-ide@...r.kernel.org,
	Mark Lord <liml@....ca>
Subject: Re: ATA device reset, shoud I be concerned?

Alan Cox wrote:
>> Can you elaborate a bit?  I don't really think completing a command
>> after 30sec timeout contributes a lot to driver stability.
> 
> Timeout, timeout, timeout, reset, timeout.. (repeat), failed I/O
> 
> This gives the end user no information about the fault, nor does it let
> the upper layers of SCSI and above distinguish between a random passing
> sulk and media errors which need the disk replacing.

I still don't think it's worth the trouble.  There's currently only one
reported device which forgets to raise an IRQ on a media error.  That
behavior is out of spec and rare.  I don't think it's a good idea to
change EH behavior for it.

>>> Should that not then be a per host flag ?
>> Yeah, that would be the best.  The problem is that there are several
>> different kinds of timeouts and we don't know which controller locks up
>> after which timeout and investigating them is really difficult.
> 
> PATA controllers don't lock up in that case so it's quite easy. The one
> exception is if the device jams IORDY but in that case you are dead
> anyway on the next I/O (except on a SIL680 which has a timer we could use).
> 
> Old IDE says it works for PATA. For SATA I can see it might need more
> care and you might simply not be able to get the info.

Old IDE often locks up the machine hard after timeouts.  I'm all for
gathering more info but the benefit vs. risk equation just doesn't look
good here.  Why take the risk for a rare device which forgets to raise
an IRQ on a media error?  If such behavior is widespread among PATA
drives && we can verify that TF register access after a timeout is safe
for PATA controllers, sure, but currently we aren't sure about either.
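
To make the per-host flag idea concrete, here is a rough userspace
sketch, not libata code.  It assumes a hypothetical flag,
HOST_TF_SAFE_AFTER_TIMEOUT, set only for controllers verified to survive
TF register reads after a timeout.  The struct layout, the enum and
classify_timeout() are made up for illustration; only the status/error
bit values follow the ATA spec.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status and error register bits as defined by the ATA spec. */
#define ATA_BUSY 0x80u /* status: device busy, other bits not valid */
#define ATA_ERR  0x01u /* status: error register holds a valid code */
#define ATA_UNC  0x40u /* error: uncorrectable media error */

/* Hypothetical per-host flag: TF reads after a timeout are known safe. */
#define HOST_TF_SAFE_AFTER_TIMEOUT 0x1u

/* Stand-in for the taskfile registers a real driver would read. */
struct tf_regs {
    uint8_t status;
    uint8_t error;
};

/* Hypothetical host description; not the libata structure. */
struct host {
    unsigned int flags;
    struct tf_regs regs;
};

enum timeout_cause {
    CAUSE_UNKNOWN,     /* registers not consulted or not trustworthy */
    CAUSE_TRANSIENT,   /* device reports no error */
    CAUSE_MEDIA_ERROR  /* device reports an uncorrectable error */
};

/* Classify a timed-out command, touching registers only when allowed. */
enum timeout_cause classify_timeout(const struct host *h)
{
    /* Without the flag we must not risk a register access at all. */
    if (!(h->flags & HOST_TF_SAFE_AFTER_TIMEOUT))
        return CAUSE_UNKNOWN;

    /* While BSY is set the remaining status bits are meaningless. */
    if (h->regs.status & ATA_BUSY)
        return CAUSE_UNKNOWN;

    /* ERR plus UNC: the drive saw a media error but never raised IRQ. */
    if ((h->regs.status & ATA_ERR) && (h->regs.error & ATA_UNC))
        return CAUSE_MEDIA_ERROR;

    return CAUSE_TRANSIENT;
}
```

With the flag clear this always returns CAUSE_UNKNOWN, i.e. today's
behavior, so the risk is confined to hosts that have been verified.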

Thanks.

-- 
tejun