Date:	Tue, 22 Jan 2008 09:31:31 +0900
From:	Tejun Heo <htejun@...il.com>
To:	Alan Cox <alan@...rguk.ukuu.org.uk>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	Georgi Chulkov <g.chulkov@...obs-university.de>,
	linux-kernel@...r.kernel.org, linux-ide@...r.kernel.org,
	Mark Lord <liml@....ca>
Subject: Re: ATA device reset, shoud I be concerned?

Hello,

Alan Cox wrote:
>> I still don't think it's worth the trouble.  There's currently only one
>> reported device which forgets to raise IRQ on media error.  The behavior
> 
> Most people wouldn't realise what is going on.

Yep, true, but I don't think we have many timeouts due to media errors.
I've seen lots of SMART logs for drives which caused timeouts but
haven't seen any which logged related media errors.

>>> Old IDE says it works for PATA. For SATA I can see it might need more
>>> care and you might simply not be able to get the info.
>> Old IDE often locks up the machine hard after timeouts.  I'm all for
> 
> The code paths are racy - it didn't use to in 2.4 (except for the promise
> drain bug)

My JMicron locks up hard under certain conditions.  I haven't
investigated it too deeply but it looks like a hard lockup (controller
dying while holding the PCI bus).  The NMI watchdog doesn't work afterwards.

>> gathering more info but benefit vs. risk equation just doesn't look good
>> here.  Why take risk for a rare device which forgets to raise IRQ on
>> media error?  If such behavior is wide spread among PATA drives && we
>> can verify that TF register access after timeout is safe for PATA
>> controllers, sure, but currently we aren't sure about either.
> 
> We lose IRQs in lots of other cases. Promise PATA is particularly bad at
> forgetting to give us the completion interrupt.

In that case, completing commands after 30 secs doesn't really help as
long as normal operation can be recovered afterward.  The driver should
take measures against lost interrupts, like polling for interrupts after
a while.  Those are two different problems and require different, almost
opposite, solutions.  Some controllers need registers polled once in a
while, whereas others die when registers are read unexpectedly.
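
To make the distinction concrete, here is a rough userspace sketch of the
"poll after a while" idea, gated on whether the controller tolerates
register reads.  Nothing here is libata code: struct sim_port, the
safe_to_poll flag, sim_read_status() and sim_wait() are all made-up names
for illustration, and the tick counts are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simulated port; not a real libata structure. */
struct sim_port {
	bool irq_raised;   /* completion interrupt was delivered */
	bool cmd_done;     /* what the status register would report */
	bool safe_to_poll; /* false: controller may die on unexpected reads */
	int status_reads;  /* number of register reads performed */
};

enum wait_result { WAIT_IRQ, WAIT_POLLED, WAIT_TIMEOUT };

/* Read the simulated status register and count the access. */
static bool sim_read_status(struct sim_port *p)
{
	p->status_reads++;
	return p->cmd_done;
}

/*
 * Wait up to timeout_ticks for a command to complete.  Once
 * poll_after_ticks have passed without an interrupt, and only if the
 * port is flagged as safe to poll, check the status register to catch
 * a lost completion interrupt.  Ports not flagged safe are never
 * touched and simply run into the timeout.
 */
static enum wait_result sim_wait(struct sim_port *p,
				 int poll_after_ticks, int timeout_ticks)
{
	assert(p != NULL);

	for (int tick = 0; tick < timeout_ticks; tick++) {
		if (p->irq_raised)
			return WAIT_IRQ;
		if (p->safe_to_poll && tick >= poll_after_ticks &&
		    sim_read_status(p))
			return WAIT_POLLED;
	}
	return WAIT_TIMEOUT;
}
```

With a port that lost its interrupt but is safe to poll, sim_wait()
returns WAIT_POLLED after the first status read.  With safe_to_poll
cleared, it never reads the register and returns WAIT_TIMEOUT, which is
the case where recovery has to happen some other way.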

Thanks.

-- 
tejun