linux-kernel - Re: Linux v2.6.22-rc3

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <465F6F48.8080804@gmail.com>
Date:	Fri, 01 Jun 2007 09:58:48 +0900
From:	Tejun Heo <htejun@...il.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
CC:	Gregor Jasny <gjasny@...glemail.com>,
	Jeff Garzik <jgarzik@...ox.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-ide@...r.kernel.org, Alan Cox <alan@...rguk.ukuu.org.uk>
Subject: Re: Linux v2.6.22-rc3

Hello, Linus.  Sorry about late response.  I'm still recovering from jet
lag.

Linus Torvalds wrote:
> On Tue, 29 May 2007, Tejun Heo wrote:
>> Aieee, so the drive doesn't like the new SRST sequence.
> 
> It would appear that the old code largely ignored the SRST error entirely, 
> no?

Yes, we used to ignore some error conditions, which sometimes led to
very long timeout sequence after hotplugging under certain circumstances
- a few 30sec timeouts for the reset and another 30sec for the following
IDENTIFY (because reset didn't fail after the timeouts).

> If we *used* to do (in ata_bus_post_reset()):
> 
> 	if (dev0)
> 		ata_busy_sleep(ap, ATA_TMOUT_BOOT_QUICK, ATA_TMOUT_BOOT);
> 
> and you changed that to actually care about the return value:
> 
> 	if (dev0) {
> 		rc = ata_wait_ready(ap, deadline);
> 		if (rc && rc != -ENODEV)
> 			return rc;
> 	}
> 
> (in _two_ places). That change also changed the same "post_reset" handling 
> in a totally _different_ way: it used to do ata_busy_sleep() twice, now it 
> still does it twice, but it does it with the same "timeout" value, so if 
> the first one times out, then the second one won't be given any timeout AT 
> ALL!
> 
> And to make matters worse: the first timeout seems to be for ANOTHER PORT 
> ENTIRELY! So you seem to break port 1 even if the timeout happened on port 
> 0, as far as I can read that sequence. 

Yes, the whole operation uses single timeout.  If the given time runs
out, it fails for both devices on the channel.  This is because reset
timeout is channel wide.  After SRST, both devices are required to
respond in certain time (30secs max per spec).  If anyone of the device
doesn't respond, it isn't clear what we can or can't do with the bus -
reset protocol itself requires responding master, cable detection needs
both devices working, etc...

> So I think your ata_bus_post_reset() changes are rather suspect. The fact 
> that you don't change the timeout, and use the same deadline for two 
> different ports (and for multiple commands to the same port, afaik), seems 
> rather suspect. The old code also didn't care about failures in certain 
> phases of the reset sequence, and it appears that it did so for good 
> reason.

Gregor's cdrom has broken SRST support which is extremely rare and
broken.  The reason why it works flawlessly with the ide driver is
because the ide driver doesn't issue SRST during probing and libata
worked after 30sec timeout before reset-seq change because libata didn't
use to check reset failures in some cases and the device just worked
when issued commands even when it didn't report ready status after reset.

I'll try to add some code in the reset path and see where the device
fails reset protocol.  Hopefully, we can work around it.  If it doesn't,
maybe we'll need to issue IDENTIFY and see whether that works after
reset timeout.  :-(

Thanks.

-- 
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/