linux-kernel - Re: [PATCH] Re: Linux v2.6.22-rc3

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.LFD.0.98.0706091054360.20321@woody.linux-foundation.org>
Date:	Sat, 9 Jun 2007 11:12:06 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Jeff Garzik <jeff@...zik.org>
cc:	Tejun Heo <htejun@...il.com>, Gregor Jasny <gjasny@...glemail.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-ide@...r.kernel.org, Alan Cox <alan@...rguk.ukuu.org.uk>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] Re: Linux v2.6.22-rc3

On Thu, 7 Jun 2007, Jeff Garzik wrote:
>
> Ack'ing the sata_promise change was easy, but with this one it would
> be nice to wait a bit before changing the core probe code that [now]
> every ATA setup goes through, based on a single bug report.

So what's the resolution? Right now this is apparently the reasong for 
two of our current regressions, and we need to solve it.

"Just wait" is not an option. The options are:
 - fix things
 - revert everything that caused the regressions

and quite frankly, I don't think the second alternative is palatable to 
anybody, including you.

> The values assist in detecting ghost devices (same device appearing
> on both master and slave) and TF register malfunctions, and I would
> appreciate not breaking _that_ so late in 2.6.22-rc for a single
> report.  Thankfully we have -some- ghost device prevention code
> elsewhere, but this is part of it.

Apparently this isn't even true. As far as I can tell, the old code used 
to wait for lbal to be 1, but if it never became one, it would just 
silently time out and not *do* anything about it. Later commands would 
"just work", and the device would go its merry ways.

IOW, the new code is *crap*. The old code was arguably not wonderful 
either, but because of its nature, the old code not working didn't 
actually matter all that much.

So the biggest change (and the one that broke peoples machines) is that 
the code that you claim matters, *never* apparently did matter, and now 
that we've made it matter (by returning an error, and aborting the SRST 
sequence as a failure), people are complaining.

I really think we should apply Tejun's patch. Add a delay there, by all 
means if you are nervous: but no, it shouldn't be 150ms. This is the 
_post_ reset handling, we've already done the 150ms wait for the reset!

In fact, if you look at ata_bus_post_reset(), you'll notice that for 
port0, we just do the "ata_wait_ready()" call. Tejun's patch just made us 
do the same (logical) thing for port1 too!  If you think it breaks due to 
ome timeout issue, then the port0 thing was already broken.

(And maybe we could make the 

	ap->ops->dev_select(ap, 1);
	rc = ata_wait_ready(ap, deadline);

instead be a

	ata_dev_select(ap, 1, 1, 1);

and you'd wait even more? But that's actually a bigger change than Tejun's 
thing, and it uses "ata_wait_idle()" rather than "ata_wait_ready()", and 
doesn't return any status.. I dunno. But right now, we have several known 
broken things, that apparently weren't broken before!)

		Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/