linux-kernel - Re: Problem with ata layer in 2.6.24

Message-ID: <alpine.LNX.1.00.0801291202550.13593@iabervon.org>
Date:	Tue, 29 Jan 2008 13:14:15 -0500 (EST)
From:	Daniel Barkalow <barkalow@...ervon.org>
To:	Alan Cox <alan@...rguk.ukuu.org.uk>
cc:	Richard Heck <rgheck@...jweil.com>,
	Gene Heskett <gene.heskett@...il.com>,
	Zan Lynx <zlynx@....org>,
	Calvin Walton <calvin.walton@...il.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Linux ide Mailing list <linux-ide@...r.kernel.org>
Subject: Re: Problem with ata layer in 2.6.24

On Tue, 29 Jan 2008, Alan Cox wrote:

> > not one problem but lots---is sufficiently widespread that a Mini HOWTO, 
> > say, would be really welcome and, I'm guessing, widely used.
> 
> We don't see very many libata problems at the distro level and they for
> the most part boil down to
> 
> - error messages looking different - Most bugs I get are things like
> media errors (timeout looks different, UNC report looks different)

The SCSI error reporting really ought to include a simple interpretation 
of the error for end users ("The drive doesn't support this command", "A 
sector's data got lost", "The drive timed out", "The drive failed", "The 
drive is entirely gone"). There's too much similarity between the message 
you get when you try a SMART test that doesn't apply to the drive and the 
one you get when the drive is broken.
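
To make the idea concrete, here is a rough sketch (in Python rather than 
kernel C, purely for illustration) of a lookup from error condition to a 
one-line interpretation. The sense key numbers are the standard SCSI ones; 
the mapping to these particular sentences, the explain() helper, and the 
"timeout"/"no_connect" event names are assumptions of mine, not anything 
the kernel provides. ILLEGAL REQUEST in particular covers more than 
unsupported commands, so treat that line as a simplification.

```python
# Sense key values follow the standard SCSI numbering. The messages are the
# ones suggested in this mail; the mapping itself is only an illustration.
SENSE_KEY_TEXT = {
    0x3: "A sector's data got lost",                # MEDIUM ERROR
    0x4: "The drive failed",                        # HARDWARE ERROR
    0x5: "The drive doesn't support this command",  # ILLEGAL REQUEST
}

# Hypothetical names for conditions that arrive without any sense data.
HOST_EVENT_TEXT = {
    "timeout": "The drive timed out",
    "no_connect": "The drive is entirely gone",
}


def explain(sense_key=None, host_event=None):
    """Return a one-line interpretation, or None if nothing useful is known."""
    if host_event is not None:
        return HOST_EVENT_TEXT.get(host_event)
    return SENSE_KEY_TEXT.get(sense_key)


if __name__ == "__main__":
    # A SMART test the drive doesn't implement vs. a genuinely bad sector.
    print(explain(sense_key=0x5))
    print(explain(sense_key=0x3))
    print(explain(host_event="timeout"))
```

The point is only that the two cases print visibly different sentences; 
the raw sense data would still be logged alongside for developers.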

> - faulty hardware being picked up because we actually do real error
> checking now. We now check for and give some devices more slack while
> still doing error checking. Both IDE layers also added blacklists for
> stuff like the TSScorp DVD drives. Qemu has now had its bugs patched.

I think this is the big source of unhappy users (and, of course, the 
reports all look the same and stay findable by Google, so it looks a lot 
worse than it is). People hitting this problem in distro kernels probably 
really do want a way to report it with enough detail from the logs to get 
it dealt with, and then to switch back to the old IDE until the fix 
propagates through.

And it's possible that the error recovery is suboptimal in some cases. It 
seems to like resetting drives too much; perhaps if it keeps seeing the 
same problem and resetting the drive, it should decide that the drive's 
error reporting is simply bad and ignore that error like the old IDE did 
(but, in this case, after saying what it's doing).
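
Something like the following is what I have in mind, again as a Python 
sketch rather than real libata code. The ResetPolicy class, the 
max_resets threshold, and the error strings are all hypothetical; the 
actual error handler is structured quite differently.

```python
class ResetPolicy:
    """Stop resetting a drive that keeps reporting the same error.

    Hypothetical illustration: after max_resets resets for one repeated
    error, log the decision once and ignore that error from then on.
    """

    def __init__(self, max_resets=3):
        self.max_resets = max_resets
        self.last_error = None
        self.resets = 0
        self.ignored = set()

    def decide(self, error):
        """Return "reset" or "ignore" for a newly reported error."""
        if error in self.ignored:
            return "ignore"
        if error != self.last_error:
            # A different problem: start counting again.
            self.last_error = error
            self.resets = 0
        if self.resets >= self.max_resets:
            # Say what we're doing before going quiet about it.
            print(f"ignoring repeated error {error!r} "
                  f"after {self.resets} resets")
            self.ignored.add(error)
            return "ignore"
        self.resets += 1
        return "reset"


if __name__ == "__main__":
    policy = ResetPolicy(max_resets=2)
    for _ in range(4):
        print(policy.decide("spurious completion"))
```

Whether counting consecutive identical errors is the right test is 
debatable; the sketch only shows the "give up, but say so" behaviour.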

	-Daniel
*This .sig left intentionally blank*