Message-ID: <4A19E721.9030103@kernel.org>
Date:	Mon, 25 May 2009 09:32:33 +0900
From:	Tejun Heo <tj@...nel.org>
To:	Niel Lambrechts <niel.lambrechts@...il.com>
CC:	"linux.kernel" <linux-kernel@...r.kernel.org>,
	Theodore Tso <tytso@....edu>
Subject: Re: 2.6.29 regression: ATA bus errors on resume (output with debug
 patch)

Hello,

Niel Lambrechts wrote:
> Bug triggered with your patch! I played audio while suspending to try
> and increase activity  (I also removed a CD on boot), and the filesystem
> came up dirty! This was on attempt nr. 3 or 4.

Great.

Here's the problem.

 May 23 12:15:11 linux-7vph kernel: XXX scsi_eh_flush_done_q: online=1(2) noretry=2 retries=0 allowed=5 

scsi_noretry_cmd() is returning non-zero, indicating that the request
should fail immediately instead of being retried.  The return value 2
looks like it comes from blk_failfast_dev(), which tests
REQ_FAILFAST_DEV.  The flag is most likely set in
init_request_from_bio() while translating bio flags.
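
To make the reasoning above concrete, here is a minimal userspace
sketch of the retry decision.  It is a model, not the kernel code: the
model_* names and the struct are hypothetical, and the bit position of
the failfast flag is an assumption inferred from the logged value
noretry=2.  It only illustrates why a mask test reports 2 rather than 1
and why retries=0 allowed=5 still ends in failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: the flag sits at bit 1, which would explain noretry=2. */
#define MODEL_REQ_FAILFAST_DEV (1u << 1)

/* Hypothetical stand-in for the fields printed by the debug patch. */
struct model_cmd {
    unsigned int cmd_flags; /* request flags translated from the bio */
    bool online;            /* device still online */
    int retries;            /* retries used so far */
    int allowed;            /* retries permitted */
};

/* Mask test: yields the masked bit itself (0 or 2), not a 0/1 boolean. */
static unsigned int model_failfast_dev(const struct model_cmd *cmd)
{
    assert(cmd != NULL);
    return cmd->cmd_flags & MODEL_REQ_FAILFAST_DEV;
}

/*
 * Retry decision for a command aborted by error handling: retry only
 * if the device is online, no failfast flag vetoes the retry, and the
 * retry budget is not used up.  The counter is bumped only when the
 * first two checks pass.
 */
static bool model_should_retry(struct model_cmd *cmd)
{
    assert(cmd != NULL);
    if (!cmd->online)
        return false;
    if (model_failfast_dev(cmd) != 0)
        return false;
    cmd->retries++;
    return cmd->retries <= cmd->allowed;
}
```

With cmd_flags carrying the failfast bit, online set, retries=0 and
allowed=5, model_should_retry() returns false without touching the
retry counter, which matches the logged line.  Clearing the flag lets
the same command be retried.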

cc'ing Theodore Tso.  Hello, Niel is reporting ext4 errors after
resuming.

  http://thread.gmane.org/gmane.linux.kernel/814466/focus=817937

The origin of the problem is an ATA device triggering a PHY event after
the resume sequence is complete.  I still don't know why this happens,
but it does on certain machines.  This in itself shouldn't be a big
problem, as the device works fine after one more pass of ATA EH and the
in-flight requests would be retried.  However, for some reason, the
aborted commands seem to have REQ_FAILFAST_DEV set, so they fail
immediately, which in turn triggers ext4 errors.  Does anything ring
a bell?

Thanks.

-- 
tejun