Message-ID: <4A19E721.9030103@kernel.org>
Date: Mon, 25 May 2009 09:32:33 +0900
From: Tejun Heo <tj@...nel.org>
To: Niel Lambrechts <niel.lambrechts@...il.com>
CC: "linux.kernel" <linux-kernel@...r.kernel.org>,
Theodore Tso <tytso@....edu>
Subject: Re: 2.6.29 regression: ATA bus errors on resume (output with debug
patch)
Hello,
Niel Lambrechts wrote:
> Bug triggered with your patch! I played audio while suspending to try
> and increase activity (I also removed a CD on boot), and the filesystem
> came up dirty! This was on attempt nr. 3 or 4.
Great.
Here's the problem.
May 23 12:15:11 linux-7vph kernel: XXX scsi_eh_flush_done_q: online=1(2) noretry=2 retries=0 allowed=5
scsi_noretry_cmd() is returning non-zero, indicating that the request
shouldn't be retried and should be failed immediately. Looks like the
return value 2 is from blk_failfast_dev(), which tests
REQ_FAILFAST_DEV. It's most likely set in init_request_from_bio()
while translating bio flags.
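To make the debug line above easier to read, here is a rough userspace
model of the retry decision, not the kernel code itself. All model_*
names are made up for illustration, the bit position of the flag is an
assumption picked only so that the mask equals the logged noretry=2,
and the noretry check is reduced to the one failfast test discussed
here.

```c
#include <stdbool.h>

/* ASSUMPTION: bit position chosen so the mask equals the logged value 2. */
#define MODEL_REQ_FAILFAST_DEV (1u << 1)

/* Hypothetical stand-in for a block layer request. */
struct model_request {
	unsigned int cmd_flags;
};

/* Hypothetical stand-in for a SCSI command sitting on the EH done queue. */
struct model_cmd {
	struct model_request *request;
	bool online;	/* device still online after EH */
	int retries;	/* retries used so far */
	int allowed;	/* retries permitted */
};

/*
 * Returns the masked flag rather than 0/1, so a set flag shows up as
 * the bit value itself (2 under the assumption above).
 */
static unsigned int model_blk_failfast_dev(const struct model_request *rq)
{
	return rq->cmd_flags & MODEL_REQ_FAILFAST_DEV;
}

/* Simplified: only the device-failfast test is modelled. */
static unsigned int model_noretry_cmd(const struct model_cmd *cmd)
{
	return model_blk_failfast_dev(cmd->request);
}

/*
 * Retry only if the device is online, the command is not marked
 * no-retry, and the retry budget is not used up. The retry counter is
 * bumped only when the first two conditions hold.
 */
static bool model_eh_should_retry(struct model_cmd *cmd)
{
	return cmd->online &&
	       !model_noretry_cmd(cmd) &&
	       ++cmd->retries <= cmd->allowed;
}
```

With online=1, retries=0, allowed=5 and the flag set, as in the log
line, model_eh_should_retry() returns false without touching the retry
counter, so the command is failed even though the whole retry budget
is still available. Clearing the flag makes the same command
retryable.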
cc'ing Theodore Tso. Hello, Niel is reporting ext4 checking out after
resuming.
http://thread.gmane.org/gmane.linux.kernel/814466/focus=817937
The origin of the problem is an ATA device triggering a PHY event
after the resume sequence is complete. I still don't know why this
happens but it does on certain machines. This in itself shouldn't be
a big problem, as the device works fine after one more pass of ATA EH
and the in-flight requests would be retried. However, for some
reason, the aborted commands seem to have REQ_FAILFAST_DEV set and so
fail immediately, which, in turn, triggers ext4 errors. Does anything
ring a bell?
Thanks.
--
tejun