Message-ID: <4748ACC7.4010509@free.fr>
Date: Sat, 24 Nov 2007 23:59:19 +0100
From: Laurent Riffard <laurent.riffard@...e.fr>
To: James Bottomley <James.Bottomley@...elEye.com>
CC: Hannes Reinecke <hare@...e.de>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, linux-ide@...r.kernel.org,
linux-scsi@...r.kernel.org
Subject: Re: 2.6.24-rc3-mm1: I/O error, system hangs
On 24.11.2007 14:26, James Bottomley wrote:
> On Sat, 2007-11-24 at 13:57 +0100, Laurent Riffard wrote:
>> On 24.11.2007 07:42, James Bottomley wrote:
>>> On Fri, 2007-11-23 at 18:52 +0100, Laurent Riffard wrote:
>>>> On 23.11.2007 12:38, Hannes Reinecke wrote:
[snip]
>>>> I can confirm: reverting commit 8655a546c83fc43f0a73416bbd126d02de7ad6c0
>>>> does fix the problem.
>>>>
>>>>>> Hmm. Weird. I'll have a look into it. Apparently I'll be returning an error where
>>>>>> I shouldn't. Checking ...
>>>>>>
>>>>> Ok, found it. We are blocking even special commands (ie requests with PREEMPT not set)
>>>>> when FAILFAST is set. Which is clearly wrong. The attached patch fixes this.
>>>> Sorry, it's not enough. 2.6.24-rc3-mm1 + your patch still hangs with I/O errors.
>>> I think the problem is the way we treat BLOCKED and QUIESCED (the latter
>>> is the state that the domain validation uses and which we cannot kill
>>> fastfail on). It's definitely wrong to kill fastfail requests when the
>>> state is QUIESCE.
>>>
>>> This patch (which is applied on top of Hannes' original) separates the
>>> BLOCK and QUIESCE states correctly ... does this fix the problem?
>>
>> No, it doesn't help... (2.6.24-rc3-mm1 + your patch still has problems)
>
> OK, could you post dmesgs again, please. I actually tested this with an
> aic79xx card, and for me it does cause Domain Validation to succeed
> again.
James,
Here is a dmesg produced by 2.6.24-rc3-mm1 + your patch "separates the
BLOCK and QUIESCE states correctly" (http://lkml.org/lkml/2007/11/24/8).
How to reproduce:
- boot
- switch to a text console
- capture dmesg in a file, sync, etc. There are 3 I/O errors, but the
system still works.
- switch to the X console and log in to the GNOME desktop; the system
partially hangs.
- switch back to a text console: dmesg(1) still works and shows some
additional I/O errors. At this point, any disk access makes the system
hang completely.
Additional data:
- the I/O errors always happen on the same blocks.
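One way to check that the errors hit the same blocks is to compare two dmesg captures (for example, the one saved right after boot and one taken after the partial hang). The following is only a minimal sketch. It assumes the errors are logged in the block layer's usual form, `end_request: I/O error, dev <name>, sector <n>`. The attached dmesg is not reproduced here, so that line format and the helper names `failing_sectors` and `common_sectors` are assumptions for illustration.

```python
import re
from collections import Counter

# Assumed message format (not taken from the attached dmesg):
#   end_request: I/O error, dev sda, sector 12345
IO_ERROR = re.compile(r"I/O error, dev ([^,\s]+), sector (\d+)")


def failing_sectors(dmesg_text):
    """Count I/O errors per (device, sector) pair found in dmesg output."""
    hits = Counter()
    for line in dmesg_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            hits[(match.group(1), int(match.group(2)))] += 1
    return hits


def common_sectors(first_capture, second_capture):
    """Return the sorted (device, sector) pairs that fail in both captures."""
    first = set(failing_sectors(first_capture))
    second = set(failing_sectors(second_capture))
    return sorted(first & second)
```

Feeding it the text of two saved dmesg files lists the sectors that fail in both; if the errors really do recur on the same blocks, every failing sector from the first capture should show up in that list.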
--
laurent
View attachment "dmesg-2.6.24-rc3-mm1-patched" of type "text/plain" (30536 bytes)