Message-ID: <490C54D4.4030603@vlnb.net>
Date: Sat, 01 Nov 2008 16:08:36 +0300
From: Vladislav Bolkhovitin <vst@...b.net>
To: Nick Piggin <nickpiggin@...oo.com.au>
CC: linux-fsdevel@...r.kernel.org, viro@...iv.linux.org.uk,
linux-kernel@...r.kernel.org,
James Bottomley <James.Bottomley@...senPartnership.com>,
scst-devel <scst-devel@...ts.sourceforge.net>
Subject: Re: FS corruption after I/O errors
Vladislav Bolkhovitin wrote:
> Nick Piggin wrote:
>> On Wednesday 29 October 2008 06:38, Vladislav Bolkhovitin wrote:
>>> Nick Piggin wrote:
>>>> On Saturday 25 October 2008 03:10, Vladislav Bolkhovitin wrote:
>>>>> Hi,
>>>>>
>>>>> During a recent debugging session of my SCSI target SCST
>>>>> (http://scst.sf.net) I noticed many
>>>>>
>>>>> WARNING: at fs/buffer.c:1186 mark_buffer_dirty+0x51/0x66()
>>>>>
>>>>> messages in kernel log on the initiator. I attached the full log of
>>>>> several of them.
>>>>>
>>>>> My target was buggy and I was working on fixing it, but I suppose Linux
>>>>> should handle such failures more gracefully. In all the cases the target
>>>>> had one type of failure: it "ate" a SCSI command and never returned
>>>>> its result.
>>>> Right. This is one of the warnings I see in my fault-injection testing.
>>>> It is fixed by my patch to clean up and improve the page and buffer
>>>> error handling in the vm/fs.
>>> Can you specify which patch you are referring to? Is it in 2.6.27?
>> It's just an RFC at the moment which I posted to fsdevel. Not in 2.6.27.
>
> I see. I'm looking forward to seeing it in 2.6.28 or .29. This is
> really needed work.
>
> BTW, have you ever seen in your fault-injection testing that, after
> receiving a failure from a SCSI device under heavy load, an ext3 file
> system mounted on it gets corrupted and journal replay on remount
> doesn't repair it, so that only a manual e2fsck helps? I've seen that
> many times, including cases when the target remained up and fully
> functional. See, e.g., the "MOANING MODE ON" part in
> http://marc.info/?l=linux-scsi&m=121932252324432&w=2. I haven't checked
> that case since then, although I see such corruptions quite often. But
> in all of them I can't say so clearly that it isn't the target's failure.
I've just checked it with 2.6.27. The situation has greatly improved:
dbench was able to complete several runs under constant TASK_ABORTED
"bombarding" (TASK RESET task management commands issued with
"sg_reset -b" every 31 seconds from another "connection" to that device
via the qla2xxx initiator driver. You can see those resets in the
attached log). But when I then unmounted the affected partition, e2fsck
found errors on it. See the attachments for details. The target was fine
and completely healthy the whole time.
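
For anyone who wants to try reproducing this, the reset loop can be
sketched roughly as follows. This is only an illustration, not the exact
script used for the test: the only details taken from the description
above are the "sg_reset -b" invocation and the 31-second interval. The
function names, the reset count and the /dev/sg1 device path are
placeholders.

```python
import subprocess
import time

RESET_INTERVAL_S = 31  # interval quoted in the message above


def reset_command(sg_device):
    # "sg_reset -b" is the invocation named in the message;
    # the device path passed in is a placeholder chosen by the caller.
    return ["sg_reset", "-b", sg_device]


def bombard(sg_device, count, run=subprocess.call, sleep=time.sleep):
    """Issue `count` resets, RESET_INTERVAL_S seconds apart.

    `run` and `sleep` are injectable so the loop can be dry-run
    without touching a real device. Returns the list of exit codes.
    """
    codes = []
    for i in range(count):
        codes.append(run(reset_command(sg_device)))
        if i + 1 < count:
            sleep(RESET_INTERVAL_S)
    return codes


if __name__ == "__main__":
    # Hypothetical device node on the second initiator.
    print(bombard("/dev/sg1", count=20))
```

The idea would be to run this on the second initiator while dbench runs
on the file system mounted by the first one, then unmount the partition
and check it with "e2fsck -f".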
View attachment "dbench" of type "text/plain" (17885 bytes)
View attachment "kernel" of type "text/plain" (38110 bytes)