Date: Sat, 01 Nov 2008 16:08:36 +0300
From: Vladislav Bolkhovitin <vst@...b.net>
To: Nick Piggin <nickpiggin@...oo.com.au>
CC: linux-fsdevel@...r.kernel.org, viro@...iv.linux.org.uk, linux-kernel@...r.kernel.org, James Bottomley <James.Bottomley@...senPartnership.com>, scst-devel <scst-devel@...ts.sourceforge.net>
Subject: Re: FS corruption after I/O errors

Vladislav Bolkhovitin wrote:
> Nick Piggin wrote:
>> On Wednesday 29 October 2008 06:38, Vladislav Bolkhovitin wrote:
>>> Nick Piggin wrote:
>>>> On Saturday 25 October 2008 03:10, Vladislav Bolkhovitin wrote:
>>>>> Hi,
>>>>>
>>>>> During a recent debugging session of my SCSI target SCST
>>>>> (http://scst.sf.net) I noticed many
>>>>>
>>>>> WARNING: at fs/buffer.c:1186 mark_buffer_dirty+0x51/0x66()
>>>>>
>>>>> messages in the kernel log on the initiator. I attached the full log of
>>>>> several of them.
>>>>>
>>>>> My target was buggy and I was working on fixing it, but I suppose Linux
>>>>> should handle such failures more gracefully. In all the cases the target
>>>>> had one type of failure: it "ate" a SCSI command and never returned
>>>>> its result.
>>>> Right. This is one of the warnings I see in my fault-injection testing.
>>>> It is fixed by my patch to clean up and improve the page and buffer
>>>> error handling in the vm/fs.
>>> Which patch are you referring to? Is it in 2.6.27?
>> It's just an RFC at the moment which I posted to fsdevel. Not in 2.6.27.
>
> I see. I'm looking forward to seeing it in 2.6.28 or .29. This is really
> needed work.
>
> BTW, have you ever seen in your fault-injection testing that, after
> receiving a failure from a SCSI device under heavy load, the ext3 file
> system mounted on it gets corrupted and journal replay on remount
> doesn't repair it, so only a manual e2fsck helps? I've seen that many
> times, including cases when the target remained up and fully functional.
> See, e.g., the "MOANING MODE ON" part in
> http://marc.info/?l=linux-scsi&m=121932252324432&w=2. I haven't checked
> that case since then, although I see such corruptions quite often. But
> in none of them can I say as clearly that it isn't a target failure.

I've just checked it with 2.6.27. The situation has greatly improved: dbench was able to complete several runs under constant TASK_ABORTED "bombarding" (TASK RESET task management commands sent using "sg_reset -b" every 31 seconds from another "connection" to that device via the qla2xxx initiator driver; you can see those resets in the attached log). But when I then unmounted the affected partition, e2fsck found errors on it. See the attachments for details. The target was fine and completely healthy the whole time.

View attachment "dbench" of type "text/plain" (17885 bytes)
View attachment "kernel" of type "text/plain" (38110 bytes)
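The reset "bombarding" used in the test above can be sketched as a small driver loop. This is a hypothetical Python helper, not the script actually used: the only command line it issues is the `sg_reset -b <device>` invocation quoted in the report, and the 31-second period also comes from the report. The device node is a placeholder you supply, and the `runner`/`sleep` parameters exist only so the loop can be exercised without real hardware.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def run_sg_reset(argv: Sequence[str]) -> int:
    """Run one sg_reset (sg3_utils) invocation and return its exit status."""
    return subprocess.run(list(argv), check=False).returncode


def bombard(device: str,
            rounds: int,
            interval_s: float = 31.0,
            runner: Callable[[Sequence[str]], int] = run_sg_reset,
            sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Issue 'sg_reset -b <device>' `rounds` times, interval_s seconds apart.

    Exit statuses are collected rather than raised, so a failed reset does
    not stop the loop: the goal is to keep disturbing the I/O path while
    the workload (dbench in the report) is running.
    """
    statuses: List[int] = []
    for i in range(rounds):
        # Same command line as in the report: sg_reset -b <device>
        statuses.append(runner(["sg_reset", "-b", device]))
        # Wait between resets, but not after the last one.
        if i + 1 < rounds:
            sleep(interval_s)
    return statuses
```

For example, `bombard("/dev/sg1", rounds=20)` would run on the initiator while dbench runs against the file system; `/dev/sg1` is a hypothetical node for the same LUN reached through the second "connection".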
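The last step of the test, checking the unmounted partition with e2fsck, can also be scripted. The sketch below is a hypothetical Python helper: the report does not say which e2fsck options were used, so the forced read-only check (`-f -n`) is an assumption. The exit-status bits are the ones documented in the e2fsck(8) man page, where the status is the bitwise OR of the conditions encountered.

```python
import subprocess
from typing import List

# Exit-status bits as documented in e2fsck(8).
E2FSCK_BITS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}


def decode_e2fsck_status(status: int) -> List[str]:
    """Translate an e2fsck exit status into the conditions it encodes."""
    if status == 0:
        return ["no errors"]
    return [msg for bit, msg in E2FSCK_BITS.items() if status & bit]


def found_fs_errors(status: int) -> bool:
    """True if e2fsck reported file system errors, corrected or not."""
    return bool(status & (1 | 2 | 4))


def check_unmounted(device: str) -> int:
    """Run a forced, read-only check on an unmounted ext2/ext3 device.

    -f forces a check even if the file system is marked clean; -n opens it
    read-only and answers "no" to every repair question, so nothing is
    modified and any errors found are left uncorrected (bit 4).
    """
    return subprocess.run(["e2fsck", "-f", "-n", device],
                          check=False).returncode
```

A read-only check keeps the corrupted state intact for later analysis; a repairing run can follow once the output has been saved.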