linux-ext4 - Re: DIO process stuck apparently due to dioread

Message-ID: <4E488625.609@tao.ma>
Date:	Mon, 15 Aug 2011 10:36:21 +0800
From:	Tao Ma <tm@....ma>
To:	Michael Tokarev <mjt@....msk.ru>
CC:	linux-ext4@...r.kernel.org, sandeen@...hat.com,
	Jan Kara <jack@...e.cz>
Subject: Re: DIO process stuck apparently due to dioread_nolock (3.0)

On 08/15/2011 05:07 AM, Michael Tokarev wrote:
> 15.08.2011 00:57, Michael Tokarev wrote:
>> 13.08.2011 20:02, Tao Ma wrote:
>>> From: Tao Ma <boyu.mt@...bao.com>
>>>
>>> Hi Michael,
>>> 	could you please check whether this patch works for you?
>>
>> With this patch applied to 3.0.1 I can't trigger the issue anymore,
>> after several attempts -- the system just works as it should.
>> I'm not sure whether this is the right fix or whether my testcase
>> just isn't as good as it could be... ;)
Thanks for the test.
> 
> Well, I found a way to trigger data corruption with this patch
> applied.  I guess it's not the fault of this patch, but of some
> deeper problem instead.
> 
> The sequence is my usual one: copy an oracle database from another
> place and start it.  When oracle started doing its direct-I/O
> against its redologs, we had the problem which is now solved.  But
> now I do the following: I shut down the database, rename the current
> redologs out of the way and copy them back into place as new files.
> Then I start the database again.
> 
> This time, oracle complains that the redologs contain garbage.
> I can reboot the machine now, and compare the old (renamed) redologs
> with the copies - they're indeed different.
> 
> My guess is that the copy is done from the pagecache - from the old
> contents of the files, somehow ignoring the (direct) writes
> performed by the initial database open.  But that copy is somehow
> damaged now too, since even the file identification is now different.
> 
> Is this new issue something that dioread_nolock is expected to cause?
> I mean, it isn't entirely clear what it is supposed to do; it looks
> somewhat hackish, but without it performance is quite bad.
So could I generalize your sequence as below:
1. copy a large file to a new ext4 volume
2. do some direct I/O reads/writes to this file (bs=512)
3. rename it
4. cp it back to the original file name
5. do direct I/O reads/writes (bs=512) again, and the file is now actually corrupted.

You used to hit the problem in step 2, and my patch resolved it. Now
you hit a problem in step 5. Right?
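
For reference, the five steps could be scripted roughly as below. This is
only a sketch under assumptions, not the actual test case from this thread:
the file names, the default size, the write pattern and the `use_direct`
switch are all made up for illustration, and random data stands in for the
copied database file. It needs Linux (for `os.O_DIRECT`) and a device that
accepts 512-byte direct I/O.

```python
import filecmp
import mmap
import os
import shutil

BS = 512  # block size used in steps 2 and 5


def direct_write_blocks(path, nblocks, bs=BS, use_direct=True):
    """Overwrite the first nblocks blocks of path, bs bytes per write."""
    flags = os.O_RDWR
    if use_direct:
        flags |= os.O_DIRECT  # Linux-only; some filesystems reject it
    # An anonymous mmap is page-aligned, which satisfies the buffer
    # alignment that O_DIRECT requires.
    buf = mmap.mmap(-1, bs)
    fd = os.open(path, flags)
    try:
        for i in range(nblocks):
            buf[:] = bytes([i % 256]) * bs  # deterministic pattern per block
            os.pwrite(fd, buf, i * bs)
    finally:
        os.close(fd)
        buf.close()


def reproduce(workdir, size=1 << 20, use_direct=True):
    """Run steps 1-5; return True if the renamed file and the copy match."""
    orig = os.path.join(workdir, "redo.log")       # hypothetical name
    moved = os.path.join(workdir, "redo.log.old")  # hypothetical name
    nblocks = size // BS // 2

    # Step 1: put a file on the volume (random data as a stand-in).
    with open(orig, "wb") as f:
        f.write(os.urandom(size))
        f.flush()
        os.fsync(f.fileno())

    # Step 2: direct I/O writes with bs=512.
    direct_write_blocks(orig, nblocks, use_direct=use_direct)

    # Step 3: rename it out of the way.
    os.rename(orig, moved)

    # Step 4: copy it back under the original name (buffered copy).
    shutil.copyfile(moved, orig)

    # Step 5: direct I/O again; the same pattern is rewritten, so the
    # two files should still be identical afterwards.
    direct_write_blocks(orig, nblocks, use_direct=use_direct)

    # Byte-by-byte comparison through ordinary buffered reads.
    return filecmp.cmp(moved, orig, shallow=False)
```

Calling `reproduce()` with a directory on the ext4 volume under test
(mounted with dioread_nolock) should return True if nothing went wrong.
Note that the final comparison uses buffered reads and so may be served
from the pagecache; comparing again after a reboot, as Michael did, is the
more reliable check.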

Thanks
Tao