Message-ID: <4E48390E.9050102@msgid.tls.msk.ru>
Date:	Mon, 15 Aug 2011 01:07:26 +0400
From:	Michael Tokarev <mjt@....msk.ru>
To:	Tao Ma <tm@....ma>
CC:	linux-ext4@...r.kernel.org, sandeen@...hat.com,
	Jan Kara <jack@...e.cz>
Subject: Re: DIO process stuck apparently due to dioread_nolock (3.0)

15.08.2011 00:57, Michael Tokarev wrote:
> 13.08.2011 20:02, Tao Ma wrote:
>> From: Tao Ma <boyu.mt@...bao.com>
>>
>> Hi Michael,
>> 	could you please check whether this patch works for you?
> 
> With this patch applied to 3.0.1 I can't trigger the issue anymore,
> after several attempts -- the system just works as it should.
> I'm not sure whether this is the right fix or my testcase just isn't
> as good as it could be... ;)

Well, I found a way to trigger data corruption with this patch
applied.  I guess it's not the fault of this patch, but of some
deeper problem instead.

The sequence is my usual one: copy an Oracle database from another
place and start it.  When Oracle started doing its direct I/O
against its redologs, we had the problem which is now solved.  But
now I do the following: I shut down the database, rename the current
redologs out of the way, copy them back into place as new files,
and start the database again.

This time, Oracle complains that the redologs contain garbage.
I can reboot the machine now and compare the old (renamed) redologs
with the copies - they're indeed different.
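
For anyone who wants to repeat that comparison, here is a minimal sketch
of a block-by-block diff of two files.  The function name and the
4096-byte block size are assumptions made for illustration; they are not
taken from the original test.

```python
BLOCK_SIZE = 4096  # assumed comparison granularity, not from the original report


def differing_blocks(path_a, path_b, block_size=BLOCK_SIZE):
    """Return the byte offsets of the blocks whose contents differ."""
    offsets = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        offset = 0
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if not a and not b:
                break  # both files are exhausted
            if a != b:
                offsets.append(offset)
            offset += block_size
    return offsets
```

Running it on a renamed redolog and its copy would list the offsets
where the two diverge, which helps tell whether the damage is a few
blocks or the whole file.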

My guess is that the copy is done from the pagecache - from the old
contents of the files, somehow ignoring the (direct) writes
performed by the initial database open.  But that copy is somehow
damaged too, since even the file identification is now different.
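
The kind of coherence this guess is about can be probed with a small
check: cache a file's contents, overwrite them with O_DIRECT, then read
back through the pagecache.  This is only a sketch under assumptions
(the function name, the 4096-byte alignment and the single-block test
pattern are all illustrative), and it is not a reproducer for the
problem described here, which involved Oracle's I/O pattern on ext4
mounted with dioread_nolock.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; O_DIRECT needs aligned buffers, offsets and lengths


def buffered_read_sees_direct_write(path):
    """Overwrite cached data with an O_DIRECT write, then re-read via the pagecache.

    Returns True if the buffered read returns the new data, False if it
    returns stale contents, and None if O_DIRECT cannot be used here.
    """
    old, new = b"A" * BLOCK, b"B" * BLOCK
    with open(path, "wb") as f:
        f.write(old)
        f.flush()
        os.fsync(f.fileno())
    with open(path, "rb") as f:
        f.read()  # pull the old contents into the pagecache

    flag = getattr(os, "O_DIRECT", None)
    if flag is None:
        return None  # platform has no O_DIRECT
    try:
        fd = os.open(path, os.O_WRONLY | flag)
    except OSError:
        return None  # filesystem refuses O_DIRECT
    buf = mmap.mmap(-1, BLOCK)  # anonymous mapping gives page-aligned memory
    try:
        buf.write(new)
        try:
            os.pwrite(fd, buf, 0)
        except OSError:
            return None
    finally:
        os.close(fd)
        buf.close()

    with open(path, "rb") as f:
        return f.read(BLOCK) == new
```

A False result on a given mount would point at stale pagecache contents
surviving a direct write, which is the behaviour guessed at above.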

Is this new issue something that dioread_nolock is expected to cause?
I mean, it isn't entirely clear what it is supposed to do; it looks
somewhat hackish, but without it performance is quite bad.

Thanks,

/mjt