linux-ext4 - Re: fsx-linux loosing mmap() writes under memory pressure

Message-Id: <200903051355.43909.nickpiggin@yahoo.com.au>
Date:	Thu, 5 Mar 2009 13:55:43 +1100
From:	Nick Piggin <nickpiggin@...oo.com.au>
To:	Jan Kara <jack@...e.cz>, Andrew Morton <akpm@...ux-foundation.org>
Cc:	linux-kernel@...r.kernel.org,
	user-mode-linux-devel@...ts.sourceforge.net,
	linux-ext4@...r.kernel.org
Subject: Re: fsx-linux loosing mmap() writes under memory pressure

On Thursday 05 March 2009 04:50:31 Jan Kara wrote:
> On Wed 04-03-09 16:55:35, Jan Kara wrote:
> > On Wed 04-03-09 15:51:09, Jan Kara wrote:
> > >   first, I'd like to point out that this has happened under UML so it
> > > > can be just some obscure bug in that architecture but I believe it's
> > > > worth debugging anyway. Now to the problem:
> > > >   This has happened with today's snapshot of Linus's git tree. The
> > > > filesystem is ext3 with *1KB* blocksize. I booted UML with 64MB of
> > > > memory and ran (these are tests from Andrew Morton's torture tests):
> > >   fsx-linux -l 8000000 /mnt/testfile
> > >   bash-shared-mapping -t 8 /mnt/bashfile 50000000
> > > > (the second test just puts the UML under memory pressure and stresses
> > > the filesystem, otherwise it does not interact with fsx-linux in any
> > > way). After some time (like an hour) fsx-linux reported the file is
> > > corrupted. I tried again and it happened again so probably some
> > > debugging should be possible.
> > >   Both times it seems we've simply completely lost a write which
> > > happened through mmap (2 pages in the first case, 3 pages in the second
> > > case). Also I've checked and in the first case no blocks are allocated
> > > for the offsets where the data should be so most probably we've lost
> > > the write before block_write_full_page() called get_block().
> > > >   I'll debug this further but I wanted to let people know there's some
> > > problem and maybe somebody has some bright idea :).  I'm attaching the
> > > log from fsx if someone is interested.
> >
> >   Testing a bit more, I managed to reproduce the problem on ext2 and
> > what's more strange, now the lost page was written via ordinary write()
> > (fsxlog attached). So I believe this is more likely to be UML specific...
>
>   And to add even more information, this also happens on ext2 with 4KB
> blocksize (although much more rarely it seems). Again the data was written
> by an extending write() but the block for it was not even allocated...

What block device driver are you using?

Can it be reproduced without mapped reads and writes completely? (-W -R)

