Message-ID: <20090305104233.GA29531@duck.suse.cz>
Date:	Thu, 5 Mar 2009 11:42:33 +0100
From:	Jan Kara <jack@...e.cz>
To:	Nick Piggin <nickpiggin@...oo.com.au>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org,
	user-mode-linux-devel@...ts.sourceforge.net,
	linux-ext4@...r.kernel.org
Subject: Re: fsx-linux loosing mmap() writes under memory pressure

On Thu 05-03-09 21:18:54, Nick Piggin wrote:
> On Thursday 05 March 2009 21:05:16 Jan Kara wrote:
> > On Thu 05-03-09 13:55:43, Nick Piggin wrote:
> > > On Thursday 05 March 2009 04:50:31 Jan Kara wrote:
> > > > On Wed 04-03-09 16:55:35, Jan Kara wrote:
> > > > > On Wed 04-03-09 15:51:09, Jan Kara wrote:
> > > > > >   first, I'd like to point out that this has happened under UML so
> > > > > > > it can be just some obscure bug in that architecture but I believe
> > > > > > > it's worth debugging anyway. Now to the problem:
> > > > > > >   This has happened with today's snapshot of Linus's git tree. The
> > > > > > > filesystem is ext3 with *1KB* blocksize. I booted UML with 64MB of
> > > > > > > memory and ran (these are from Andrew Morton's torture tests):
> > > > > >   fsx-linux -l 8000000 /mnt/testfile
> > > > > >   bash-shared-mapping -t 8 /mnt/bashfile 50000000
> > > > > > (the second test just makes the UML under memory pressure and
> > > > > > stresses the filesystem, otherwise it does not interact with
> > > > > > fsx-linux in any way). After some time (like an hour) fsx-linux
> > > > > > reported the file is corrupted. I tried again and it happened again
> > > > > > so probably some debugging should be possible.
> > > > > >   Both times it seems we've simply completely lost a write which
> > > > > > happened through mmap (2 pages in the first case, 3 pages in the
> > > > > > second case). Also I've checked and in the first case no blocks are
> > > > > > allocated for the offsets where the data should be so most probably
> > > > > > we've lost the write before block_write_full_page() called
> > > > > > > get_block(). I'll debug this further but I wanted to let people know
> > > > > > > there's some problem and maybe somebody has some bright idea :).
> > > > > > I'm attaching the log from fsx if someone is interested.
> > > > >
> > > > >   Testing a bit more, I managed to reproduce the problem on ext2 and
> > > > > what's more strange, now the lost page was written via ordinary
> > > > > write() (fsxlog attached). So I believe this is more likely to be UML
> > > > > specific...
> > > >
> > > >   And to add even more information, this also happens on ext2 with 4KB
> > > > blocksize (although much more rarely it seems). Again the data was
> > > > written by an extending write() but the block for it was not even
> > > > allocated...
> > >
> > > What block device driver are you using?
> >
> >   UML was just using an image file to back the filesystem I was testing on.
> > But I don't think that plays a big role because the blocks were not even
> > allocated in the fs-image so we must have lost them quite early.
> 
> So you're using ubd driver? OK, I just have a report of a problem
> with brd driver...
  Yes, I'm using UBD.
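
  For anyone wanting to poke at this without fsx-linux, the kind of check
that failed can be sketched roughly as below: store data through a shared
mapping, flush it, read it back with an ordinary read, and look at how many
blocks the file has allocated. This is only an illustration in Python, not
fsx-linux's code; the function name mmap_write_and_verify and its arguments
are made up for the example. A single run like this is not expected to
reproduce the problem on its own; the loss was only seen after about an hour
under memory pressure.

```python
import mmap
import os

PAGE = mmap.PAGESIZE


def mmap_write_and_verify(path, offset, data):
    """Write `data` at `offset` via a shared mapping, then re-read it.

    Returns (matches, st_blocks): whether the bytes read back through
    pread() equal what was stored through the mapping, and the number of
    512-byte blocks the filesystem reports as allocated for the file.
    Hypothetical helper for illustration only.
    """
    end = offset + len(data)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # The mapping cannot extend past EOF, so grow the file first.
        if os.fstat(fd).st_size < end:
            os.ftruncate(fd, end)

        # On Unix the default mapping is MAP_SHARED and read/write.
        with mmap.mmap(fd, end) as m:
            m[offset:end] = data
            m.flush()  # msync() the whole mapping

        os.fsync(fd)

        # Read back through the normal read path and compare.
        got = os.pread(fd, len(data), offset)
        return got == data, os.fstat(fd).st_blocks
    finally:
        os.close(fd)
```

  In the failing runs the comparison would be false and, at least in the
first case, no blocks were allocated at the affected offsets at all.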

									Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR
