linux-kernel - Re: ftruncate-mmap: pages are lost after writing to mmaped file.

Message-ID: <20090319164638.GB3899@duck.suse.cz>
Date:	Thu, 19 Mar 2009 17:46:39 +0100
From:	Jan Kara <jack@...e.cz>
To:	Nick Piggin <nickpiggin@...oo.com.au>
Cc:	Ying Han <yinghan@...gle.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>, guichaz@...il.com,
	Alex Khesin <alexk@...gle.com>,
	Mike Waychison <mikew@...gle.com>,
	Rohit Seth <rohitseth@...gle.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: ftruncate-mmap: pages are lost after writing to mmaped file.

  Hi,

On Fri 20-03-09 02:48:21, Nick Piggin wrote:
> On Thursday 19 March 2009 10:54:33 Ying Han wrote:
> > On Wed, Mar 18, 2009 at 4:36 PM, Linus Torvalds
> > <torvalds@...ux-foundation.org> wrote:
> > > On Wed, 18 Mar 2009, Ying Han wrote:
> > >> > Can you say what filesystem, and what mount-flags you use? Iirc, last
> > >> > time we had MAP_SHARED lost writes it was at least partly triggered by
> > >> > the filesystem doing its own flushing independently of the VM (ie ext3
> > >> > with "data=journal", I think), so that kind of thing does tend to
> > >> > matter.
> > >>
> > >> /etc/fstab
> > >> "/dev/hda1 / ext2 defaults 1 0"
> > >
> > > Sadly, /etc/fstab is not necessarily accurate for the root filesystem. At
> > > least Fedora will ignore the flags in it.
> > >
> > > What does /proc/mounts say? That should be a more reliable indication of
> > > what the kernel actually does.
> >
> > "/dev/root / ext2 rw,errors=continue 0 0"
> 
> No luck with finding the problem yet.
  I spent all of yesterday staring at the code and didn't find the problem
either.

> But I think we do have a race in __set_page_dirty_buffers():
> 
> The page may not have buffers between the mapping->private_lock
> critical section and the __set_page_dirty call there. So between
> them, another thread might do a create_empty_buffers which can
> see !PageDirty and thus it will create clean buffers. The page
> will get dirtied by the original thread, but if the buffers are
> clean it can be cleaned without writing out buffers.
> 
> Holding mapping->private_lock over the __set_page_dirty should
> fix it, although I guess you'd want to release it before calling
> __mark_inode_dirty so as not to put inode_lock under there. I
> have a patch for this if it sounds reasonable.
  Yes, that seems to be a bug. The function actually looked suspicious to
me yesterday, but I somehow convinced myself that it was fine, probably
because fsx-linux is single-threaded.
  Anyway, I've tried the following hack:

diff --git a/fs/buffer.c b/fs/buffer.c
index 985f617..f764c8a 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -763,10 +763,15 @@ EXPORT_SYMBOL(mark_buffer_dirty_inode);
 static int __set_page_dirty(struct page *page,
                struct address_space *mapping, int warn)
 {
+       int ret;
+
        if (unlikely(!mapping))
                return !TestSetPageDirty(page);
 
-       if (TestSetPageDirty(page))
+       ret = TestSetPageDirty(page);
+       if (warn)
+               spin_unlock(&mapping->private_lock);
+       if (ret)
                return 0;
 
        spin_lock_irq(&mapping->tree_lock);
@@ -831,8 +836,6 @@ int __set_page_dirty_buffers(struct page *page)
                        bh = bh->b_this_page;
                } while (bh != head);
        }
-       spin_unlock(&mapping->private_lock);
-
        return __set_page_dirty(page, mapping, 1);
 }

  But it didn't fix the data corruption I'm seeing under UML :(.

									Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR