linux-ext4 - Re: [BUG] xfstests #68 with data=journal hang against 3.8-rc7 and 'dev' branch

Message-ID: <20130226184923.GA19092@quack.suse.cz>
Date:	Tue, 26 Feb 2013 19:49:23 +0100
From:	Jan Kara <jack@...e.cz>
To:	Theodore Ts'o <tytso@....edu>
Cc:	linux-ext4@...r.kernel.org
Subject: Re: [BUG] xfstests #68 with data=journal hang against 3.8-rc7 and
 'dev' branch

On Fri 22-02-13 14:46:21, Ted Tso wrote:
> On Mon, Feb 18, 2013 at 01:22:08AM +0800, Zheng Liu wrote:
> > Hi all,
> > 
> > xfstests #68 hangs with data=journal on 3.8-rc7 and the 'dev' branch. I
> > remember that there was a patch for ext4 to fix a filesystem freeze bug,
> > but I am not sure whether it fixes this bug or whether it has been
> > applied to the 'dev' branch.  So I am filing this bug here.
> 
> I've confirmed that I can reproduce this by using tmpfs image files
> under KVM.  I can replicate the bug as far back as the 3.0 kernel, so
> this is definitely not a recent regression.
  Yeah, I have been thinking about it for a while now and I think I
understand what's going on. As I already mentioned, the problem with
data=journal mode is that when a transaction containing page data is
committed, the corresponding buffers (and thus the page) are marked dirty so
that the flusher thread (or checkpointing code) can do the checkpoint. So
after one iteration of inode syncing, we still have plenty of dirty pages
around. Now syncing happens in two rounds, the first in WB_SYNC_NONE mode
and the second in WB_SYNC_ALL mode, so usually we perform the writeback
needed for the checkpoint in the second round. But if for some reason we
skipped the page in the first round (it was locked, under writeback, or so),
we have a problem and the page remains dirty after sync.
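
To make the first race concrete, here is a minimal toy model in C. It is only
a sketch under simplifying assumptions, not kernel code: the page states, the
`toy_*` names, and the assumption that the commit finishes between the two
rounds are all hypothetical. The first writeback of a dirty page journals its
data, commit marks journalled pages dirty again, and a second writeback of
such a page cleans it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PAGES 16

/* Hypothetical page states for this toy model; these are not kernel types. */
enum toy_state {
    TOY_CLEAN,
    TOY_DIRTY_DATA,       /* dirty, data not yet part of any transaction */
    TOY_IN_TRANSACTION,   /* data journalled, transaction not yet committed */
    TOY_DIRTY_CHECKPOINT  /* redirtied by commit, waiting for checkpoint write */
};

struct toy_page {
    enum toy_state state;
    bool busy;            /* locked or under writeback */
};

/* One writeback pass; a WB_SYNC_NONE-like pass skips busy pages. */
void toy_writeback(struct toy_page *pages, size_t n, bool skip_busy)
{
    for (size_t i = 0; i < n; i++) {
        if (skip_busy && pages[i].busy)
            continue;
        if (pages[i].state == TOY_DIRTY_DATA)
            pages[i].state = TOY_IN_TRANSACTION;
        else if (pages[i].state == TOY_DIRTY_CHECKPOINT)
            pages[i].state = TOY_CLEAN;
    }
}

/* Commit: every journalled page is marked dirty again for checkpointing. */
void toy_commit(struct toy_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (pages[i].state == TOY_IN_TRANSACTION)
            pages[i].state = TOY_DIRTY_CHECKPOINT;
}

/* Sync n_pages dirty pages, n_busy of which are busy during round one.
 * Returns how many pages are still not clean afterwards. */
size_t toy_dirty_after_sync(size_t n_pages, size_t n_busy)
{
    struct toy_page pages[TOY_MAX_PAGES];
    size_t left = 0;

    assert(n_pages <= TOY_MAX_PAGES && n_busy <= n_pages);
    for (size_t i = 0; i < n_pages; i++) {
        pages[i].state = TOY_DIRTY_DATA;
        pages[i].busy = i < n_busy;
    }

    toy_writeback(pages, n_pages, true);   /* round one, WB_SYNC_NONE-like */
    toy_commit(pages, n_pages);            /* assumed to finish here */
    for (size_t i = 0; i < n_pages; i++)
        pages[i].busy = false;             /* pages become available again */
    toy_writeback(pages, n_pages, false);  /* round two, WB_SYNC_ALL-like */
    toy_commit(pages, n_pages);            /* final commit, waited for */

    for (size_t i = 0; i < n_pages; i++)
        if (pages[i].state != TOY_CLEAN)
            left++;
    return left;
}
```

In this model toy_dirty_after_sync(4, 0) returns 0, while
toy_dirty_after_sync(4, 1) returns 1: the page skipped in round one is only
journalled in round two, so the final commit leaves it dirty.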

Another variation of the problem is that ext4_sync_fs() just starts a
transaction commit if wait == 0, and pages are marked dirty only at the end
of the transaction commit, so the second syncing round in WB_SYNC_ALL mode
may miss some pages which will be marked dirty later.

The question is what to do with these races. We could sync the inodes again
in ext4_sync_fs() after waiting for the transaction commit, to flush the
data needing checkpoint, but that looks like overkill...
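
A rough sketch of the second variation and of the "sync again after the
commit" idea, again as a toy model in C rather than kernel code. The `jp_*`
names, the page states, and the exact ordering (commit only started before
round two and finished by the waiting sync_fs step) are assumptions made for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define JP_MAX_PAGES 16

/* Hypothetical page states for this toy model; these are not kernel types. */
enum jp_state {
    JP_CLEAN,
    JP_DIRTY_DATA,        /* dirty, data not yet part of any transaction */
    JP_IN_TRANSACTION,    /* data journalled, commit not yet finished */
    JP_DIRTY_CHECKPOINT   /* redirtied at the end of commit */
};

/* Writeback pass: journal fresh data, write out checkpoint-dirty pages. */
void jp_writeback(enum jp_state *pages, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (pages[i] == JP_DIRTY_DATA)
            pages[i] = JP_IN_TRANSACTION;
        else if (pages[i] == JP_DIRTY_CHECKPOINT)
            pages[i] = JP_CLEAN;
    }
}

/* End of commit: journalled pages are marked dirty for checkpointing. */
void jp_commit_finish(enum jp_state *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (pages[i] == JP_IN_TRANSACTION)
            pages[i] = JP_DIRTY_CHECKPOINT;
}

/* Returns the number of pages still not clean after the modelled sync.
 * With resync_after_commit set, one more writeback pass runs after the
 * commit has been waited for. */
size_t jp_dirty_after_sync(size_t n_pages, bool resync_after_commit)
{
    enum jp_state pages[JP_MAX_PAGES];
    size_t left = 0;

    assert(n_pages <= JP_MAX_PAGES);
    for (size_t i = 0; i < n_pages; i++)
        pages[i] = JP_DIRTY_DATA;

    jp_writeback(pages, n_pages);      /* round one, WB_SYNC_NONE-like */
    /* sync_fs with wait == 0: commit is only started, nothing redirtied */
    jp_writeback(pages, n_pages);      /* round two sees no dirty pages */
    jp_commit_finish(pages, n_pages);  /* sync_fs with wait == 1 */

    if (resync_after_commit)
        jp_writeback(pages, n_pages);  /* extra pass flushes checkpoint data */

    for (size_t i = 0; i < n_pages; i++)
        if (pages[i] != JP_CLEAN)
            left++;
    return left;
}
```

In this model jp_dirty_after_sync(4, false) returns 4 and
jp_dirty_after_sync(4, true) returns 0, which shows why the extra pass would
help and also why it costs a full additional round of writeback.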

And BTW, the trace below looks like a different problem. We fail on:
J_ASSERT_BH(bh, !buffer_jbddirty(bh));
which shouldn't really happen (__dispose_buffer() should have cleared
that). I'll try my luck with RAM based images as well...

								Honza

> kernel BUG at /tyt/linux/ext4/fs/jbd2/transaction.c:1986!
> invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
> Modules linked in:
> Pid: 3399, comm: fstest Not tainted 3.8.0-rc3-00026-ge7b04ac #54 Bochs Bochs
> EIP: 0060:[<c02b0bb7>] EFLAGS: 00010206 CPU: 0
> EIP is at jbd2_journal_invalidatepage+0x1bb/0x238
> EAX: 001c4025 EBX: f5f7ab38 ECX: 00000000 EDX: 00000246
> ESI: f481d3c8 EDI: f5b87800 EBP: f4d3dcac ESP: f4d3dc7c
>  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> CR0: 80050033 CR2: b7666000 CR3: 34cef000 CR4: 000006f0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff0ff0 DR7: 00000400
> Process fstest (pid: 3399, ti=f4d3c000 task=f4d208a0 task.ti=f4d3c000)
> Stack:
>  00001000 f5f7ab38 f5f7ab38 00000001 f5b87b80 f5b87814 00000000 f7bbc788
>  00000000 f7bbc788 f5f5f794 00000000 f4d3dcc4 c02741b1 f5b87800 c02315a0
>  f5f5f794 00000030 f4d3dccc c027524d f4d3dcd8 c01e9cab f7bbc788 f4d3dce8
> Call Trace:
>  [<c02741b1>] __ext4_journalled_invalidatepage+0x60/0x66
>  [<c02315a0>] ? sync_mapping_buffers+0x1e7/0x1e7
>  [<c027524d>] ext4_journalled_invalidatepage+0xd/0x22
>  [<c01e9cab>] do_invalidatepage+0x21/0x24
>  [<c01e9cfd>] truncate_inode_page+0x4f/0x70
>  [<c01e9dc6>] truncate_inode_pages_range+0xa8/0x206
>  [<c01e9fd3>] truncate_inode_pages+0x11/0x15
>  [<c01ea01f>] truncate_pagecache+0x48/0x64
>  [<c0277f2f>] ext4_setattr+0x3cc/0x464
>  [<c0277b63>] ? ext4_mark_inode_dirty+0x1b3/0x1b3
>  [<c02200fd>] notify_change+0x1b1/0x272
>  [<c08077ea>] ? mutex_lock_nested+0x26/0x2f
>  [<c020c576>] do_truncate+0x69/0x82
>  [<c0217ae7>] do_last+0x8af/0x8d6
>  [<c0215aca>] ? inode_permission+0x45/0x47
>  [<c0215b66>] ? link_path_walk+0x9a/0x3ab
>  [<c0217bab>] path_openat+0x9d/0x2bc
>  [<c0199afe>] ? lock_release_holdtime.part.21+0x5d/0x63
>  [<c0199693>] ? trace_hardirqs_off+0xb/0xd
>  [<c021800f>] do_filp_open+0x26/0x62
>  [<c022116d>] ? __alloc_fd+0xbd/0xc8
>  [<c020d0b5>] do_sys_open+0x58/0xd1
>  [<c020d154>] sys_open+0x26/0x2e
>  [<c0809748>] syscall_call+0x7/0xb
>  [<c0800000>] ? no_context+0x67/0x1a5
> Code: 68 7d 00 00 8b 45 e0 e8 da 88 55 00 89 d8 e8 39 ee ff ff 8b 45 e4 e8 67 83 55 00 89 d8 e8 c6 ec ff ff 8b 03 a9 00 00 08 00 74 02 <0f> 0b f0 80 23 df f0 8
> 3 bf f0 80 63 01 fd f0 80
> EIP: [<c02b0bb7>] jbd2_journal_invalidatepage+0x1bb/0x238 SS:ESP 0068:f4d3dc7c
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR