linux-kernel - Re: [patch 2/5] mm: fault vs invalidate/truncate race fix

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Wed, 11 Oct 2006 11:34:04 -0700
From:	Mark Fasheh <mark.fasheh@...cle.com>
To:	Nick Piggin <nickpiggin@...oo.com.au>
Cc:	Nick Piggin <npiggin@...e.de>, Hugh Dickins <hugh@...itas.com>,
	Linux Memory Management <linux-mm@...ck.org>,
	Andrew Morton <akpm@...l.org>, Jes Sorensen <jes@....com>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: [patch 2/5] mm: fault vs invalidate/truncate race fix

On Tue, Oct 10, 2006 at 11:10:42AM +1000, Nick Piggin wrote:
> If you want a stable patchset for testing, the previous one to linux-mm
> starting with "[patch 1/3] mm: fault vs invalidate/truncate check" went
> through some stress testing here...
Hmm, unfortunately my testing so far hasn't been particularly encouraging...

Shortly after my test starts, one of the "ocfs2-vote" processes on one of my
nodes will begin consuming cpu at a rate which indicates it might be in an
infinite loop. The soft lockup detection code seems to agree:

BUG: soft lockup detected on CPU#0!
Call Trace:
[C00000003795F220] [C000000000011310] .show_stack+0x50/0x1cc (unreliable)
[C00000003795F2D0] [C000000000086100] .softlockup_tick+0xf8/0x120
[C00000003795F380] [C000000000060DA8] .run_local_timers+0x1c/0x30
[C00000003795F400] [C000000000023B28] .timer_interrupt+0x110/0x500
[C00000003795F520] [C0000000000034EC] decrementer_common+0xec/0x100
--- Exception: 901 at ._raw_spin_lock+0x84/0x1a0
    LR = ._spin_lock+0x10/0x24
[C00000003795F810] [C000000000788FC8] init_thread_union+0xfc8/0x4000 (unreliable)
[C00000003795F8B0] [C0000000004A66B8] ._spin_lock+0x10/0x24
[C00000003795F930] [C00000000009EDBC] .unmap_mapping_range+0x88/0x2d4
[C00000003795FA90] [C0000000000967E4] .truncate_inode_pages_range+0x2b8/0x490
[C00000003795FBE0] [D0000000005FA8C0] .ocfs2_data_convert_worker+0x124/0x14c [ocfs2]
[C00000003795FC70] [D0000000005FB0BC] .ocfs2_process_blocked_lock+0x184/0xca4 [ocfs2]
[C00000003795FD50] [D000000000629DE8] .ocfs2_vote_thread+0x1a8/0xc18 [ocfs2]
[C00000003795FEE0] [C00000000007000C] .kthread+0x154/0x1a4
[C00000003795FF90] [C000000000027124] .kernel_thread+0x4c/0x68


A sysrq-t doesn't show anything interesting from any of the other OCFS2
processes. This is your patchset from the 10th, running against Linus' git
tree from that day, with my mmap patch merged in.

The stack seems to indicate that we're stuck in one of these
truncate_inode_pages_range() loops:

+                       while (page_mapped(page)) {
+                               unmap_mapping_range(mapping,
+                                 (loff_t)page_index<<PAGE_CACHE_SHIFT,
+                                 PAGE_CACHE_SIZE, 0);
+                       }


The test I run is over here btw:

http://oss.oracle.com/projects/ocfs2-test/src/trunk/programs/multi_node_mmap/multi_mmap.c

I ran it with the following parameters:

mpirun -np 6 n1-3 ./multi_mmap -w mmap -r mmap -i 1000 -b 1024 /ocfs2/mmap/test4.txt
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh@...cle.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/