lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Date:	Mon, 01 Aug 2011 15:18:25 +0200
From:	Mihai Moldovan <ionic@...ic.de>
To:	LKML <linux-kernel@...r.kernel.org>
CC:	Dave Kleikamp <shaggy@...nel.org>
Subject: [BUG JFS] fs/jfs/jfs_txnmgr.c:940 assert(mp->nohomeok > 0) and fs/jfs/jfs_metapage.c:752
 assert(mp->count)

Hi,

yesterday while compiling LibreOffice on my amd64 box running Linux
3.0.0, those bugs happened:

[692827.021788] BUG at fs/jfs/jfs_txnmgr.c:940 assert(mp->nohomeok > 0)
[692827.021804] ------------[ cut here ]------------
[692827.021821] kernel BUG at fs/jfs/jfs_txnmgr.c:940!
[692827.021832] invalid opcode: 0000 [#1] SMP
[692827.021846] CPU 0
[692827.021851] Modules linked in: xt_state xt_dscp xt_hashlimit
xt_conntrack xt_DSCP ath9k i915 mac80211 ath9k_common ath9k_hw
drm_kms_helper ath cfg80211 cfbcopyarea e1000e video rfkill backlight
cfbimgblt cfbfillrect
[692827.021934]
[692827.021940] Pid: 697, comm: jfsCommit Not tainted 3.0.0-OSS4.2
#1                  /DQ45CB
[692827.021967] RIP: 0010:[<ffffffff81252835>]  [<ffffffff81252835>]
txUnlock+0xc2/0x21a
[692827.021991] RSP: 0018:ffff880231901e20  EFLAGS: 00010286
[692827.022003] RAX: 000000000000004e RBX: ffffc900080aef00 RCX:
000000000000ef51
[692827.022021] RDX: 0000000000000000 RSI: 0000000000000046 RDI:
ffffffff81ef0e60
[692827.022038] RBP: ffff880231901e60 R08: 0000000000000046 R09:
00000000fffffffe
[692827.022054] R10: ffff8802b1901d3f R11: 0000000000000000 R12:
ffff8801a8fca380
[692827.022070] R13: ffff88022e05b600 R14: ffffc900081db3e0 R15:
000000005063059c
[692827.022087] FS:  0000000000000000(0000) GS:ffff88023bc00000(0000)
knlGS:0000000000000000
[692827.022105] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[692827.022118] CR2: 00007f98337ce000 CR3: 0000000003f36000 CR4:
00000000000406f0
[692827.022135] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[692827.022152] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[692827.022168] Process jfsCommit (pid: 697, threadinfo
ffff880231900000, task ffff880232746140)
[692827.022185] Stack:
[692827.022191]  2e84880231901e60 ffff88022e05b718 ffffffff816a41f3
ffff880232746140
[692827.022214]  ffff880231901e80 ffff88023332b300 cccccccccccccccd
ffffc900080aef00
[692827.022238]  ffff880231901ee0 ffffffff81254de7 0000000000010bc0
ffff88022e05b600
[692827.022261] Call Trace:
[692827.022274]  [<ffffffff816a41f3>] ? _raw_spin_lock_irqsave+0xd/0x2a
[692827.022290]  [<ffffffff81254de7>] jfs_lazycommit+0x13d/0x24e
[692827.022305]  [<ffffffff8103c53e>] ? try_to_wake_up+0x1c5/0x1c5
[692827.022318]  [<ffffffff81254caa>] ? txCommit+0x33b/0x33b
[692827.022332]  [<ffffffff810595da>] kthread+0x7d/0x85
[692827.022346]  [<ffffffff816aa614>] kernel_thread_helper+0x4/0x10
[692827.022361]  [<ffffffff8105955d>] ? kthread_worker_fn+0x137/0x137
[692827.022376]  [<ffffffff816aa610>] ? gs_change+0xb/0xb
[692827.022387] Code: ff 41 8b 44 24 68 85 c0 7f 23 48 c7 c1 df 5a b1 81
ba ac 03 00 00 48 c7 c6 cb 5a b1 81 48 c7 c7 3a 4c b1 81 31 c0 e8 69 fd
44 00 <0f> 0b ff c8 41 89 44 24 68 85 c0 75 0a 49 8b 7c 24 58 e8 9b 68
[692827.022519] RIP  [<ffffffff81252835>] txUnlock+0xc2/0x21a
[692827.022533]  RSP <ffff880231901e20>
[692827.073901] ---[ end trace b4cb55fb5d49efbd ]---
[693433.340865] ERROR: (device md1): XT_GETPAGE: xtree page corrupt
[693433.366922] ERROR: (device md1): XT_GETPAGE: xtree page corrupt
[693433.366925] ERROR: (device md1): XT_GETPAGE: xtree page corrupt
[693433.366931] ERROR: (device md1): XT_GETPAGE: xtree page corrupt
[693433.366933] ERROR: (device md1): XT_GETPAGE: xtree page corrupt
[693433.366935] ERROR: (device md1): XT_GETPAGE: xtree page corrupt
[693433.366937] ERROR: (device md1): XT_GETPAGE: xtree page corrupt
[693433.366939] ERROR: (device md1): XT_GETPAGE: xtree page corrupt
[693433.366941] ERROR: (device md1): XT_GETPAGE: xtree page corrupt
[693433.366943] ERROR: (device md1): XT_GETPAGE: xtree page corrupt
[693433.366945] BUG at fs/jfs/jfs_metapage.c:752 assert(mp->count)
[693433.366955] ------------[ cut here ]------------
[693433.366970] kernel BUG at fs/jfs/jfs_metapage.c:752!
[693433.366981] invalid opcode: 0000 [#2] SMP
[693433.366993] CPU 2
[693433.366998] Modules linked in: xt_state xt_dscp xt_hashlimit
xt_conntrack xt_DSCP ath9k i915 mac80211 ath9k_common ath9k_hw
drm_kms_helper ath cfg80211 cfbcopyarea e1000e video rfkill backlight
cfbimgblt cfbfillrect
[693433.367069]
[693433.367075] Pid: 10230, comm: shutdown Tainted: G      D    
3.0.0-OSS4.2 #1                  /DQ45CB
[693433.367098] RIP: 0010:[<ffffffff8124ffb6>]  [<ffffffff8124ffb6>]
release_metapage+0x5f/0xf7
[693433.367122] RSP: 0018:ffff8801a7a41ae8  EFLAGS: 00010296
[693433.367134] RAX: 0000000000000049 RBX: ffff8801a8fca380 RCX:
0000000000000aa3
[693433.367149] RDX: 0000000000000000 RSI: 0000000000000046 RDI:
ffffffff81ef0e60
[693433.367165] RBP: ffff8801a7a41af8 R08: 0000000000000046 R09:
00000000fffffffe
[693433.367180] R10: ffff880227a41a07 R11: 0000000000000000 R12:
ffffea0007051da8
[693433.367196] R13: ffff88022e901000 R14: 0000000000000001 R15:
000000000000034b
[693433.367212] FS:  00007fcbd4f3f700(0000) GS:ffff88023bd00000(0000)
knlGS:0000000000000000
[693433.367229] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[693433.367242] CR2: 00007fcbd4abb980 CR3: 00000001fbe6d000 CR4:
00000000000406e0
[693433.367258] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[693433.367273] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[693433.367289] Process shutdown (pid: 10230, threadinfo
ffff8801a7a40000, task ffff88023328e140)
[693433.367307] Stack:
[693433.367312]  ffff8801a8fca380 0000000000000000 ffff8801a7a41b98
ffffffff8124dcb2
[693433.367334]  ffff8801a7a41b78 ffffffff81253dbc ffff880232cdbdc0
0000000200000002
[693433.367356]  ffff880231483400 00000001812b07bf ffff8801a7a41cc0
0000402000000000
[693433.367378] Call Trace:
[693433.367386]  [<ffffffff8124dcb2>] dtInsert+0x18c/0x19e
[693433.367399]  [<ffffffff81253dbc>] ? txLock+0x1bd/0x5ab
[693433.367411]  [<ffffffff8123e835>] jfs_create+0x153/0x27f
[693433.367425]  [<ffffffff810e120f>] ? do_lookup+0x46/0x28f
[693433.367439]  [<ffffffff816a41e4>] ? _raw_spin_lock+0x9/0xb
[693433.367452]  [<ffffffff810e9119>] ? __d_lookup+0x91/0xcb
[693433.367465]  [<ffffffff810e18e9>] vfs_create+0x6b/0x8d
[693433.367478]  [<ffffffff810e1b2e>] do_last+0x223/0x46b
[693433.367491]  [<ffffffff810e2ef7>] path_openat+0xc2/0x334
[693433.367503]  [<ffffffff810e31a1>] do_filp_open+0x38/0x84
[693433.367517]  [<ffffffff81306fbd>] ? strncpy_from_user+0x34/0x37
[693433.367532]  [<ffffffff810ec9c0>] ? alloc_fd+0x6f/0x11a
[693433.367545]  [<ffffffff810d6f3f>] do_sys_open+0x6f/0x101
[693433.367557]  [<ffffffff810d6fec>] sys_open+0x1b/0x1d
[693433.367570]  [<ffffffff816a953b>] system_call_fastpath+0x16/0x1b
[693433.367583] Code: 56 de ff 48 8b 43 28 48 85 c0 75 21 48 c7 c1 d6 59
b1 81 ba f0 02 00 00 48 c7 c6 33 59 b1 81 48 c7 c7 3a 4c b1 81 e8 e8 25
45 00 <0f> 0b 48 ff c8 48 89 43 28 48 85 c0 75 75 83 7b 68 00 75 6f 48
[693433.367707] RIP  [<ffffffff8124ffb6>] release_metapage+0x5f/0xf7
[693433.367723]  RSP <ffff8801a7a41ae8>
[693433.373041] ---[ end trace b4cb55fb5d49efbe ]---

The system was in an unusable state afterwards - new network connections
couldn't be established while already running ones continued to work
fine. Also, neither reboot (segfaulted the first time, won't return the
second time) or init 6 did work and I was forced to do a hard
shutdown/reboot.

I'm pretty sure that the system would have locked up completely after
some time if I hadn't rebooted after the incident, as happened several
times, but I've never been able to get a trace of what happened so far.

As this bug happened at pretty high disk load (whilst compiling), I
guess JFS has a few really nasty bugs.

This is actually not the first bug I encounter with JFS. I've noticed, that:
  - performance degrades a lot with files > 50GB for some reason. This
is especially annoying if copying full disk images. It starts up fine
with maximum speed and drops to a few MB/s as the file gets bigger and
bigger.
  - on older Kernels I've seen the jfsCommit threads causing a *lot* CPU
load when working with big files. It got worse and worse the bigger the
file became until the system hung completely (even AFTER the file was
transferred successfully and a sync called.) It started with as low as
6% CPU load and increased steadily, even on multiple jfsCommit threads.
I haven't tested this on 3.0.0 yet though, but every older version I ran
had this kind of problem.

All those problems may or may not be linked together. Something seems to
be really wrong.

Best regards,


Mihai


Download attachment "smime.p7s" of type "application/pkcs7-signature" (4369 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ