linux-ext4 - [Bug 102751] New: infinite loop in jbd2_journal

Message-ID: <bug-102751-13602@https.bugzilla.kernel.org/>
Date:	Wed, 12 Aug 2015 22:38:22 +0000
From:	bugzilla-daemon@...zilla.kernel.org
To:	linux-ext4@...r.kernel.org
Subject: [Bug 102751] New: infinite loop in jbd2_journal_destroy()

https://bugzilla.kernel.org/show_bug.cgi?id=102751

            Bug ID: 102751
           Summary: infinite loop in jbd2_journal_destroy()
           Product: File System
           Version: 2.5
    Kernel Version: 4.1.5
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: ext4
          Assignee: fs_ext4@...nel-bugs.osdl.org
          Reporter: mihai.dontu@...il.com
        Regression: No

While I was watching a video from a removable USB disk, the connecting cable
failed (worn out from use) and I had to unplug it. I then noticed that vlc had
started consuming 100% CPU time while in the zombie state. Alt+SysRq+l showed
this:

NMI backtrace for cpu 2
CPU: 2 PID: 17378 Comm: vlc Tainted: G           O    4.1.5-gentoo #1
Hardware name: Dell Inc. Latitude E7440/07F3F4, BIOS A15 05/19/2015
task: ffff88029d050000 ti: ffff8802cd80c000 task.ti: ffff8802cd80c000
RIP: 0010:[<ffffffff8cec3320>]  [<ffffffff8cec3320>] mutex_unlock+0x10/0x20
RSP: 0018:ffff8802cd80fcd0  EFLAGS: 00000202
RAX: 00000000fffffffb RBX: ffff880084068000 RCX: 0000000000000000
RDX: 0000000080000001 RSI: 0000000000000000 RDI: ffff8800840680e8
RBP: ffff8802cd80fd38 R08: 000000000000000a R09: 00000000000004b0
R10: 0000000000017e98 R11: 00000000000004b0 R12: ffff880084068398
R13: ffff8800840680e8 R14: ffff8802cd80fcf0 R15: ffff8800840680a0
FS:  00007fa8ac663700(0000) GS:ffff88041eb00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5b3e946000 CR3: 000000000d80d000 CR4: 00000000001426e0
Stack:
 ffffffff8c3d1318 ffff880200000000 ffff88029d050000 ffffffff8c179cc0
 ffff8802cd80fcf0 ffff8802cd80fcf0 0000000028b119c8 ffff88015d99c400
 ffff88008406c000 ffff880185940400 ffff880084068800 ffff88029d050000
Call Trace:
 [<ffffffff8c3d1318>] ? jbd2_journal_destroy+0x138/0x240
 [<ffffffff8c179cc0>] ? wake_atomic_t_function+0x60/0x60
 [<ffffffff8c38f0e7>] ext4_put_super+0x67/0x360
 [<ffffffff8c29d726>] generic_shutdown_super+0x76/0x100
 [<ffffffff8c29dae7>] kill_block_super+0x27/0x80
 [<ffffffff8c29de59>] deactivate_locked_super+0x49/0x80
 [<ffffffff8c29e2cc>] deactivate_super+0x6c/0x80
 [<ffffffff8c2bc033>] cleanup_mnt+0x43/0xa0
 [<ffffffff8c2bc0e2>] __cleanup_mnt+0x12/0x20
 [<ffffffff8c153804>] task_work_run+0xd4/0xf0
 [<ffffffff8c139174>] do_exit+0x2f4/0xb90
 [<ffffffff8c1d381c>] ? __audit_syscall_entry+0xac/0x100
 [<ffffffff8c05f745>] ? do_audit_syscall_entry+0x55/0x80
 [<ffffffff8c139a9b>] do_group_exit+0x3b/0xb0
 [<ffffffff8c139b24>] SyS_exit_group+0x14/0x20
 [<ffffffff8cec59db>] system_call_fastpath+0x16/0x6e
Code: ff 4c 89 e7 e8 d2 1e 00 00 5b 41 5c 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00
00 00 00 0f 1f 44 00 00 48 c7 47 18 00 00 00 00 f0 ff 07 <7f> 0a 55 48 89 e5 e8
95 ff ff ff 5d c3 0f 1f 00 0f 1f 44 00 00

and perf top (first 9 lines):

  18.08%  [kernel]  [k] _raw_spin_lock
  17.97%  [kernel]  [k] mutex_lock
  15.36%  [kernel]  [k] mutex_unlock
  10.89%  [kernel]  [k] _raw_spin_unlock
   6.49%  [kernel]  [k] jbd2_log_do_checkpoint
   6.16%  [kernel]  [k] preempt_count_add
   4.53%  [kernel]  [k] jbd2_cleanup_journal_tail
   3.96%  [kernel]  [k] preempt_count_sub
   3.21%  [kernel]  [k] jbd2_journal_destroy

Looking at the code it would seem that I've hit a race in:

  while (journal->j_checkpoint_transactions != NULL) { ... }

because it's waiting for a transaction that cannot take place:

Buffer I/O error on dev dm-1, logical block 243826688, lost sync page write
JBD2: Error -5 detected when updating journal superblock for dm-1-8.
Aborting journal on device dm-1-8.
Buffer I/O error on dev dm-1, logical block 243826688, lost sync page write
JBD2: Error -5 detected when updating journal superblock for dm-1-8.

Maybe the loop should be abandoned on jbd2_log_do_checkpoint() error?
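
To illustrate the idea, here is a minimal user-space sketch, not kernel code.
All of the mock_* names and the device_failed flag are made up for this
example; they only model a checkpoint list that cannot shrink once the device
is gone. The drain loop stops on the first error instead of spinning:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
struct mock_transaction {
	struct mock_transaction *next;
};

struct mock_journal {
	/* models journal->j_checkpoint_transactions */
	struct mock_transaction *checkpoint_transactions;
	/* nonzero once the underlying device has disappeared (assumption) */
	int device_failed;
};

/*
 * Models a checkpoint pass: retires one transaction on success, or
 * returns -EIO and leaves the list untouched when the device is gone.
 */
static int mock_log_do_checkpoint(struct mock_journal *j)
{
	if (j->device_failed)
		return -EIO;
	if (j->checkpoint_transactions != NULL)
		j->checkpoint_transactions = j->checkpoint_transactions->next;
	return 0;
}

/*
 * Drains the checkpoint list. Returns 0 when the list is empty, or the
 * first error, so that a failed device cannot keep the loop running.
 */
static int mock_drain_checkpoints(struct mock_journal *j)
{
	while (j->checkpoint_transactions != NULL) {
		int err = mock_log_do_checkpoint(j);

		if (err)
			return err; /* abandon the loop on error */
	}
	return 0;
}
```

Without the error check, a journal with device_failed set and a non-empty list
would never leave the while loop, which matches the symptom above. What the
real code should do with the transactions left on the list after bailing out
is a separate question that this sketch does not address.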

The USB failure happened several times before, but I've never seen vlc get
stuck. This also means that I'm unlikely to be able to reproduce this. :-(

One more detail: the ext4 filesystem sits on top of a LUKS device.
