[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.20.1609161524550.5384@nanos>
Date:   Fri, 16 Sep 2016 15:35:39 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     LKML <linux-kernel@...r.kernel.org>
cc:     Jan Kara <jack@...e.cz>, Theodore Ts'o <tytso@....edu>,
        Sebastian Sewior <bigeasy@...utronix.de>
Subject: [BUG] jbd2: lockdep deadlock warning
Jan,
wer are currently moving the RT patch to 4.8 and on our test runs we
triggered the following lockdep splat:
[  153.887910] ======================================================
[  153.887911] [ INFO: possible circular locking dependency detected ]
[  153.887912] 4.8.0-rc6-rt0+ #679 Tainted: G        W
[  153.887912] -------------------------------------------------------
[  153.887913] find/2537 is trying to acquire lock:
[  153.887923]  (&journal->j_state_lock){+.+...}, at: [<ffffffff81300ed0>] jbd2_log_start_commit+0x20/0x40
[  153.887924]
[  153.887924] but task is already holding lock:
[  153.887925]  (jbd2_handle){++++..}, at: [<ffffffff812f5411>] start_this_handle+0x111/0x470
[  153.887925]
[  153.887925] which lock already depends on the new lock.
[  153.887925]
[  153.887926]
[  153.887926] the existing dependency chain (in reverse order) is:
[  153.887927]
[  153.887927] -> #1 (jbd2_handle){++++..}:
[  153.887931]        [<ffffffff810cfb5f>] lock_acquire+0x12f/0x2c0
[  153.887932]        [<ffffffff812f4fed>] add_transaction_credits+0x4d/0x310
[  153.887933]        [<ffffffff812f5409>] start_this_handle+0x109/0x470
[  153.887933]        [<ffffffff812f5d1b>] jbd2__journal_start+0xdb/0x380
[  153.887936]        [<ffffffff812d96a0>] __ext4_journal_start_sb+0x90/0x2b0
[  153.887940]        [<ffffffff8129d10e>] ext4_file_open+0x11e/0x230
[  153.887946]        [<ffffffff8120c808>] do_dentry_open.isra.16+0x168/0x310
[  153.887948]        [<ffffffff8120dbe5>] vfs_open+0x45/0x70
[  153.887950]        [<ffffffff8121f30c>] path_openat+0x48c/0xa20
[  153.887951]        [<ffffffff81220b9e>] do_filp_open+0x7e/0xe0
[  153.887953]        [<ffffffff81215164>] do_open_execat+0x64/0x150
[  153.887954]        [<ffffffff812174a8>] do_execveat_common.isra.43+0x278/0x950
[  153.887955]        [<ffffffff81217f30>] compat_SyS_execve+0x40/0x50
[  153.887956]        [<ffffffff810032d2>] do_fast_syscall_32+0xb2/0x2f0
[  153.887960]        [<ffffffff818e7b31>] entry_SYSENTER_compat+0x51/0x60
[  153.887960]
[  153.887960] -> #0 (&journal->j_state_lock){+.+...}:
[  153.887962]        [<ffffffff810cf54c>] __lock_acquire+0x105c/0x11d0
[  153.887963]        [<ffffffff810cfb5f>] lock_acquire+0x12f/0x2c0
[  153.887966]        [<ffffffff818e5e24>] rt_write_lock+0x34/0x50
[  153.887968]        [<ffffffff81300ed0>] jbd2_log_start_commit+0x20/0x40
[  153.887968]        [<ffffffff812f67a7>] jbd2_journal_stop+0x147/0x640
[  153.887969]        [<ffffffff812d98fd>] __ext4_journal_stop+0x3d/0xa0
[  153.887972]        [<ffffffff812ad72c>] ext4_dirty_inode+0x5c/0x70
[  153.887975]        [<ffffffff81243b4a>] __mark_inode_dirty+0x29a/0x9d0
[  153.887984]        [<ffffffff8122dfb6>] generic_update_time+0x76/0xc0
[  153.887986]        [<ffffffff81230058>] touch_atime+0x88/0xa0
[  153.887987]        [<ffffffff812248c0>] iterate_dir+0x170/0x190
[  153.887990]        [<ffffffff812719b6>] compat_SyS_getdents64+0x86/0xf0
[  153.887991]        [<ffffffff810032d2>] do_fast_syscall_32+0xb2/0x2f0
[  153.887992]        [<ffffffff818e7b31>] entry_SYSENTER_compat+0x51/0x60
[  153.887992]
[  153.887992] other info that might help us debug this:
[  153.887992]
[  153.887992]  Possible unsafe locking scenario:
[  153.887992]
[  153.887993]        CPU0                    CPU1
[  153.887993]        ----                    ----
[  153.887993]   lock(jbd2_handle);
[  153.887994]                                lock(&journal->j_state_lock);
[  153.887994]                                lock(jbd2_handle);
[  153.887995]   lock(&journal->j_state_lock);
[  153.887995]
[  153.887995]  *** DEADLOCK ***
[  153.887995]
[  153.887996] 4 locks held by find/2537:
[  153.888004]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff812323aa>] __fdget_pos+0x4a/0x50
[  153.888007]  #1:  (&type->i_mutex_dir_key){++++++}, at: [<ffffffff810d4650>] rt_down_read+0x10/0x20
[  153.888008]  #2:  (sb_writers#3){.+.+.+}, at: [<ffffffff8121272a>] __sb_start_write+0xda/0xf0
[  153.888010]  #3:  (jbd2_handle){++++..}, at: [<ffffffff812f5411>] start_this_handle+0x111/0x470
[  153.888010]
[  153.888010] stack backtrace:
[  153.888011] CPU: 2 PID: 2537 Comm: find Tainted: G        W       4.8.0-rc6-rt0+ #679
[  153.888012] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
[  153.888014]  0000000000000000 ffff88003826fb20 ffffffff81438d04 ffffffff82a054c0
[  153.888015]  ffffffff82a054c0 ffff88003826fb60 ffffffff810ccd3e 0000000038d8db70
[  153.888016]  ffff880038d8dba8 ffff880038d8db70 5b9d99a44104f32e ffff880038d8d200
[  153.888016] Call Trace:
[  153.888019]  [<ffffffff81438d04>] dump_stack+0x86/0xc2
[  153.888021]  [<ffffffff810ccd3e>] print_circular_bug+0x1be/0x210
[  153.888022]  [<ffffffff810cf54c>] __lock_acquire+0x105c/0x11d0
[  153.888024]  [<ffffffff810ca83e>] ? put_lock_stats.isra.28+0xe/0x50
[  153.888025]  [<ffffffff810cfb5f>] lock_acquire+0x12f/0x2c0
[  153.888026]  [<ffffffff81300ed0>] ? jbd2_log_start_commit+0x20/0x40
[  153.888028]  [<ffffffff818e5e24>] rt_write_lock+0x34/0x50
[  153.888029]  [<ffffffff81300ed0>] ? jbd2_log_start_commit+0x20/0x40
[  153.888030]  [<ffffffff81300ed0>] jbd2_log_start_commit+0x20/0x40
[  153.888031]  [<ffffffff812f67a7>] jbd2_journal_stop+0x147/0x640
[  153.888032]  [<ffffffff812a7bbf>] ? ext4_mark_iloc_dirty+0x4df/0x7d0
[  153.888033]  [<ffffffff812d9c49>] ? __ext4_journal_get_write_access+0x49/0x90
[  153.888034]  [<ffffffff812ad718>] ? ext4_dirty_inode+0x48/0x70
[  153.888035]  [<ffffffff812d98fd>] __ext4_journal_stop+0x3d/0xa0
[  153.888036]  [<ffffffff812ad72c>] ext4_dirty_inode+0x5c/0x70
[  153.888037]  [<ffffffff81243b4a>] __mark_inode_dirty+0x29a/0x9d0
[  153.888038]  [<ffffffff8122dfb6>] generic_update_time+0x76/0xc0
[  153.888039]  [<ffffffff81230058>] touch_atime+0x88/0xa0
[  153.888040]  [<ffffffff812248c0>] iterate_dir+0x170/0x190
[  153.888041]  [<ffffffff812719b6>] compat_SyS_getdents64+0x86/0xf0
[  153.888042]  [<ffffffff8126ff80>] ? C_SYSC_newstat+0x30/0x30
[  153.888046]  [<ffffffff81181523>] ? __context_tracking_exit.part.5+0x33/0x1c0
[  153.888048]  [<ffffffff810ce100>] ? trace_hardirqs_on_caller+0x100/0x1c0
[  153.888049]  [<ffffffff810032d2>] do_fast_syscall_32+0xb2/0x2f0
[  153.888050]  [<ffffffff818e7b31>] entry_SYSENTER_compat+0x51/0x60
The only change in the ext4/jbd2 area on RT is the conversion of the bit
spinlocks in the buffer head to real spinlocks, but that is not involved
here.
So this looks like a genuine issue which might be easier to expose with RT.
Thanks,
	tglx
Powered by blists - more mailing lists
 
