linux-kernel - Re: possible lockdep regression introduced by 4d004099a668 ("lockdep: Fix lockdep recursion")

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20201110014129.GB286534@boqun-archlinux>
Date:   Tue, 10 Nov 2020 09:41:29 +0800
From:   Boqun Feng <boqun.feng@...il.com>
To:     Filipe Manana <fdmanana@...e.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, peterz@...radead.org,
        Jan Kara <jack@...e.cz>, David Sterba <dsterba@...e.com>,
        Nikolay Borisov <nborisov@...e.com>, Qu Wenruo <wqu@...e.com>,
        darrick.wong@...cle.com
Subject: Re: possible lockdep regression introduced by 4d004099a668
 ("lockdep: Fix lockdep recursion")

On Mon, Nov 09, 2020 at 09:57:05AM +0000, Filipe Manana wrote:
> 
> 
> On 09/11/20 08:44, Boqun Feng wrote:
> > Hi Filipe,
> > 
> > On Thu, Nov 05, 2020 at 09:10:12AM +0800, Boqun Feng wrote:
> >> On Wed, Nov 04, 2020 at 07:54:40PM +0000, Filipe Manana wrote:
> >> [...]
> >>>
> >>> Ok, so I ran 5.10-rc2 plus your two patches (the fix and the debug one):
> >>>
> >>> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> >>> index b71ad8d9f1c9..b31d4ad482c7 100644
> >>> --- a/kernel/locking/lockdep.c
> >>> +++ b/kernel/locking/lockdep.c
> >>> @@ -539,8 +539,10 @@ static struct lock_trace *save_trace(void)
> >>>                 LOCK_TRACE_SIZE_IN_LONGS;
> >>>
> >>>         if (max_entries <= 0) {
> >>> -               if (!debug_locks_off_graph_unlock())
> >>> +               if (!debug_locks_off_graph_unlock()) {
> >>> +                       WARN_ON_ONCE(1);
> >>>                         return NULL;
> >>> +               }
> >>>
> >>>                 print_lockdep_off("BUG: MAX_STACK_TRACE_ENTRIES too low!");
> >>>                 dump_stack();
> >>> @@ -5465,7 +5467,7 @@ noinstr int lock_is_held_type(const struct
> >>> lockdep_map *lock, int read)
> >>>         unsigned long flags;
> >>>         int ret = 0;
> >>>
> >>> -       if (unlikely(!lockdep_enabled()))
> >>> +       if (unlikely(debug_locks && !lockdep_enabled()))
> >>>                 return 1; /* avoid false negative lockdep_assert_held() */
> >>>
> >>>         raw_local_irq_save(flags);
> >>>
> >>> With 3 runs of all fstests, the WARN_ON_ONCE(1) wasn't triggered.
> >>> Unexpected, right?
> >>>
> >>
> >> Kinda, that means we still don't know why lockdep was turned off.
> >>
> >>> Should I try something else?
> >>>
> >>
> >> Thanks for trying this. Let me set up the reproducer on my side, and see
> >> if I could get more information.
> >>
> > 
> > I could hit this with btrfs/187, and when we hit it, lockdep will report
> > the deadlock and turn off, and I think this is the root cause for your
> > hitting the original problem, I will add some analysis after the lockdep
> > splat.
> > 
> > [12295.973309] ============================================
> > [12295.974770] WARNING: possible recursive locking detected
> > [12295.974770] 5.10.0-rc2-btrfs-next-71 #20 Not tainted
> > [12295.974770] --------------------------------------------
> > [12295.974770] zsh/701247 is trying to acquire lock:
> > [12295.974770] ffff92cef43480b8 (&eb->lock){++++}-{2:2}, at: btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
> > [12295.974770] 
> >                but task is already holding lock:
> > [12295.974770] ffff92cef434a038 (&eb->lock){++++}-{2:2}, at: btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
> > [12295.974770] 
> >                other info that might help us debug this:
> > [12295.974770]  Possible unsafe locking scenario:
> > 
> > [12295.974770]        CPU0
> > [12295.974770]        ----
> > [12295.974770]   lock(&eb->lock);
> > [12295.974770]   lock(&eb->lock);
> > [12295.974770] 
> >                 *** DEADLOCK ***
> > 
> > [12295.974770]  May be due to missing lock nesting notation
> > 
> > [12295.974770] 2 locks held by zsh/701247:
> > [12295.974770]  #0: ffff92cef3d315b0 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0xaa/0x920
> > [12295.974770]  #1: ffff92cef434a038 (&eb->lock){++++}-{2:2}, at: btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
> > [12295.974770] 
> >                stack backtrace:
> > [12295.974770] CPU: 6 PID: 701247 Comm: zsh Not tainted 5.10.0-rc2-btrfs-next-71 #20
> > [12295.974770] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.0 12/17/2019
> > [12295.974770] Call Trace:
> > [12295.974770]  dump_stack+0x8b/0xb0
> > [12295.974770]  __lock_acquire.cold+0x175/0x2e9
> > [12295.974770]  lock_acquire+0x15b/0x490
> > [12295.974770]  ? btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
> > [12295.974770]  ? read_block_for_search+0xf4/0x350 [btrfs]
> > [12295.974770]  _raw_read_lock+0x40/0xa0
> > [12295.974770]  ? btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
> > [12295.974770]  btrfs_tree_read_lock_atomic+0x34/0x140 [btrfs]
> > [12295.974770]  btrfs_search_slot+0x6ac/0xca0 [btrfs]
> > [12295.974770]  btrfs_lookup_xattr+0x7d/0xd0 [btrfs]
> > [12295.974770]  btrfs_getxattr+0x67/0x130 [btrfs]
> > [12295.974770]  __vfs_getxattr+0x53/0x70
> > [12295.974770]  get_vfs_caps_from_disk+0x68/0x1a0
> > [12295.974770]  ? sched_clock_cpu+0x114/0x180
> > [12295.974770]  cap_bprm_creds_from_file+0x181/0x6c0
> > [12295.974770]  security_bprm_creds_from_file+0x2a/0x40
> > [12295.974770]  begin_new_exec+0xf4/0xc40
> > [12295.974770]  ? load_elf_phdrs+0x6b/0xb0
> > [12295.974770]  load_elf_binary+0x66b/0x1620
> > [12295.974770]  ? read_hv_sched_clock_tsc+0x5/0x20
> > [12295.974770]  ? sched_clock+0x5/0x10
> > [12295.974770]  ? sched_clock_local+0x12/0x80
> > [12295.974770]  ? sched_clock_cpu+0x114/0x180
> > [12295.974770]  bprm_execve+0x3ce/0x920
> > [12295.974770]  do_execveat_common+0x1b0/0x1f0
> > [12295.974770]  __x64_sys_execve+0x39/0x50
> > [12295.974770]  do_syscall_64+0x33/0x80
> > [12295.974770]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [12295.974770] RIP: 0033:0x7f6aaefc13cb
> > [12295.974770] Code: 48 3d 00 f0 ff ff 76 e7 f7 d8 64 41 89 00 eb df 0f 1f 80 00 00 00 00 f7 d8 64 41 89 00 eb dc f3 0f 1e fa b8 3b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 4a 0f 00 f7 d8 64 89 01 48
> > [12295.974770] RSP: 002b:00007ffd33b54d58 EFLAGS: 00000207 ORIG_RAX: 000000000000003b
> > [12295.974770] RAX: ffffffffffffffda RBX: 00007f6aaf28bc88 RCX: 00007f6aaefc13cb
> > [12295.974770] RDX: 00007ffd33b5fd98 RSI: 00007f6aaf28bc88 RDI: 00007ffd33b55280
> > [12295.974770] RBP: 00007ffd33b54d80 R08: 00007ffd33b54ce0 R09: 0000000000000001
> > [12295.974770] R10: 0000000000000008 R11: 0000000000000207 R12: 00007ffd33b55280
> > 
> > The deadlock senario reported by this splat is as follow:
> > 
> > 	CPU 0					CPU 2
> > 	=====					=====
> > 	btrfs_search_slot():
> > 	  btrfs_tree_read_lock_atomic():
> > 	    read_lock(&eb->lock);		<a random writer>
> > 						write_lock(&eb->lock);
> > 						// waiting for the lock
> > 	  ...
> > 	  btrfs_tree_read_lock_atomic():
> > 	    read_lock(&eb->lock); // blocked because the fairness.
> > 
> > In short, you can not use nested read_lock() on the same lock. However,
> > I'm not sure whether we have the real problem here, because I don't
> > think btrfs_search_slot() can pick the same extent buffer twice in
> > btrfs_search_slot(). Also copy the author of the code for more
> > information.
> 
> Ah yes. We have several lockdep splats for which the fixes are not in
> 5.10-rcs yet. Some may be already in the integration branch [1] and
> others not yet in any tree, but can be found in the btrfs mailing list.
> 
> For that particular one, it's fixed by the following patch:
> 
> https://lore.kernel.org/linux-btrfs/1cee2922a32c305056a9559ccf7aede49777beae.1604591048.git.josef@toxicpanda.com/
> 
> (It belongs to a series)
> 
> You'll likely see more different lockdep splats, this is due to a
> transition from custom btree locks to rw semaphores that is in progress,
> and some preparatory work for that is already in 5.10-rcs.
> 

Thanks for the information!

> Btw, I could hit the fs freeze deadlock even when lockdep didn't produce
> any previous splat.
> 

And this is the weird part, which means lockdep get turned off silently.
That probably means there is a bug (either in fs code or lockdep) that
we don't know, and my previous "fix" although works as per your testing,
but it may hide the real problem...

So I just send out another fix, which is similar to my previous one, but
leave a warning so that once we hit the problem we can be warned and not
miss the silent lockdep turn-off.

Regards,
Boqun

> Jan was hitting it too with ext4, and Darrick with xfs - though I can't
> tell if they hit any lockdep splat before hitting the freeze deadlock.
> 
> Thanks for reporting it.
> 
> [1] https://github.com/kdave/btrfs-devel/commits/misc-next
> 
> > 
> > Regards,
> > Boqun
> > 
> >> Regards,
> >> Boqun
> >>
> >>> Thanks.
> >>>
> >>>
> >