linux-ext4 - Re: [syzbot] possible deadlock in jbd2_journal_lock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20221014132543.i3aiyx4ent4qwy4i@quack3>
Date:   Fri, 14 Oct 2022 15:25:43 +0200
From:   Jan Kara <jack@...e.cz>
To:     Thilo Fromm <t-lo@...ux.microsoft.com>
Cc:     Jan Kara <jack@...e.cz>, Ye Bin <yebin10@...wei.com>,
        jack@...e.com, tytso@....edu, linux-ext4@...r.kernel.org,
        regressions@...ts.linux.dev,
        Jeremi Piotrowski <jpiotrowski@...ux.microsoft.com>
Subject: Re: [syzbot] possible deadlock in jbd2_journal_lock_updates

Hello!

On Fri 14-10-22 08:42:57, Thilo Fromm wrote:
> Just want to make sure this does not get lost - as mentioned earlier,
> reverting 51ae846cff5 leads to a kernel build that does not have this issue.

Yes, I'm aware of this and still cannot quite wrap my head how it could be
given the stacktraces I see :) They do not seem to come anywhere near that
code...

> > Sure, I think this worked fine. It's the buffer lock but right before it we're
> > opening a journal transaction. Symbolized it looks like this:
> > 
> >    ext4_mark_iloc_dirty (include/linux/buffer_head.h:308 fs/ext4/inode.c:5712) ext4
> >    __schedule (kernel/sched/core.c:4994 kernel/sched/core.c:6341)
> >    _raw_spin_lock_irqsave (arch/x86/include/asm/paravirt.h:585 arch/x86/include/asm/qspinlock.h:51 include/asm-generic/qspinlock.h:85 include/linux/spinlock.h:199 include/linux/spinlock_api_smp.h:119 kernel/locking/spinlock.c:162)
> >    __ext4_journal_start_sb (fs/ext4/ext4_jbd2.c:105) ext4
> >    __wait_on_bit_lock (arch/x86/include/asm/bitops.h:214 include/asm-generic/bitops/instrumented-non-atomic.h:135 kernel/sched/wait_bit.c:89)
> >    out_of_line_wait_on_bit_lock (kernel/sched/wait_bit.c:118)
> >    var_wake_function (kernel/sched/wait_bit.c:22)
> >    ext4_xattr_block_set (include/linux/buffer_head.h:391 fs/ext4/xattr.c:2019) ext4
> >    ext4_xattr_set_handle (fs/ext4/xattr.c:2395) ext4
> >    ext4_initxattrs (fs/ext4/xattr_security.c:48) ext4
> >    security_inode_init_security (security/security.c:1114)
> >    ext4_init_acl (fs/ext4/xattr_security.c:38) ext4
> >    __ext4_new_inode (fs/ext4/ialloc.c:1325) ext4
> >    ext4_create (fs/ext4/namei.c:2796) ext4
> >    path_openat (fs/namei.c:3334 fs/namei.c:3404 fs/namei.c:3612)
> >    do_filp_open (fs/namei.c:3642)
> >    vfs_statx (include/linux/namei.h:57 fs/stat.c:221)
> >    __check_object_size (mm/usercopy.c:240 mm/usercopy.c:286 mm/usercopy.c:256)
> >    do_sys_openat2 (fs/open.c:1214)
> >    __x64_sys_openat (fs/open.c:1241)
> >    do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
> >    entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:118)
> 
> Is the symbolised stack trace Jeremi sent helpful to get to the bottom of
> this issue? Can we do anything else to help?

Yes, thanks for the symbolized stacktraces and sorry for the delay. It made
it clear we are hanging on buffer lock. So far I still don't understand the
deadlock scenario (in particular who can be holding the buffer locked)
and I'm busy with something else at SUSE to seriously dwelve into this but
I'll get back to you :).

								Honza

-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR