linux-kernel - Re: possible deadlock in start_this

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210215142935.GB22417@quack2.suse.cz>
Date:   Mon, 15 Feb 2021 15:29:35 +0100
From:   Jan Kara <jack@...e.cz>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc:     Jan Kara <jack@...e.cz>, jack@...e.com, linux-ext4@...r.kernel.org,
        linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com,
        tytso@....edu, mhocko@...e.cz, linux-mm@...ck.org,
        syzbot <syzbot+bfdded10ab7dcd7507ae@...kaller.appspotmail.com>
Subject: Re: possible deadlock in start_this_handle (2)

On Mon 15-02-21 23:06:15, Tetsuo Handa wrote:
> On 2021/02/15 21:45, Jan Kara wrote:
> > On Sat 13-02-21 23:26:37, Tetsuo Handa wrote:
> >> Excuse me, but it seems to me that nothing prevents
> >> ext4_xattr_set_handle() from reaching ext4_xattr_inode_lookup_create()
> >> without memalloc_nofs_save() when hitting ext4_get_nojournal() path.
> >> Will you explain when ext4_get_nojournal() path is executed?
> > 
> > That's a good question but sadly I don't think that's it.
> > ext4_get_nojournal() is called when the filesystem is created without a
> > journal. In that case we also don't acquire jbd2_handle lockdep map. In the
> > syzbot report we can see:
> 
> Since syzbot can test filesystem images, syzbot might have tested a filesystem
> image created both with and without journal within this boot.

a) I think that syzbot reboots the VM between executing different tests to
get reproducible conditions. But in theory I agree the test may have
contained one image with and one image without a journal.

*but*

b) as I wrote in the email you are replying to, the jbd2_handle key is
private per filesystem. Thus for lockdep to complain about
jbd2_handle->fs_reclaim->jbd2_handle deadlock, those jbd2_handle lockdep
maps must come from the same filesystem.

*and*

c) filesystem without journal doesn't use jbd2_handle lockdep map at all so
for such filesystems lockdep creates no dependency for jbd2_handle map.

								Honza

> 
> > 
> > kswapd0/2246 is trying to acquire lock:
> > ffff888041a988e0 (jbd2_handle){++++}-{0:0}, at: start_this_handle+0xf81/0x1380 fs/jbd2/transaction.c:444
> > 
> > but task is already holding lock:
> > ffffffff8be892c0 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x0/0x30 mm/page_alloc.c:5195
> > 
> > So this filesystem has very clearly been created with a journal. Also the
> > journal lockdep tracking machinery uses:
> 
> While locks held by kswapd0/2246 are fs_reclaim, shrinker_rwsem, &type->s_umount_key#38
> and jbd2_handle, isn't the dependency lockdep considers problematic is
> 
>   Chain exists of:
>     jbd2_handle --> &ei->xattr_sem --> fs_reclaim
> 
>    Possible unsafe locking scenario:
> 
>          CPU0                    CPU1
>          ----                    ----
>     lock(fs_reclaim);
>                                  lock(&ei->xattr_sem);
>                                  lock(fs_reclaim);
>     lock(jbd2_handle);
> 
> where CPU0 is kswapd/2246 and CPU1 is the case of ext4_get_nojournal() path?
> If someone has taken jbd2_handle and &ei->xattr_sem in this order, isn't this
> dependency true?
> 
> > 
> > rwsem_acquire_read(&journal->j_trans_commit_map, 0, 0, _THIS_IP_);
> > 
> > so a lockdep key is per-filesystem. Thus it is not possible that lockdep
> > would combine lock dependencies from two different filesystems.
> > 
> > But I guess we could narrow the search for this problem by adding WARN_ONs
> > to ext4_xattr_set_handle() and ext4_xattr_inode_lookup_create() like:
> > 
> > WARN_ON(ext4_handle_valid(handle) && !(current->flags & PF_MEMALLOC_NOFS));
> > 
> > It would narrow down a place in which PF_MEMALLOC_NOFS flag isn't set
> > properly... At least that seems like the most plausible way forward to me.
> 
> You can use CONFIG_DEBUG_AID_FOR_SYZBOT for adding such WARN_ONs on linux-next.
> 
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR