Date:   Sat, 13 Feb 2021 11:58:34 +0100
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        Matthew Wilcox <willy@...radead.org>, Jan Kara <jack@...e.cz>,
        syzbot <syzbot+bfdded10ab7dcd7507ae@...kaller.appspotmail.com>,
        Jan Kara <jack@...e.com>, linux-ext4@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
        "Theodore Ts'o" <tytso@....edu>, Linux-MM <linux-mm@...ck.org>
Subject: Re: possible deadlock in start_this_handle (2)

On Fri, Feb 12, 2021 at 4:43 PM Michal Hocko <mhocko@...e.com> wrote:
>
> On Fri 12-02-21 21:58:15, Tetsuo Handa wrote:
> > On 2021/02/12 21:30, Michal Hocko wrote:
> > > On Fri 12-02-21 12:22:07, Matthew Wilcox wrote:
> > >> On Fri, Feb 12, 2021 at 08:18:11PM +0900, Tetsuo Handa wrote:
> > >>> On 2021/02/12 1:41, Michal Hocko wrote:
> > >>>> But I suspect we have drifted away from the original issue. I thought
> > >>>> that a simple check would help us narrow down this particular case, and
> > >>>> somebody messing things up from IRQ context didn't sound completely
> > >>>> off.
> > >>>>
> > >>>
> > >>>  From my experience at https://lkml.kernel.org/r/201409192053.IHJ35462.JLOMOSOFFVtQFH@I-love.SAKURA.ne.jp ,
> > >>> I think we can replace direct PF_* manipulation with macros which do not take a "struct task_struct *" argument.
> > >>> Since TASK_PFA_TEST()/TASK_PFA_SET()/TASK_PFA_CLEAR() are for manipulating PFA_* flags on a remote thread, we can
> > >>> define similar ones for manipulating PF_* flags on the current thread. Then auditing dangerous users becomes easier.
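
For illustration, a minimal sketch of what such current-only PF_* helpers
could look like, modeled on the TASK_PFA_* macros (the names below are
hypothetical, not existing kernel API):

/*
 * Sketch: generate current-only accessors for a PF_* bit so that no
 * caller can pass a remote "struct task_struct *" by accident.
 */
#define CURRENT_PF_TEST(name)					\
	static inline bool current_pf_##name(void)		\
	{ return current->flags & PF_##name; }

#define CURRENT_PF_SET(name)					\
	static inline void current_pf_set_##name(void)		\
	{ current->flags |= PF_##name; }

#define CURRENT_PF_CLEAR(name)					\
	static inline void current_pf_clear_##name(void)	\
	{ current->flags &= ~PF_##name; }

/*
 * e.g. CURRENT_PF_TEST(MEMALLOC_NOFS) would define
 * current_pf_MEMALLOC_NOFS(), with no way to name another task.
 */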
> > >>
> > >> No, nobody is manipulating another task's GFP flags.
> > >
> > > Agreed. And nobody should be manipulating PF flags on remote tasks
> > > either.
> > >
> >
> > No. You are misunderstanding. The bug report above is an example of
> > manipulating PF flags on remote tasks.
>
> The bug report you are referring to is ancient, and the cpuset code
> hasn't touched task->flags in a long time. I haven't checked exactly,
> but unless I misremember it has been years since the regular and atomic
> flags were separated.
>
> > You say "nobody should", but the reality is that there indeed was one.
> > There might be other unnoticed cases. The point of this proposal is to
> > make it possible to find such unnoticed users who are manipulating PF
> > flags on remote tasks.
>
> I am really confused about what you are proposing here, TBH, and referring
> to an ancient bug doesn't really help. task->flags is _explicitly_
> documented to be used only for _current_. Is it possible that somebody
> writes buggy code? Sure. Should we build a whole infrastructure around
> that to catch such broken code? I am not really sure. One bug six years
> ago doesn't sound like a good reason for that.

Another similar one was just reported:

https://syzkaller.appspot.com/bug?extid=1b2c6989ec12e467d65c

WARNING: possible circular locking dependency detected
5.11.0-rc7-syzkaller #0 Not tainted
------------------------------------------------------
kswapd0/2232 is trying to acquire lock:
ffff88801f552650 (sb_internal){.+.+}-{0:0}, at: evict+0x2ed/0x6b0 fs/inode.c:577

but task is already holding lock:
ffffffff8be89240 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x0/0x30 mm/page_alloc.c:5195

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (fs_reclaim){+.+.}-{0:0}:
       __fs_reclaim_acquire mm/page_alloc.c:4326 [inline]
       fs_reclaim_acquire+0x117/0x150 mm/page_alloc.c:4340
       might_alloc include/linux/sched/mm.h:193 [inline]
       slab_pre_alloc_hook mm/slab.h:493 [inline]
       slab_alloc_node mm/slab.c:3221 [inline]
       kmem_cache_alloc_node_trace+0x48/0x520 mm/slab.c:3596
       __do_kmalloc_node mm/slab.c:3618 [inline]
       __kmalloc_node+0x38/0x60 mm/slab.c:3626
       kmalloc_node include/linux/slab.h:575 [inline]
       kvmalloc_node+0x61/0xf0 mm/util.c:587
       kvmalloc include/linux/mm.h:781 [inline]
       ext4_xattr_inode_cache_find fs/ext4/xattr.c:1465 [inline]
       ext4_xattr_inode_lookup_create fs/ext4/xattr.c:1508 [inline]
       ext4_xattr_set_entry+0x1ce6/0x3780 fs/ext4/xattr.c:1649
       ext4_xattr_ibody_set+0x78/0x2b0 fs/ext4/xattr.c:2224
       ext4_xattr_set_handle+0x8f4/0x13e0 fs/ext4/xattr.c:2380
       ext4_xattr_set+0x13a/0x340 fs/ext4/xattr.c:2493
       __vfs_setxattr+0x10e/0x170 fs/xattr.c:177
       __vfs_setxattr_noperm+0x11a/0x4c0 fs/xattr.c:208
       __vfs_setxattr_locked+0x1bf/0x250 fs/xattr.c:266
       vfs_setxattr+0x135/0x320 fs/xattr.c:291
       setxattr+0x1ff/0x290 fs/xattr.c:553
       path_setxattr+0x170/0x190 fs/xattr.c:572
       __do_sys_setxattr fs/xattr.c:587 [inline]
       __se_sys_setxattr fs/xattr.c:583 [inline]
       __x64_sys_setxattr+0xc0/0x160 fs/xattr.c:583
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
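
For context, a sketch of the general pattern only (not necessarily the right
fix for this particular report): the scoped-allocation helpers
memalloc_nofs_save()/memalloc_nofs_restore() from include/linux/sched/mm.h
are the usual way to keep allocations made under filesystem locks or
transactions from recursing into fs reclaim; they set and clear
PF_MEMALLOC_NOFS on current, i.e. exactly the kind of current-only PF_*
manipulation discussed above. The function name below is made up for the
example:

#include <linux/mm.h>
#include <linux/sched/mm.h>

static void *alloc_under_fs_lock(size_t size)
{
	unsigned int nofs_flags;
	void *ptr;

	nofs_flags = memalloc_nofs_save();	/* sets PF_MEMALLOC_NOFS on current */
	ptr = kvmalloc(size, GFP_KERNEL);	/* treated as GFP_NOFS while the flag is set */
	memalloc_nofs_restore(nofs_flags);	/* restores the previous flag state */
	return ptr;
}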
