Message-ID: <20160901180424.GC10153@twins.programming.kicks-ass.net>
Date: Thu, 1 Sep 2016 20:04:24 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Dmitry Vyukov <dvyukov@...gle.com>
Cc: eparis@...hat.com, LKML <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...hat.com>,
syzkaller <syzkaller@...glegroups.com>,
Kostya Serebryany <kcc@...gle.com>,
Alexander Potapenko <glider@...gle.com>
Subject: Re: fanotify: unkillable hanged processes
On Thu, Sep 01, 2016 at 07:53:44PM +0200, Dmitry Vyukov wrote:
> Hello,
>
> The following program:
>
> https://gist.githubusercontent.com/dvyukov/0952eeac71069b46b3fe0e28bd1a02bf/raw/396b9dcce2636cecab1a4161c15d3f066e6ef639/gistfile1.txt
>
> if run in a parallel loop creates unkillable hanged processes:
>
> -bash-4.3# ps afxu | grep a.out
> root 4489 0.0 0.0 8868 340 ttyp1 S+ 17:19 0:00 \_ grep a.out
> root 4423 0.0 0.0 0 0 ttyp0 D 17:16 0:00 [a.out]
> root 4424 0.0 0.0 0 0 ttyp0 D 17:16 0:00 [a.out]
> root 4425 0.0 0.0 0 0 ttyp0 D 17:16 0:00 [a.out]
> root 4470 0.0 0.1 7016 2316 ttyp0 D 17:16 0:00 ./stress ./a.out
>
> This looks like a classical deadlock, but LOCKDEP is silent so
> +LOCKDEP maintainers.
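The gist itself is not reproduced here. For context, below is a minimal
sketch of the general shape such a reproducer can take, assuming a
FAN_CLASS_CONTENT group with a blocking FAN_OPEN_PERM mark and permission
events that nobody answers; the path "/tmp" and the file name are
arbitrary, and the actual program may differ.

/*
 * Hypothetical sketch only, not the gist above.  Needs CAP_SYS_ADMIN.
 * The child triggers a blocking FAN_OPEN_PERM event that nobody reads
 * or answers, so it sleeps inside open(); the parent then exits without
 * answering, so the fanotify group has to be torn down on release.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/fanotify.h>
#include <unistd.h>

int main(void)
{
        int fd = fanotify_init(FAN_CLASS_CONTENT, O_RDONLY);

        if (fd < 0) {
                perror("fanotify_init");
                return 1;
        }

        /* Blocking permission events for opens on the mount of /tmp. */
        if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_MOUNT,
                          FAN_OPEN_PERM, AT_FDCWD, "/tmp") < 0) {
                perror("fanotify_mark");
                return 1;
        }

        if (fork() == 0) {
                /* Child: drop its copy of the group fd, then generate a
                 * permission event that nobody is left to answer. */
                close(fd);
                open("/tmp/fanotify-test", O_RDONLY | O_CREAT, 0600);
                _exit(0);
        }

        /* Parent: give the child time to block, then exit without reading
         * or answering events; the final fput() of the fanotify fd then
         * has to tear the group down. */
        sleep(1);
        return 0;
}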
fanotify_get_response() uses wait_event(). That is an asymmetric API, so
lockdep cannot include it in block-on chains: while a task blocks there,
there is no record of whom it is blocking on, and hence nothing for
lockdep to report.
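For reference, the wait in question is roughly the following (a simplified
sketch of fanotify_get_response() from fs/notify/fanotify/fanotify.c
around this kernel version; details may differ from the actual source):

static int fanotify_get_response(struct fsnotify_group *group,
                                 struct fanotify_perm_event_info *event)
{
        int ret;

        /*
         * Sleep on a plain waitqueue until userspace writes a response
         * to the fanotify fd.  No lock object is involved, so lockdep
         * never records a dependency between this sleeper and whoever
         * it is waiting for.
         */
        wait_event(group->fanotify_data.access_waitq, event->response);

        /* Userspace responded; convert to something usable. */
        switch (event->response) {
        case FAN_ALLOW:
                ret = 0;
                break;
        case FAN_DENY:
        default:
                ret = -EPERM;
        }
        event->response = 0;

        return ret;
}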
> Most of the processes are hanged at (from /proc/pid/stack):
>
> [<ffffffff814dd6c8>] __synchronize_srcu+0x248/0x380 kernel/rcu/srcu.c:448
> [<ffffffff814dd81e>] synchronize_srcu+0x1e/0x40 kernel/rcu/srcu.c:492
> [<ffffffff819448d7>] fsnotify_mark_destroy_list+0x107/0x370 fs/notify/mark.c:551
> [<ffffffff8194163e>] fsnotify_destroy_group+0x1e/0xc0 fs/notify/group.c:57
> [<ffffffff8194a84b>] fanotify_release+0x20b/0x2d0 fs/notify/fanotify/fanotify_user.c:392
> [<ffffffff81868bbc>] __fput+0x28c/0x780 fs/file_table.c:208
> [<ffffffff81869135>] ____fput+0x15/0x20 fs/file_table.c:244
> [<ffffffff813ebf63>] task_work_run+0xf3/0x170 kernel/task_work.c:116
> [< inline >] exit_task_work ./include/linux/task_work.h:21
> [<ffffffff81394218>] do_exit+0x868/0x2e70 kernel/exit.c:828
> [<ffffffff81396998>] do_group_exit+0x108/0x330 kernel/exit.c:958
> [< inline >] SYSC_exit_group kernel/exit.c:969
> [<ffffffff81396bdd>] SyS_exit_group+0x1d/0x20 kernel/exit.c:967
> [<ffffffff86e10700>] entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:208
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> One process holds the srcu lock:
>
> stress D ffffffff86dfe09a 28024 4470 1 0x00000004
> ffff880064d9c440 0000000000000000 ffff8800686ab0c0 ffff88006cecc080
> ffff88006d322bd8 ffff8800653f7630 ffffffff86dfe09a ffff8800653f76e0
> 0000000000000282 ffff88006d323568 ffff88006d323540 ffff880064d9c448
> Call Trace:
> [<ffffffff86dff497>] schedule+0x97/0x1c0 kernel/sched/core.c:3414
> [< inline >] fanotify_get_response fs/notify/fanotify/fanotify.c:70
> [<ffffffff8194a347>] fanotify_handle_event+0x537/0x830 fs/notify/fanotify/fanotify.c:233
> [< inline >] send_to_group fs/notify/fsnotify.c:179
> [<ffffffff8193fc8d>] fsnotify+0x73d/0x1020 fs/notify/fsnotify.c:275
> [< inline >] fsnotify_perm ./include/linux/fsnotify.h:55
> [<ffffffff82acefb1>] security_file_open+0x151/0x190 security/security.c:887
> [<ffffffff8185a8db>] do_dentry_open+0x2ab/0xd30 fs/open.c:736
> [<ffffffff8185e445>] vfs_open+0x105/0x220 fs/open.c:860
> [< inline >] do_last fs/namei.c:3374
> [<ffffffff81895909>] path_openat+0x12f9/0x2ab0 fs/namei.c:3497
> [<ffffffff8189a7ac>] do_filp_open+0x18c/0x250 fs/namei.c:3532
> [<ffffffff81873528>] do_open_execat+0xe8/0x4d0 fs/exec.c:818
> [<ffffffff8187a18f>] do_execveat_common.isra.35+0x71f/0x1d80 fs/exec.c:1679
> [< inline >] do_execve fs/exec.c:1783
> [< inline >] SYSC_execve fs/exec.c:1864
> [<ffffffff8187c262>] SyS_execve+0x42/0x50 fs/exec.c:1859
> [<ffffffff810088ff>] do_syscall_64+0x1df/0x640 arch/x86/entry/common.c:288
> [<ffffffff86e107c3>] entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:249
>
>
> [ 467.548485] Showing all locks held in the system:
> [ 467.548981] 2 locks held by bash/4044:
> [ 467.549313] #0: (&tty->ldisc_sem){.+.+.+}, at: [<ffffffff86e0f7f7>] ldsem_down_read+0x37/0x40
> [ 467.550076] #1: (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8322ec95>] n_tty_read+0x1e5/0x1860
> [ 467.550897] 3 locks held by bash/4062:
> [ 467.551210] #0: (sb_writers#5){.+.+.+}, at: [<ffffffff81863c04>] vfs_write+0x3a4/0x4e0
> [ 467.551923] #1: (rcu_read_lock){......}, at: [<ffffffff83245c30>] __handle_sysrq+0x0/0x4d0
> [ 467.552655] #2: (tasklist_lock){.+.+..}, at: [<ffffffff81496fc4>] debug_show_all_locks+0x74/0x290
> [ 467.553454] 2 locks held by stress/4470:
> [ 467.553754] #0: (&sig->cred_guard_mutex){+.+.+.}, at: [<ffffffff818799b3>] prepare_bprm_creds+0x53/0x110
> [ 467.554590] #1: (&fsnotify_mark_srcu){......}, at: [<ffffffff8193f71d>] fsnotify+0x1cd/0x1020
So not quite enough information; what is bash/4044's stack trace? Because
that is holding the fanotify srcu reference.
But it looks like a scenario where everyone is waiting for SRCU to
complete, while the task holding up SRCU completion (stress/4470, sleeping
in fanotify_get_response()) is itself waiting for something else: a
response from userspace.
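Pieced together from the traces above, the suspected cycle would look
roughly like this (a sketch of the two call chains, not verified against
the source):

  stress/4470 (generates the permission event):
    fsnotify()
      srcu_read_lock(&fsnotify_mark_srcu)       /* read side held ... */
      fanotify_handle_event()
        fanotify_get_response()
          wait_event(..., event->response)      /* ... while sleeping here */

  a.out/4423 etc. (the listener side, exiting):
    do_exit()
      __fput()
        fanotify_release()
          fsnotify_destroy_group()
            fsnotify_mark_destroy_list()
              synchronize_srcu(&fsnotify_mark_srcu)  /* waits for 4470 */

If the exiting tasks were the only ones that could have written the
response, neither side can make progress, and wait_event() carries no
dependency that lockdep could have seen.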
But I'm not at all familiar with fanotify.