linux-kernel - Re: [PATCH] xfs: don't do inodgc work if task is exiting

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230512000724.GH858799@frogsfrogsfrogs>
Date:   Thu, 11 May 2023 17:07:24 -0700
From:   "Darrick J. Wong" <djwong@...nel.org>
To:     Tycho Andersen <tycho@...ho.pizza>
Cc:     linux-xfs@...r.kernel.org, linux-kernel@...r.kernel.org,
        Tycho Andersen <tandersen@...flix.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>
Subject: Re: [PATCH] xfs: don't do inodgc work if task is exiting

On Thu, May 11, 2023 at 09:17:02AM -0600, Tycho Andersen wrote:
> From: Tycho Andersen <tandersen@...flix.com>
> 
> Similar to 5a8bee63b10f ("fuse: in fuse_flush only wait if someone wants
> the return code"), we have a task that is stuck that can't be killed, so a
> pid ns can't exit, wreaking all kinds of havoc:
> 
> INFO: task C2 CompilerThre:3546103 blocked for more than 1912 seconds.
>       Tainted: G           OE     5.15.35netflix-g54efd87a8576 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:C2 CompilerThre state:D stack:    0 pid:3546103 ppid:3546047 flags:0x00004222
> Call Trace:
>  <TASK>
>  __schedule+0x2c5/0x8d0
>  schedule+0x3a/0xa0
>  schedule_timeout+0x115/0x280
>  ? __schedule+0x2cd/0x8d0
>  wait_for_completion+0x9f/0x100
>  __flush_work+0x161/0x1f0
>  ? worker_detach_from_pool+0xb0/0xb0
>  destroy_inode+0x3b/0x70
>  __dentry_kill+0xcc/0x160
>  dput+0x141/0x2e0
>  ovl_destroy_inode+0x15/0x50 [overlay]
>  destroy_inode+0x3b/0x70
>  __dentry_kill+0xcc/0x160
>  dput+0x141/0x2e0
>  __fput+0xe1/0x250
>  task_work_run+0x73/0xb0
>  do_exit+0x37e/0xb80
>  do_group_exit+0x3a/0xa0
>  get_signal+0x140/0x870
>  ? perf_event_groups_first+0x80/0x80
>  arch_do_signal_or_restart+0xae/0x7c0
>  ? __x64_sys_futex+0x5e/0x1d0
>  ? __x64_sys_futex+0x5e/0x1d0
>  exit_to_user_mode_prepare+0x10f/0x1c0
>  syscall_exit_to_user_mode+0x26/0x40
>  do_syscall_64+0x46/0xb0
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f3295cf3cf5
> RSP: 002b:00007f327c834d00 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> RAX: fffffffffffffe00 RBX: 00007f32900bde50 RCX: 00007f3295cf3cf5
> RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f32900bde78
> RBP: 00007f327c834dd0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f32900bde74
> R13: 00007f32900bde78 R14: 00007f32900bde28 R15: 0000000000000000
>  </TASK>
> 
> The bad call path is:
> 
> xfs_fs_destroy_inode() ->
>   xfs_inode_mark_reclaimable ->
>     xfs_inodegc_queue() ->
>       xfs_inodegc_want_queue_work()
>       xfs_inodegc_want_flush_work() ->
>         flush_work() ->
>           __flush_work() ->
>             wait_for_completion()
> 
> We can avoid this task getting stuck by just not queuing the gc work from
> do_exit().
> 
> The fact that there's a lockup at all probably indicative of another xfs
> bug somewhere else that I am still looking for, but we should at least not
> generate unkillable tasks as a result.

Yeah, we just added a couple of fixpatches to 6.4 to deal with inodegc
hangs:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/xfs?id=03e0add80f4cf3f7393edb574eeb3a89a1db7758
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/xfs?id=2254a7396a0ca6309854948ee1c0a33fa4268cec

If you've got a spare machine and a reproducer, could you try applying
those two to see if the problem goes away?

If you have online fsck enabled (I hope not in a 5.15 kernel) then
turn it off or apply:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/xfs?id=2d5f38a31980d7090f5bf91021488dc61a0ba8ee

> Signed-off-by: Tycho Andersen <tandersen@...flix.com>
> CC: "Eric W. Biederman" <ebiederm@...ssion.com>
> ---
>  fs/xfs/xfs_icache.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 351849fc18ff..90e94d84f8ad 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -2011,6 +2011,9 @@ xfs_inodegc_want_queue_work(
>   *
>   * Note: If the current thread is running a transaction, we don't ever want to
>   * wait for other transactions because that could introduce a deadlock.
> + * If the task is currently exiting there is nobody to wait for
> + * the flush and it can deadlock, so we should not try to flush in this case

What do you mean by "there is nobody to wait for"?  The process state
still exists in the kernel, so the completion should wake up the exiting
process, right?

--D

> + * either.
>   */
>  static inline bool
>  xfs_inodegc_want_flush_work(
> @@ -2021,6 +2024,9 @@ xfs_inodegc_want_flush_work(
>  	if (current->journal_info)
>  		return false;
>  
> +	if (current->flags & PF_EXITING)
> +		return false;
> +
>  	if (shrinker_hits > 0)
>  		return true;
>  
> 
> base-commit: 78b421b6a7c6dbb6a213877c742af52330f5026d
> -- 
> 2.34.1
>