Message-ID: <ZNqseD4hqHWmeF2w@tycho.pizza>
Date: Mon, 14 Aug 2023 16:36:40 -0600
From: Tycho Andersen <tycho@...ho.pizza>
To: Miklos Szeredi <miklos@...redi.hu>
Cc: Jürg Billeter <j@...ron.ch>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
regressions@...ts.linux.dev
Subject: Re: [REGRESSION] fuse: execve() fails with ETXTBSY due to async
fuse_flush
On Mon, Aug 14, 2023 at 04:35:56PM +0200, Miklos Szeredi wrote:
> On Mon, 14 Aug 2023 at 16:00, Tycho Andersen <tycho@...ho.pizza> wrote:
>
> > It seems like we really do need to wait here. I guess that means we
> > need some kind of exit-proof wait?
>
> Could you please recap the original problem?
Sure, the symptom is a deadlock, something like:
# cat /proc/1528591/stack
[<0>] do_wait+0x156/0x2f0
[<0>] kernel_wait4+0x8d/0x140
[<0>] zap_pid_ns_processes+0x104/0x180
[<0>] do_exit+0xa41/0xb80
[<0>] do_group_exit+0x3a/0xa0
[<0>] __x64_sys_exit_group+0x14/0x20
[<0>] do_syscall_64+0x37/0xb0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
which is stuck waiting for:
# cat /proc/1544574/stack
[<0>] request_wait_answer+0x12f/0x210
[<0>] fuse_simple_request+0x109/0x2c0
[<0>] fuse_flush+0x16f/0x1b0
[<0>] filp_close+0x27/0x70
[<0>] put_files_struct+0x6b/0xc0
[<0>] do_exit+0x360/0xb80
[<0>] do_group_exit+0x3a/0xa0
[<0>] get_signal+0x140/0x870
[<0>] arch_do_signal_or_restart+0xae/0x7c0
[<0>] exit_to_user_mode_prepare+0x10f/0x1c0
[<0>] syscall_exit_to_user_mode+0x26/0x40
[<0>] do_syscall_64+0x46/0xb0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
I have a reproducer here:
https://github.com/tych0/kernel-utils/blob/master/fuse2/Makefile#L7
The problem is that the second thread has already gone through
do_exit() -> exit_signals(), but then calls request_wait_answer(),
which relies on the core wait primitives. Because of the code in
exit_signals(), those waits no longer get woken up by signals. So when
we try to exit the pid ns, the first task sits in
zap_pid_ns_processes() waiting for the second one, which never leaves
its FUSE wait, and the whole cleanup hangs.
It seems we really do need to wait in do_exit(); otherwise we get
the ETXTBSY behavior described in this regression report.
Tycho