Message-ID: <ZON8hKOAGRvTn83a@tycho.pizza>
Date:   Mon, 21 Aug 2023 09:02:28 -0600
From:   Tycho Andersen <tycho@...ho.pizza>
To:     Miklos Szeredi <miklos@...redi.hu>
Cc:     Jürg Billeter <j@...ron.ch>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        regressions@...ts.linux.dev
Subject: Re: [REGRESSION] fuse: execve() fails with ETXTBSY due to async
 fuse_flush

On Mon, Aug 21, 2023 at 04:24:00PM +0200, Miklos Szeredi wrote:
> On Tue, 15 Aug 2023 at 00:36, Tycho Andersen <tycho@...ho.pizza> wrote:
> >
> > On Mon, Aug 14, 2023 at 04:35:56PM +0200, Miklos Szeredi wrote:
> > > On Mon, 14 Aug 2023 at 16:00, Tycho Andersen <tycho@...ho.pizza> wrote:
> > >
> > > > It seems like we really do need to wait here. I guess that means we
> > > > need some kind of exit-proof wait?
> > >
> > > Could you please recap the original problem?
> >
> > Sure, the symptom is a deadlock, something like:
> >
> > # cat /proc/1528591/stack
> > [<0>] do_wait+0x156/0x2f0
> > [<0>] kernel_wait4+0x8d/0x140
> > [<0>] zap_pid_ns_processes+0x104/0x180
> > [<0>] do_exit+0xa41/0xb80
> > [<0>] do_group_exit+0x3a/0xa0
> > [<0>] __x64_sys_exit_group+0x14/0x20
> > [<0>] do_syscall_64+0x37/0xb0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > which is stuck waiting for:
> >
> > # cat /proc/1544574/stack
> > [<0>] request_wait_answer+0x12f/0x210
> > [<0>] fuse_simple_request+0x109/0x2c0
> > [<0>] fuse_flush+0x16f/0x1b0
> > [<0>] filp_close+0x27/0x70
> > [<0>] put_files_struct+0x6b/0xc0
> > [<0>] do_exit+0x360/0xb80
> > [<0>] do_group_exit+0x3a/0xa0
> > [<0>] get_signal+0x140/0x870
> > [<0>] arch_do_signal_or_restart+0xae/0x7c0
> > [<0>] exit_to_user_mode_prepare+0x10f/0x1c0
> > [<0>] syscall_exit_to_user_mode+0x26/0x40
> > [<0>] do_syscall_64+0x46/0xb0
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > I have a reproducer here:
> > https://github.com/tych0/kernel-utils/blob/master/fuse2/Makefile#L7
> 
> The issue seems to be that the server process is recursing into the
> filesystem it is serving (nested_fsync()).  It's quite easy to
> deadlock fuse this way, and I'm not sure why this would be needed for
> any server implementation.   Can you explain?

I think the idea is that they're saving snapshots of their own threads
to the fs for debugging purposes.

Whether this is a sane thing to do or not, it doesn't seem like it
should deadlock pid ns destruction.

Tycho
