linux-kernel - Re: [REGRESSION] fuse: execve() fails with ETXTBSY due to async fuse_flush

Message-ID: <ZO4t6pnCokUoEsoi@tycho.pizza>
Date:   Tue, 29 Aug 2023 11:42:02 -0600
From:   Tycho Andersen <tycho@...ho.pizza>
To:     Miklos Szeredi <miklos@...redi.hu>
Cc:     Jürg Billeter <j@...ron.ch>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        regressions@...ts.linux.dev
Subject: Re: [REGRESSION] fuse: execve() fails with ETXTBSY due to async
 fuse_flush

On Mon, Aug 21, 2023 at 05:31:48PM +0200, Miklos Szeredi wrote:

(Apologies for the delay, I have been away without cell signal for
some time.)

> > I think the idea is that they're saving snapshots of their own threads
> > to the fs for debugging purposes.
> 
> This seems a fairly special situation.   Have they (whoever they may
> be) thought about fixing this in their server?

Sorry, "we" here is an internal team at my employer, Netflix. We
can't use IMAP clients with our corporate e-mail, whee.

> > Whether this is a sane thing to do or not, it doesn't seem like it
> > should deadlock pid ns destruction.
> 
> True.   So the suggested solution is to allow wait_event_killable() to
> return if a terminal signal is pending in the exiting state and only
> in that case turn the flush into a background request?  That would
> still allow for regressions like the one reported, but that would be
> much less likely to happen in real life.  Okay, I said this for the
> original solution as well, so this may turn out to be wrong as well.

I wonder if there's room here for a completion that doesn't use the
wait primitives. Something like an atomic + queuing in task_work()
would both fix this bug and not exhibit this regression, IIUC.

> Anyway, I'd prefer if this was fixed in the server code, as it looks
> fairly special and adding complexity to the kernel for this case might
> not be justifiable.   But I'm also open to suggestions on fixing this
> in the kernel in a not too complex manner.

I don't think this is specific to the server-accessing-its-own-file
case. My reproducer uses that because I didn't fully understand the
bug at the time. I believe that *any* task that is killed with an
in-flight fuse request will exhibit this. We have also seen this,
fairly rarely, on another fuse fs we use throughout the fleet,
https://github.com/lxc/lxcfs, which doesn't really do anything
strange and is mounted from the host's pid ns.

Tycho