[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2026-01-14-pushy-weedy-weapons-linguini-Vz3I33@cyphar.com>
Date: Wed, 14 Jan 2026 22:18:38 +0100
From: Aleksa Sarai <cyphar@...har.com>
To: Andy Lutomirski <luto@...capital.net>
Cc: Christian Brauner <brauner@...nel.org>,
Florian Weimer <fweimer@...hat.com>, linux-fsdevel@...r.kernel.org, linux-api@...r.kernel.org,
linux-kernel@...r.kernel.org, Al Viro <viro@...iv.linux.org.uk>,
David Howells <dhowells@...hat.com>, DJ Delorie <dj@...hat.com>
Subject: Re: O_CLOEXEC use for OPEN_TREE_CLOEXEC
On 2026-01-14, Andy Lutomirski <luto@...capital.net> wrote:
> On Wed, Jan 14, 2026 at 8:09 AM Christian Brauner <brauner@...nel.org> wrote:
> >
> > On Tue, Jan 13, 2026 at 11:40:55PM +0100, Florian Weimer wrote:
> > > In <linux/mount.h>, we have this:
> > >
> > > #define OPEN_TREE_CLOEXEC O_CLOEXEC /* Close the file on execve() */
> > >
> > > This causes a few pain points for us to on the glibc side when we mirror
> > > this into <linux/mount.h> becuse O_CLOEXEC is defined in <fcntl.h>,
> > > which is one of the headers that's completely incompatible with the UAPI
> > > headers.
> > >
> > > The reason why this is painful is because O_CLOEXEC has at least three
> > > different values across architectures: 0x80000, 0x200000, 0x400000
> > >
> > > Even for the UAPI this isn't ideal because it effectively burns three
> > > open_tree flags, unless the flags are made architecture-specific, too.
> >
> > I think that just got cargo-culted... A long time ago some API define as
> > O_CLOEXEC and now a lot of APIs have done the same. I'm pretty sure we
> > can't change that now but we can document that this shouldn't be ifdefed
> > and instead be a separate per-syscall bit. But I think that's the best
> > we can do right now.
> >
>
> How about, for future syscalls, we make CLOEXEC unconditional? If
> anyone wants an ofd to get inherited across exec, they can F_SETFD it
> themselves.
I believe newer interfaces have already started doing that (e.g., all of
the pidfd stuff is O_CLOEXEC by default) but we should definitely update
the documentation in Documentation/process/adding-syscalls.rst to stop
recommending the inclusion of the O_CLOEXEC flag.
The funniest thing about open_tree(2) is that it actually borrows flag
bits from three distinct namespaces! It has an OPEN_TREE_* namespace,
the AT_* namespace (which now has a concept of "per-syscall flags"), and
O_CLOEXEC. What a fun interface!
--
Aleksa Sarai
https://www.cyphar.com/
Download attachment "signature.asc" of type "application/pgp-signature" (266 bytes)
Powered by blists - more mailing lists