linux-kernel - Re: [GIT PULL] pidfd updates

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230425-mulmig-wandschmuck-75699afb1ecc@brauner>
Date:   Tue, 25 Apr 2023 16:36:27 +0200
From:   Christian Brauner <brauner@...nel.org>
To:     Al Viro <viro@...iv.linux.org.uk>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [GIT PULL] pidfd updates

On Tue, Apr 25, 2023 at 02:54:29PM +0100, Al Viro wrote:
> On Tue, Apr 25, 2023 at 02:34:15PM +0200, Christian Brauner wrote:
> 
> > It is rife with misunderstandings just looking at what we did in
> > kernel/fork.c earlier:
> > 
> > 	retval = get_unused_fd_flags(O_RDWR | O_CLOEXEC);
> >         [...]
> >         pidfile = anon_inode_getfile("[pidfd]", &pidfd_fops, pid,
> >                                      O_RDWR | O_CLOEXEC);
> > 
> > seeing where both get_unused_fd_flags() and both *_getfile() take flag
> > arguments. Sure, for us this is pretty straightforward since we've seen
> > that code a million times. For others it's confusing why there's two
> > flag arguments. Sure, we could use one flags argument but it still be
> > weird to look at.
> 
> First of all, get_unused_fd_flags() doesn't give a damn about O_RDWR and
> anon_inode_getfile() - about O_CLOEXEC.  Duplicating the expression in
> places like that is cargo-culting.

I distinctly remember us having that conversation about how to do this
nicely back then and fwiw this is your patch... ;)
6fd2fe494b17 ("copy_process(): don't use ksys_close() on cleanups")

So sure, that was my point: people are confused why there's two flag
arguments and what exactly has to go into them and just copy-paste
whatever other users already have.

>  
> > But with this api we also force all users to remember that they need to
> > cleanup the fd and the file - but definitely one - depending on where
> > they fail.
> > 
> > But conceptually the fd and the file belong together. After all it's the
> > file we're about to make that reserved fd refer to.
> 
> Why?  The common pattern is
> 	* reserve the descriptor
> 	* do some work that might fail
> 	* get struct file set up (which also might fail)
> 	* do more work (possibly including copyout, etc.)
> 	* install file into descriptor table
> 
> We want to reserve the descriptor early, since it's about the easiest
> thing to undo.  Why bother doing it next to file setup?  That can be
> pretty deep in call chain and doing it that way comes with headache
> about passing the damn thing around.  And cleanup rules with your
> variant end up being more complicated.
> 
> Keep the manipulations of descriptor table as close to the surface
> as possible.  The real work is with file references; descriptors
> belong in userland and passing them around kernel-side is asking
> for headache.

Imho, there's different use-cases.

There's definitely one where people allocate a file descriptor early on
and then sometimes maybe way later allocate a struct file and install
it. And that's where exposing and using get_unused_fd_flags() and
fd_install() is great and works fine.

But there's also users that do the reserve an fd and allocate a file at
the same time or receive both at the same time as the seccomp notifier,
scm creds, or pidfds. The receive_fd* stuff is basically a version of
the sketch I did without the ability to simply use a struct and not
having to open-code everything multiple times.

I never expected excitement around this as it was difficult enough to
get that receive_fd* thing going so I'm not arguing for this to be the
ultima ratio...