linux-kernel - Re: [PATCH 1/2] seccomp: notify user trap about unused filter

Message-ID: <20200527224501.jddwcmvtvjtjsmsx@wittgenstein>
Date:   Thu, 28 May 2020 00:45:01 +0200
From:   Christian Brauner <christian.brauner@...ntu.com>
To:     Kees Cook <keescook@...omium.org>
Cc:     linux-kernel@...r.kernel.org, Andy Lutomirski <luto@...nel.org>,
        Tycho Andersen <tycho@...ho.ws>,
        Matt Denton <mpdenton@...gle.com>,
        Sargun Dhillon <sargun@...gun.me>,
        Jann Horn <jannh@...gle.com>, Chris Palmer <palmer@...gle.com>,
        Aleksa Sarai <cyphar@...har.com>,
        Robert Sesek <rsesek@...gle.com>,
        Jeffrey Vander Stoep <jeffv@...gle.com>,
        Linux Containers <containers@...ts.linux-foundation.org>
Subject: Re: [PATCH 1/2] seccomp: notify user trap about unused filter

On Wed, May 27, 2020 at 03:37:58PM -0700, Kees Cook wrote:
> On Thu, May 28, 2020 at 12:05:32AM +0200, Christian Brauner wrote:
> > The main question also is, is there precedent where the kernel just
> > closes the file descriptor for userspace behind its back? I'm not sure
> > I've heard of this before. That's not how that works afaict; it's also
> > not how we do pidfds. We don't just close the fd when the task
> > associated with it goes away, we notify and then userspace can close.
> 
> But there's a mapping between pidfd and task struct that is separate
> from task struct itself, yes? I.e. keeping a pidfd open doesn't pin
> struct task in memory forever, right?

No, but that's an implementation detail and we discussed that. It pins
struct pid instead of task_struct. Once the process is fully gone you
just get ESRCH.
For example, fds to /proc/<pid>/<tid>/ aren't just closed once the
task has gone away; userspace will just get ESRCH when it tries to open
files under there, but the fd remains valid until close() is called.

In addition, none of the anon inode fds have the "close the
file behind userspace's back" behavior: io_uring, signalfd, timerfd, btf,
perf_event, bpf-prog, bpf-link, bpf-map, pidfd, userfaultfd, fanotify,
inotify, eventpoll, fscontext, eventfd. These are just core kernel ones.
I'm pretty sure that it'd be very odd behavior if we did that. I'd
rather just notify userspace and leave the close to them. But maybe I'm
missing something.

Christian