Message-ID: <CAKOZuessqcjrZ4rfGLgrnOhrLnsVYiVJzOj4Aa=o3ZuZ013d0g@mail.gmail.com>
Date: Tue, 19 Mar 2019 15:48:32 -0700
From: Daniel Colascione <dancol@...gle.com>
To: Christian Brauner <christian@...uner.io>
Cc: Joel Fernandes <joel@...lfernandes.org>,
Suren Baghdasaryan <surenb@...gle.com>,
Steven Rostedt <rostedt@...dmis.org>,
Sultan Alsawaf <sultan@...neltoast.com>,
Tim Murray <timmurray@...gle.com>,
Michal Hocko <mhocko@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Arve Hjønnevåg <arve@...roid.com>,
Todd Kjos <tkjos@...roid.com>,
Martijn Coenen <maco@...roid.com>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
LKML <linux-kernel@...r.kernel.org>,
"open list:ANDROID DRIVERS" <devel@...verdev.osuosl.org>,
linux-mm <linux-mm@...ck.org>,
kernel-team <kernel-team@...roid.com>,
Oleg Nesterov <oleg@...hat.com>,
Andy Lutomirski <luto@...capital.net>,
"Serge E. Hallyn" <serge@...lyn.com>,
Kees Cook <keescook@...omium.org>
Subject: Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

On Tue, Mar 19, 2019 at 3:14 PM Christian Brauner <christian@...uner.io> wrote:
> So I dislike the idea of allocating new inodes from the procfs super
> block. I would like to avoid pinning the whole pidfd concept exclusively
> to proc. The idea is that the pidfd API will be useable through procfs
> via open("/proc/<pid>") because that is what users expect and really
> wanted to have for a long time. So it makes sense to have this working.
> But it should really be useable without it. That's why translate_pid()
> and pidfd_clone() are on the table. What I'm saying is, once the pidfd
> api is "complete" you should be able to set CONFIG_PROCFS=N - even
> though that's crazy - and still be able to use pidfds. This is also a
> point akpm asked about when I did the pidfd_send_signal work.

I agree that you shouldn't need CONFIG_PROCFS=Y to use pidfds. One
crazy idea that I was discussing with Joel the other day is to just
make CONFIG_PROCFS=Y mandatory and provide a new get_procfs_root()
system call that returned, out of thin air and independent of the
mount table, a procfs root directory file descriptor for the caller's
PID namespace and suitable for use with openat(2).

C'mon: /proc is used by everyone today and almost every program breaks
if it's not around. The string "/proc" is already de facto kernel ABI.
Let's just drop the pretense of /proc being optional and bake it into
the kernel proper, then give programs a way to get to /proc that isn't
tied to any particular mount configuration. This way, we don't need a
translate_pid(), since callers can just use procfs to do the same
thing. (That is, if I understand correctly what translate_pid does.)
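
To make the openat(2) idea concrete, here is a rough userspace sketch. Note
that get_procfs_root() does not exist; the function
get_procfs_root_emulated() below is a hypothetical stand-in that simply opens
the mounted /proc, which is exactly the mount-table dependency the proposed
call would remove. The helper pid_from_status() is also made up for
illustration; it reads the "Pid:" field from <pid>/status relative to the
root descriptor.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * HYPOTHETICAL stand-in: get_procfs_root() is only an idea from this thread,
 * not a real system call. It is emulated here by opening the mounted /proc,
 * so unlike the proposal it still depends on the mount table.
 */
static int get_procfs_root_emulated(void)
{
	return open("/proc", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
}

/*
 * Read the "Pid:" field of <who>/status, resolved relative to the procfs
 * root descriptor. Returns the pid, or -1 on error.
 */
static long pid_from_status(int procfs_root, const char *who)
{
	char path[64], line[256];
	long pid = -1;
	FILE *f;
	int fd, n;

	n = snprintf(path, sizeof(path), "%s/status", who);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	/* All lookups go through the root fd, never through a "/proc" string. */
	fd = openat(procfs_root, path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	f = fdopen(fd, "r");
	if (!f) {
		close(fd);
		return -1;
	}

	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "Pid: %ld", &pid) == 1)
			break;
	}
	fclose(f);
	return pid;
}
```

A caller would obtain the root descriptor once and then do something like
pid_from_status(root, "self"). The same pattern could read the "NSpid:" line
of the status file to see a task's PID in nested namespaces, which is my
guess at what translate_pid() is for.
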
We still need a pidfd_clone() for atomicity reasons, but that's a
separate story. My goal is to be able to write a library that
transparently creates and manages a helper child process even in a
"hostile" process environment in which some other uncoordinated thread
is constantly doing a waitpid(-1) (e.g., the JVM).
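
Here is a small sketch of the "hostile" environment problem. It runs the two
waits sequentially in one thread instead of using a real second thread, so
the outcome is deterministic; the function names (spawn_helper,
hostile_wait_any, library_wait) are invented for this example. Once the
waitpid(-1) caller has reaped the helper, the library's own waitpid() on that
PID can only fail with ECHILD, and the exit status is lost to it.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a helper child that exits immediately with the given exit code. */
static pid_t spawn_helper(int exit_code)
{
	pid_t pid = fork();

	if (pid == 0)
		_exit(exit_code);
	return pid;	/* -1 if fork() failed */
}

/*
 * Stands in for the uncoordinated thread: block until any child exits and
 * reap it. Returns the reaped pid, or -1 on error.
 */
static pid_t hostile_wait_any(void)
{
	int status;
	pid_t r;

	do {
		r = waitpid(-1, &status, 0);
	} while (r < 0 && errno == EINTR);
	return r;
}

/*
 * What the library would like to do: wait for its own helper.
 * Returns 0 on success, or the errno value on failure.
 */
static int library_wait(pid_t helper)
{
	int status;
	pid_t r;

	do {
		r = waitpid(helper, &status, 0);
	} while (r < 0 && errno == EINTR);
	return r < 0 ? errno : 0;
}
```

If hostile_wait_any() runs first, library_wait() returns ECHILD. With real
threads the ordering is a race, and in the worst case the PID has already
been reused by an unrelated process by the time the library acts on it.
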
> So instead of going through proc we should probably do what David has
> been doing in the mount API and come to rely on anon_inode. So
> something like:
>
> fd = anon_inode_getfd("pidfd", &pidfd_fops, file_priv_data, flags);
>
> and stash information such as pid namespace etc. in a pidfd struct or
> something that we can then stash in file->private_data of the new file.
> This also lets us avoid all this open coding done here.
> Another advantage is that anon_inodes is its own kernel-internal
> filesystem.

Sure. That works too.