linux-kernel - Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKOZuessqcjrZ4rfGLgrnOhrLnsVYiVJzOj4Aa=o3ZuZ013d0g@mail.gmail.com>
Date:   Tue, 19 Mar 2019 15:48:32 -0700
From:   Daniel Colascione <dancol@...gle.com>
To:     Christian Brauner <christian@...uner.io>
Cc:     Joel Fernandes <joel@...lfernandes.org>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Sultan Alsawaf <sultan@...neltoast.com>,
        Tim Murray <timmurray@...gle.com>,
        Michal Hocko <mhocko@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Arve Hjønnevåg <arve@...roid.com>,
        Todd Kjos <tkjos@...roid.com>,
        Martijn Coenen <maco@...roid.com>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        "open list:ANDROID DRIVERS" <devel@...verdev.osuosl.org>,
        linux-mm <linux-mm@...ck.org>,
        kernel-team <kernel-team@...roid.com>,
        Oleg Nesterov <oleg@...hat.com>,
        Andy Lutomirski <luto@...capital.net>,
        "Serge E. Hallyn" <serge@...lyn.com>,
        Kees Cook <keescook@...omium.org>
Subject: Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android

On Tue, Mar 19, 2019 at 3:14 PM Christian Brauner <christian@...uner.io> wrote:
> So I dislike the idea of allocating new inodes from the procfs super
> block. I would like to avoid pinning the whole pidfd concept exclusively
> to proc. The idea is that the pidfd API will be useable through procfs
> via open("/proc/<pid>") because that is what users expect and really
> wanted to have for a long time. So it makes sense to have this working.
> But it should really be useable without it. That's why translate_pid()
> and pidfd_clone() are on the table.  What I'm saying is, once the pidfd
> api is "complete" you should be able to set CONFIG_PROCFS=N - even
> though that's crazy - and still be able to use pidfds. This is also a
> point akpm asked about when I did the pidfd_send_signal work.

I agree that you shouldn't need CONFIG_PROCFS=Y to use pidfds. One
crazy idea that I was discussing with Joel the other day is to just
make CONFIG_PROCFS=Y mandatory and provide a new get_procfs_root()
system call that returned, out of thin air and independent of the
mount table, a procfs root directory file descriptor for the caller's
PID namspace and suitable for use with openat(2).

C'mon: /proc is used by everyone today and almost every program breaks
if it's not around. The string "/proc" is already de facto kernel ABI.
Let's just drop the pretense of /proc being optional and bake it into
the kernel proper, then give programs a way to get to /proc that isn't
tied to any particular mount configuration. This way, we don't need a
translate_pid(), since callers can just use procfs to do the same
thing. (That is, if I understand correctly what translate_pid does.)

We still need a pidfd_clone() for atomicity reasons, but that's a
separate story. My goal is to be able to write a library that
transparently creates and manages a helper child process even in a
"hostile" process environment in which some other uncoordinated thread
is constantly doing a waitpid(-1) (e.g., the JVM).

> So instead of going throught proc we should probably do what David has
> been doing in the mount API and come to rely on anone_inode. So
> something like:
>
> fd = anon_inode_getfd("pidfd", &pidfd_fops, file_priv_data, flags);
>
> and stash information such as pid namespace etc. in a pidfd struct or
> something that we then can stash file->private_data of the new file.
> This also lets us avoid all this open coding done here.
> Another advantage is that anon_inodes is its own kernel-internal
> filesystem.

Sure. That works too.