linux-kernel - Re: [RESEND PATCH] prctl: add PR_[GS]ET_PDEATHSIG

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <8760bwb04k.fsf@xmission.com>
Date:   Tue, 03 Oct 2017 12:40:43 -0500
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Jürg Billeter <j@...ron.ch>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Oleg Nesterov <oleg@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Michael Kerrisk <mtk.manpages@...il.com>,
        Filipe Brandenburger <filbranden@...gle.com>,
        David Wilcox <davidvsthegiant@...il.com>, hansecke@...il.com,
        linux-kernel@...r.kernel.org
Subject: Re: [RESEND PATCH] prctl: add PR_[GS]ET_PDEATHSIG_PROC

Jürg Billeter <j@...ron.ch> writes:

> On Tue, 2017-10-03 at 09:46 -0500, Eric W. Biederman wrote:
>> There is a general need to find out about the death of other processes,
>> if you are not the parent of the process.   I would be inclined to call
>> it waitfd.  Something that you give a pid.  It performs a permission
>> check and the pid becomes readable when the process dies.  With poll
>> working on the fd, and the fd returning wstatus of the dead child.
>> 
>> Support SIGIO on the fd and you have a signal delivery mechanism,
>> if you want it.
>
> File descriptors for processes (waitfd/clonefd) are definitely
> interesting.  Especially if reaping the process (and reparenting its
> children) is delayed until the last process file descriptor is closed. 
> However, this would be a much larger addition and also less intuitive
> to use if all you want is killing the process tree.
>
>> For the kill all children when the parent dies the mechanism you are
>> proposing is escapable.  We already have an inescapable version of it
>> with init in a pid namespace.  We already have an escapable version of
>> it with orphaned process groups and SIGHUP.
>> 
>> So I would really appreciate a very clear use case for what we are
>> building here.  As it appears the killing of children can already be
>> done another way, and that the waiting for the parent can be done better
>> another way.
>
> My use case is to provide a way for a process to spawn a child and
> ensure that no descendants survive when that child dies.  Avoiding
> runaway processes is desirable in many situations.  My motivation is
> very lightweight (nested) sandboxing (every process is potentially
> sandboxed).
>
> I.e., pid namespaces would be a pretty good fit (assuming they are
> sufficiently lightweight) but CLONE_NEWPID requires CAP_SYS_ADMIN. 
> User namespaces can help here, but creating tons of user namespaces
> just for this doesn't sound sensible. MAX_PID_NS_LEVEL could be an
> issue as well at some point but 32 levels are likely fine in practice.
>
> For my particular scenario I may actually be able to create a single
> user namespace, run all processes with (namespaced) CAP_SYS_ADMIN and
> use CLONE_NEWPID for every process.  However, I would prefer not
> requiring CAP_SYS_ADMIN and a regular application that wants to avoid
> runaway processes for a spawned helper process cannot rely on
> CAP_SYS_ADMIN.
>
> My plan was to use PR_SET_PDEATHSIG_PROC with PR_NO_NEW_PRIVS and a
> suitable seccomp filter to prevent changes to pdeath_signal_proc.  For
> my SIGKILL use case it would be even better to simply require
> PR_NO_NEW_PRIVS and make pdeath_signal_proc sticky, avoiding the need
> for seccomp.  I wanted to keep the differences to the existing
> PR_SET_PDEATHSIG minimal but if we argue that the non-SIGKILL use case
> is better solved with waitfd (or maybe the process events connector),
> we could tailor the prctl for the SIGKILL use case (or support both via
> prctl arg3).
>
> I have another small patch locally that adds a prctl that restricts
> kill(2) to direct children of the current thread group for lightweight
> sandboxing.  That would also be redundant if it was possible to use
> CLONE_NEWPID for every process.

I believe the current default limits allow using CLONE_NEWPID for every
process.  The data structures seem light enough as well.

> What's actually the reason that CLONE_NEWPID requires CAP_SYS_ADMIN? 
> Does CLONE_NEWPID pose any risks that don't exist for
> CLONE_NEWUSER|CLONE_NEWPID?  Assuming we can't simply drop the
> CAP_SYS_ADMIN requirement, do you see a better solution for this use
> case?

CLONE_NEWPID without a permission check would allow runing a setuid root
application in a pid namespace.  Off the top of my head I can't think of
a really good exploit.  But when you mess up pid files, and hide
information from a privileged application I can completely imagine
forcing that application to misbehave in ways the attacker can control.
Leading to bad things.

Eric