linux-kernel - Re: [RFC PATCH] Minimal non-child process exit notification support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAEXW_YS-nTccFtuE9DMkFYA7wXAfeRHt7v4KSZcXMWVYKm+Uyg@mail.gmail.com>
Date:   Wed, 31 Oct 2018 07:41:44 -0700
From:   Joel Fernandes <joel@...lfernandes.org>
To:     David Laight <David.Laight@...lab.com>
Cc:     Daniel Colascione <dancol@...gle.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "timmurray@...gle.com" <timmurray@...gle.com>,
        "joelaf@...gle.com" <joelaf@...gle.com>
Subject: Re: [RFC PATCH] Minimal non-child process exit notification support

On Wed, Oct 31, 2018 at 7:25 AM, David Laight <David.Laight@...lab.com> wrote:
> From: Daniel Colascione
>> Sent: 31 October 2018 12:56
>> On Wed, Oct 31, 2018 at 12:27 PM, David Laight <David.Laight@...lab.com> wrote:
>> > From: Daniel Colascione
>> >> Sent: 29 October 2018 17:53
>> >>
>> >> This patch adds a new file under /proc/pid, /proc/pid/exithand.
>> >> Attempting to read from an exithand file will block until the
>> >> corresponding process exits, at which point the read will successfully
>> >> complete with EOF.  The file descriptor supports both blocking
>> >> operations and poll(2). It's intended to be a minimal interface for
>> >> allowing a program to wait for the exit of a process that is not one
>> >> of its children.
>> >
>> > Why do you need an extra file?
>>
>> Because no current file suffices.
>
> That doesn't stop you making something work on any/all of the existing files.
>
>> > It ought to be possible to use poll() to wait for POLLERR having set
>> > 'events' to zero on any of the nodes in /proc/pid - or even on
>> > the directory itself.
>>
>> That doesn't actually work today. And waiting on a directory with
>> POLLERR would be very weird, since directories in general don't do
>> things like blocking reads or poll support. A separate file with
>> self-contained, well-defined semantics is cleaner.
>
> Device drivers will (well ought to) return POLLERR when a device
> is removed.
> Making procfs behave the same way wouldn't be too stupid.
>
>> > Indeed, to avoid killing the wrong process you need to have opened
>> > some node of /proc/pid/* (maybe cmdline) before sending the kill
>> > signal.
>>
>> The kernel really needs better documentation of the semantics of
>> procfs file descriptors. You're not the only person to think,
>> mistakenly, that keeping a reference to a /proc/$PID/something FD
>> reserves $PID and prevents it being used for another process. Procfs
>> FDs do no such thing. kill(2) is unsafe whether or not
>> /proc/pid/cmdline or any other /proc file is open.
>
> Interesting.
> Linux 'fixed' the problem of pid reuse in the kernel by adding (IIRC)
> 'struct pid' that reference counts the pid stopping reuse.

This is incorrect if you mean numeric pids. See the end of these
comments in include/linux/pid.h . A pid value can be reused, it just
works Ok because it causes a new struct pid allocation. That doesn't
mean there isn't a numeric reuse. There's also no where in pid_alloc()
where we prevent the numeric reuse AFAICT.

/*
 * What is struct pid?
 *
 * A struct pid is the kernel's internal notion of a process identifier.
 * It refers to individual tasks, process groups, and sessions.  While
 * there are processes attached to it the struct pid lives in a hash
 * table, so it and then the processes that it refers to can be found
 * quickly from the numeric pid value.  The attached processes may be
 * quickly accessed by following pointers from struct pid.
 *
 * Storing pid_t values in the kernel and referring to them later has a
 * problem.  The process originally with that pid may have exited and the
 * pid allocator wrapped, and another process could have come along
 * and been assigned that pid.
 *
 * Referring to user space processes by holding a reference to struct
 * task_struct has a problem.  When the user space process exits
 * the now useless task_struct is still kept.  A task_struct plus a
 * stack consumes around 10K of low kernel memory.  More precisely
 * this is THREAD_SIZE + sizeof(struct task_struct).  By comparison
 * a struct pid is about 64 bytes.
 *
 * Holding a reference to struct pid solves both of these problems.
 * It is small so holding a reference does not consume a lot of
 * resources, and since a new struct pid is allocated when the numeric pid
 * value is reused (when pids wrap around) we don't mistakenly refer to new
 * processes.
 */