Message-ID: <562611A7.7070606@yandex-team.ru>
Date: Tue, 20 Oct 2015 13:04:23 +0300
From: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: linux-api@...r.kernel.org, containers@...ts.linux-foundation.org,
linux-kernel@...r.kernel.org,
Roman Gushchin <klamm@...dex-team.ru>,
Serge Hallyn <serge.hallyn@...ntu.com>,
Oleg Nesterov <oleg@...hat.com>,
Chen Fan <chen.fan.fnst@...fujitsu.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Stéphane Graber <stgraber@...ntu.com>
Subject: Re: [PATCH RFC v3 2/2] pidns: introduce syscall getvpid
On 28.09.2015 19:57, Eric W. Biederman wrote:
> Konstantin Khlebnikov <khlebnikov@...dex-team.ru> writes:
>
>> If pid is negative, getvpid() returns the pid of the parent of task -pid.
>
> Now that I am noticing this, I don't think I have seen any discussion
> justifying a syscall that gets another process's parent pid. My
> apologies if I just missed it.
>
Sorry for the late response. This completely slipped my mind after LinuxCon.
> Why do we want the parent pid? What can we usefully do with it?
> Is proc really that bad of an interface?
>
> Fetching a parent pid feels like a separate logical operation
> from pid translation. Which makes me a bit uneasy about this
> part of the conversation.
Yep, the proc interface is bad. /proc/$pid/stat is almost impossible to
parse without flaws because a task can set the second field, "comm", to
any string and thereby fake its ppid, for example ") Z 1". /proc/$pid/status
is better, but it carries more information and is therefore slower.
This trick for a distant getppid (the ppid of another task) looks cheap
and useful: in this interface the space of negative pids is otherwise unused.
>
>> Examples:
>> getvpid(pid, ns, -1) - get pid in our pid namespace
>> getvpid(pid, -1, ns) - get pid in container
>> getvpid(pid, -1, ns) > 0 - is pid reachable from the container?
>> getvpid(1, ns1, ns2) > 0 - is ns1 inside ns2?
>> getvpid(1, ns1, ns2) == 0 - is ns1 outside ns2?
>> getvpid(1, ns, -1) - get init task of pid-namespace
>> getvpid(-1, ns, -1) - get reaper of init task in parent pid-namespace
>> getvpid(-pid, -1, -1) - get ppid by pid
>
> As I step back and pay attention to this case I am half wondering if
> perhaps what would be most useful is a file descriptor that refers
> to a pid and an updated set of system calls that takes pid file
> descriptors instead of pids.
An fd which pins pids isn't a good idea.
I think it's better to refer to (but not hold) the task rather than the pid.
For example, the inode of a taskfd could hold a small buffer for the task's
exit status: the task holds a reference to its own taskfd inode and fills
in the status when it exits. Then there would be no zombies and no delayed
reaping.
Something like:
task_fd = clonefd()
...
select(...)
exit(...)
pread(task_fd, &status_rusage_etc, sizeof, 0);
close(task_fd);
The task's pid could also be part of the structure in that fd. Potentially
it could provide the same information as /proc/$pid/... in an efficient
binary format: userspace reads only the fields it needs, and the kernel
can skip unneeded calculations.
>
> Something like:
>
> getpidfd(int pidnsfd, pid_t pid);
>
> waitfd(int pidfd, int *status, int options, struct rusage *rusage);
>
> killfd(int pidfd, int sig);
>
> clonefd(...);
>
> And perhaps:
> pid_nr_ns(int pidnsfd, int pidfd);
>
> parentfd(int pidfd);
>
> Eric
>
--
Konstantin