Message-ID: <55F91C3D.1040209@yandex-team.ru>
Date: Wed, 16 Sep 2015 10:37:33 +0300
From: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
To: Serge Hallyn <serge.hallyn@...ntu.com>,
Stéphane Graber <stgraber@...ntu.com>
Cc: linux-api@...r.kernel.org, containers@...ts.linux-foundation.org,
Oleg Nesterov <oleg@...hat.com>, linux-kernel@...r.kernel.org,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH RFC] pidns: introduce syscall getvpid
On 15.09.2015 20:41, Serge Hallyn wrote:
> Quoting Stéphane Graber (stgraber@...ntu.com):
>> On Tue, Sep 15, 2015 at 06:01:38PM +0300, Konstantin Khlebnikov wrote:
>>> On 15.09.2015 17:27, Eric W. Biederman wrote:
>>>> Konstantin Khlebnikov <khlebnikov@...dex-team.ru> writes:
>>>>
>>>>> pid_t getvpid(pid_t pid, pid_t source, pid_t target);
>>>>>
>>>>> This syscall converts pid from one pid-ns into pid in another pid-ns:
>>>>> it takes @pid in namespace of @source task (zero for current) and
>>>>> returns related pid in namespace of @target task (zero for current too).
>>>>> If pid is unreachable from target pid-ns then it returns zero.
>>>>
>>>> This interface as presented is inherently racy. It would be better
>>>> if source and target were file descriptors referring to the namespaces
>>>> you wish to translate between.
>>>
>>> Yep, it's racy, as is any operation on non-child pids.
>>> Even with file descriptors for source/target, the result would still be racy.
>>>
>>>>
>>>>> Such conversion is required for interaction between processes from
>>>>> different pid-namespaces. For example when system service talks with
>>>>> client from isolated container via socket about task in container:
>>>>
>>>> Sockets are already supported. At least the metadata of sockets is.
>>>>
>>>> Maybe we need this but I am not convinced of its utility.
>>>>
>>>> What are you trying to do that motivates this?
>>>
>>> I'm working on a hierarchical container management system that
>>> allows creating and controlling nested sub-containers from within
>>> containers ( https://github.com/yandex/porto ). The main server runs
>>> on the host and has to interact with all levels of nested namespaces.
>>> This syscall makes some operations much easier: the server needs to
>>> remember only the pid in the host pid namespace and can convert it
>>> into the right vpid on demand.
>>
>> Note that as Eric said earlier, sending a PID inside a ucred through a
>> unix socket will have the pid translated.
>>
>> So while your solution certainly should be faster, you can already achieve
>> what you want today by doing:
>>
>> == Translate PID in container to PID in host
>> - open a socket
>> - setns to container's pidns
>> - send ucred from that container containing the requested container PID
>> - host sees the host PID
>>
>> == Translate PID on host to PID in container
>> - open a socket
>> - setns to container's pidns
>> - send ucred from the host containing the requested host PID
>> (send will fail if the host PID isn't part of that container)
>> - container sees the container PID
>
> In addition, since commit e4bc332451 : /proc/PID/status: show all sets of pid according to ns
> we now also have 'NSpid' etc in /proc/$$/status.
>
As far as I can see, this works well only for converting a host pid into
a virtual one. The reverse conversion is troublesome: we have to scan all
pids in the host procfs and somehow filter out the tasks that belong to
the container and its sub-pid-namespaces.
Or am I missing something trivial?
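To make the asymmetry concrete, here is a rough Python sketch of both
directions using only the NSpid line of /proc/PID/status. It is an
illustration under assumptions, not a proposed implementation: the function
names, the proc_root parameter and the in_container predicate are all
hypothetical, and the predicate stands in for the "somehow filter" step
(for example, comparing /proc/PID/ns/pid links). It also returns only the
innermost pid; a pid in an intermediate namespace would need a different
index into the NSpid list.

```python
import os


def parse_nspid(status_text):
    """Return the NSpid values found in the text of a /proc/PID/status file.

    The values run from the pid namespace of the procfs mount being read
    down to the task's own (innermost) pid namespace. An empty list means
    the field is absent, e.g. on a kernel without the NSpid commit.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


def _read_nspid(host_pid, proc_root):
    """Read and parse NSpid for one task; [] if the task is gone."""
    try:
        with open(os.path.join(proc_root, str(host_pid), "status")) as f:
            return parse_nspid(f.read())
    except OSError:
        return []  # task exited between lookup and read, or is not visible


def host_to_virtual(host_pid, proc_root="/proc"):
    """Forward direction: a single file read.

    Returns the pid in the task's innermost namespace, or None.
    """
    pids = _read_nspid(host_pid, proc_root)
    return pids[-1] if pids else None


def virtual_to_host(vpid, in_container, proc_root="/proc"):
    """Reverse direction: a scan over every task in the host procfs.

    in_container is a hypothetical predicate(host_pid) -> bool that decides
    whether a host task belongs to the container of interest.
    """
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-task entries such as "self" or "meminfo"
        host_pid = int(entry)
        pids = _read_nspid(host_pid, proc_root)
        if pids and pids[-1] == vpid and in_container(host_pid):
            return host_pid
    return None
```

The forward lookup costs one open and read; the reverse lookup costs one per
task on the host, and it is still racy because tasks can exit or pids can be
reused while the scan is in progress.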
--
Konstantin