Message-ID: <59F9FD8B.8090607@oracle.com>
Date: Wed, 01 Nov 2017 09:59:55 -0700
From: nagarathnam muthusamy <nagarathnam.muthusamy@...cle.com>
To: prakash sangappa <prakash.sangappa@...cle.com>
CC: Andy Lutomirski <luto@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Konstantin Khlebnikov <khlebnikov@...dex-team.ru>,
Oleg Nesterov <oleg@...hat.com>,
Linux API <linux-api@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Serge Hallyn <serge.hallyn@...ntu.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Eugene Syromiatnikov <esyr@...hat.com>
Subject: Re: [PATCH v4] pidns: introduce syscall translate_pid
I believe all the questions raised in this thread were answered. Just
wondering if there are any outstanding questions?
Thanks,
Nagarathnam.
On 10/17/2017 3:53 PM, prakash sangappa wrote:
>
> On 10/17/2017 3:40 PM, Andy Lutomirski wrote:
>> On Tue, Oct 17, 2017 at 3:35 PM, prakash sangappa
>> <prakash.sangappa@...cle.com> wrote:
>>> On 10/17/2017 3:02 PM, Andy Lutomirski wrote:
>>>> On Tue, Oct 17, 2017 at 8:38 AM, Prakash Sangappa
>>>> <prakash.sangappa@...cle.com> wrote:
>>>>>
>>>>> On 10/16/17 5:52 PM, Andy Lutomirski wrote:
>>>>>> On Mon, Oct 16, 2017 at 3:54 PM, prakash.sangappa
>>>>>> <prakash.sangappa@...cle.com> wrote:
>>>>>>>
>>>>>>> On 10/16/2017 03:07 PM, Nagarathnam Muthusamy wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/16/2017 02:36 PM, Andrew Morton wrote:
>>>>>>>>> On Sat, 14 Oct 2017 11:17:47 +0300 Konstantin Khlebnikov
>>>>>>>>> <khlebnikov@...dex-team.ru> wrote:
>>>>>>>>>
>>>>>>>>>>>>> pid_t translate_pid(pid_t pid, int source, int target);
>>>>>>>>>>>>>
>>>>>>>>>>>>> This syscall converts a pid from the source pid-ns into a pid
>>>>>>>>>>>>> in the target pid-ns. If the pid is unreachable from the target
>>>>>>>>>>>>> pid-ns, it returns zero.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Pid namespaces are referred to by file descriptors opened on
>>>>>>>>>>>>> the proc files /proc/[pid]/ns/pid or
>>>>>>>>>>>>> /proc/[pid]/ns/pid_for_children. A negative argument refers to
>>>>>>>>>>>>> the current pid namespace, same as the file /proc/self/ns/pid.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The kernel exposes virtual pids in /proc/[pid]/status:NSpid,
>>>>>>>>>>>>> but backward translation requires scanning all tasks. Pids can
>>>>>>>>>>>>> also be translated by sending them through a unix socket
>>>>>>>>>>>>> between namespaces, but this method is slow and insecure
>>>>>>>>>>>>> because the other side is exposed inside the pid namespace.
>>>>>>>>>> Andrew asked why we might need this.
>>>>>>>>>>
>>>>>>>>>> Such conversion is required for interaction between processes
>>>>>>>>>> across pid namespaces, for example to identify a process in a
>>>>>>>>>> container by its pid file when looking from outside.
>>>>>>>>>>
>>>>>>>>>> Two years ago I solved this in a project of mine with monstrous
>>>>>>>>>> code that forks a couple of times just to convert a pid. Luckily
>>>>>>>>>> for me, performance wasn't important.
>>>>>>>>> That's a single user who needed this a single time, and found a
>>>>>>>>> userspace-based solution anyway. This is not exactly compelling!
>>>>>>>>>
>>>>>>>>> Is there a stronger case to be made? How does this change benefit
>>>>>>>>> our users? Sell it to us!
>>>>>>>> Oracle Database is planning to use pid namespaces for sandboxing
>>>>>>>> database instances, and it needs an API similar to translate_pid
>>>>>>>> to effectively translate process IDs from other pid namespaces.
>>>>>>>> Prakash (cc'ed on this mail) can provide more details on this use
>>>>>>>> case.
>>>>>>>
>>>>>>> As Nagarathnam indicated, Oracle Database will be using pid
>>>>>>> namespaces and needs a direct method of converting pids of
>>>>>>> processes in the pid namespace hierarchy. In this use case multiple
>>>>>>> nested PID namespaces will be used. The currently available
>>>>>>> mechanisms are not very efficient for this use case. For example,
>>>>>>> as Konstantin described, using /proc/<pid>/status would require the
>>>>>>> application to scan the status files of all pids to determine the
>>>>>>> pid of a given process in a child namespace.
>>>>>>>
>>>>>>> Using an SCM_CREDENTIALS socket message is another way, but it
>>>>>>> would require every process starting inside a pid namespace to send
>>>>>>> this message, and the receiving process in the target namespace
>>>>>>> would have to save the converted pid and reference it. This
>>>>>>> mechanism becomes cumbersome, especially if the application has to
>>>>>>> deal with multiple nested pid namespaces. Also, the Database needs
>>>>>>> to be able to convert a thread's global pid (gettid()). Passing the
>>>>>>> thread's pid (gettid()) in an SCM_CREDENTIALS message requires
>>>>>>> CAP_SYS_ADMIN, which is an issue.
>>>>>>>
>>>>>>> So having a direct method, like the API that Konstantin is
>>>>>>> proposing, will work best for the Database, since the pid of a
>>>>>>> process in any of the nested pid namespaces can be converted as and
>>>>>>> when required. I think with the proposed API, the application
>>>>>>> should be able to convert the pid of a process or the tid
>>>>>>> (gettid()) of a thread as well.
>>>>>>>
>>>>>> Can you explain what Oracle's database is planning to do with this
>>>>>> information?
>>>>>
>>>>> The Database uses the PID to programmatically find out whether the
>>>>> process/thread is alive (kill with signal 0), to send signals to
>>>>> processes requesting that they dump status/debug information, and to
>>>>> kill the processes in case of a shutdown abort of the instance.
>>>> What I'm wondering is: how does the caller of kill() end up
>>>> controlling a task whose pid it doesn't know in its own namespace?
>>>
>>> I was generally describing how the DB would use the PID of a process.
>>> The above description was for the case when no namespaces are used.
>>>
>>> With namespaces, the DB would convert the PID of processes inside its
>>> child namespaces to the PID in its own namespace and use that pid to
>>> issue kill().
>> Seems vaguely sensible.
>>
>> If I were designing this type of system, I'd have a manager process in
>> each namespace running as PID 1, though -- PID 1 is special and needs
>> to understand what's going on anyway. Then PID 1 would do the kill()
>> calls and wouldn't need translate_pid().
>
> Yes, this has been tried out with the prototype use of PID namespaces in
> the DB. It works, but would be slow, as the manager would have to
> exchange messages with the controlling processes, which would be in the
> parent namespace. The DB could use the API to convert the pid instead.
>
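For readers following the thread, the /proc-based backward translation
mentioned above (scanning every task's status file for its NSpid line)
might look roughly like the sketch below. It is only an illustration under
stated assumptions, not code from the patch or from any project named in
this thread: the helper names nspid_innermost() and find_outer_pid() are
made up, the target namespace is identified by stat()ing a
/proc/[pid]/ns/pid path, and only thread-group leaders (the top-level
/proc entries) are scanned, so threads under /proc/[pid]/task are missed.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Return the last (innermost) value on an "NSpid:" status line,
 * or -1 if the line is not an NSpid line or carries no numbers. */
static pid_t nspid_innermost(const char *line)
{
    const char *p;
    char *end;
    long last = -1;

    if (strncmp(line, "NSpid:", 6) != 0)
        return -1;
    for (p = line + 6; ; p = end) {
        long v = strtol(p, &end, 10);   /* skips tabs and newlines */
        if (end == p)                   /* no more numbers */
            break;
        last = v;
    }
    return (pid_t)last;
}

/* Hypothetical helper (name and signature are assumptions):
 * scan the procfs at proc_root for a thread-group leader that lives in
 * the pid namespace referenced by ns_path (e.g. "/proc/4321/ns/pid")
 * and whose pid inside that namespace is inner_pid.
 * Returns the pid as seen through proc_root, 0 if none, -1 on error. */
static pid_t find_outer_pid(const char *proc_root, const char *ns_path,
                            pid_t inner_pid)
{
    struct stat want, st;
    struct dirent *de;
    pid_t found = 0;
    DIR *dir;

    if (stat(ns_path, &want) != 0)
        return -1;
    dir = opendir(proc_root);
    if (!dir)
        return -1;
    while (!found && (de = readdir(dir)) != NULL) {
        char path[512], line[256];
        FILE *f;

        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                   /* not a pid directory */
        snprintf(path, sizeof(path), "%s/%s/ns/pid", proc_root, de->d_name);
        if (stat(path, &st) != 0)
            continue;                   /* task exited, or no permission */
        if (st.st_dev != want.st_dev || st.st_ino != want.st_ino)
            continue;                   /* task is in another namespace */
        snprintf(path, sizeof(path), "%s/%s/status", proc_root, de->d_name);
        f = fopen(path, "r");
        if (!f)
            continue;
        while (fgets(line, sizeof(line), f)) {
            pid_t inner = nspid_innermost(line);
            if (inner < 0)
                continue;               /* not the NSpid line yet */
            if (inner == inner_pid)
                found = (pid_t)atol(de->d_name);
            break;                      /* only one NSpid line per file */
        }
        fclose(f);
    }
    closedir(dir);
    return found;
}
```

A caller in the parent namespace could then try something like
find_outer_pid("/proc", "/proc/4321/ns/pid", 7), where 4321 stands for any
process already known to be in the child namespace, and pass a nonzero
result to kill(). Every lookup walks all of /proc, which is the cost the
thread is objecting to.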