Message-ID: <63f1cf0d-28d5-b648-be22-caf7bdb96fdf@intel.com>
Date:   Thu, 16 Mar 2023 14:41:09 -0700
From:   Reinette Chatre <reinette.chatre@...el.com>
To:     Shawn Wang <shawnwang@...ux.alibaba.com>, <fenghua.yu@...el.com>
CC:     <tglx@...utronix.de>, <mingo@...hat.com>, <bp@...en8.de>,
        <dave.hansen@...ux.intel.com>, <james.morse@....com>,
        <x86@...nel.org>, <hpa@...or.com>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] x86/resctrl: Only show tasks' pids in current pid
 namespace

Hi Shawn,

On 3/15/2023 8:06 AM, Shawn Wang wrote:
> On 2/16/23 5:43 AM, Reinette Chatre wrote:
>> On 1/15/2023 11:12 PM, Shawn Wang wrote:
>>> When writing a task id to the "tasks" file in an rdtgroup,
>>> rdtgroup_tasks_write() treats the pid as a number in the current pid
>>> namespace. But when reading the "tasks" file, rdtgroup_tasks_show() shows
>>> the list of global pids from the init namespace. If current pid namespace
>>> is not the init namespace, pids in "tasks" will be confusing and incorrect.
>>>
>>> To be more robust, let the "tasks" file only show pids in the current pid
>>> namespace.
>>>
>>
>> Is it possible to elaborate more on the use case that this is aiming to
>> address? It is unexpected to me that resource management is approached from
>> within a container. My expectation is that the resource management and monitoring
>> is done from the host.
> 
> We have a scenario where we only want to mount the resctrl filesystem under a specific container.

This scenario is interesting to me. My assumption has always been that resource
management is done from the host and not from a container, especially since a
container can only add its own tasks to resource groups.

> And we found that the pids in the tasks file under resctrl are inconsistent with the pids obtained by top.

Indeed.

> 
> Besides, current rdtgroup_move_task() uses the find_task_by_vpid() to get the real pid.
> Our modification is also to maintain symmetry with the rdtgroup_move_task().

I understand, thank you for looking into this.

> 
>>> ---
>>>   arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++++--
>>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> index 5993da21d822..9e97ae24c159 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> @@ -718,11 +718,15 @@ static ssize_t rdtgroup_tasks_write(struct kernfs_open_file *of,
>>>   static void show_rdt_tasks(struct rdtgroup *r, struct seq_file *s)
>>>   {
>>>       struct task_struct *p, *t;
>>> +    pid_t pid;
>>>
>>>       rcu_read_lock();
>>>       for_each_process_thread(p, t) {
>>> -        if (is_closid_match(t, r) || is_rmid_match(t, r))
>>> -            seq_printf(s, "%d\n", t->pid);
>>> +        if (is_closid_match(t, r) || is_rmid_match(t, r)) {
>>> +            pid = task_pid_vnr(t);
>>> +            if (pid)
>>> +                seq_printf(s, "%d\n", pid);
>>> +        }
>>>       }
>>>       rcu_read_unlock();
>>>   }
>>
>> This looks like it would solve the stated problem. Does it slow down
>> reading a tasks file in a measurable way?
> 
> We didn't test it, but it is proportional to the number of pids in the group.
> In addition, only an if statement is added here, and actually the reading of
> the tasks interface will not be called frequently, so it will not be a bottleneck.

It adds more than an if statement: for the default root group, task_pid_vnr() will
be called for every task on the host. I am not familiar with namespaces, so my concern
was the additional task_pid_vnr() call. This does seem to be the custom, though, and it
does what's needed to return the correct data.
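
For readers less familiar with pid namespaces, the effect of the added
task_pid_vnr() call and the "if (pid)" filter can be illustrated with a toy
userspace model. This is only a sketch, not kernel code: every toy_* name below
is hypothetical, and the model assumes the kernel behaviour that a task which is
not visible from the reader's pid namespace gets pid number 0.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_LEVEL 4

/* Hypothetical, simplified stand-ins for the kernel's namespace and pid structs. */
struct toy_ns { int level; };                     /* level 0 models the init namespace */
struct toy_upid { int nr; const struct toy_ns *ns; };
struct toy_pid {
    int level;                                    /* deepest namespace the task lives in */
    struct toy_upid numbers[TOY_MAX_LEVEL];       /* one pid number per level 0..level */
};

/* Pid number of a task as seen from ns, or 0 if the task is not visible there. */
static int toy_pid_nr_ns(const struct toy_pid *pid, const struct toy_ns *ns)
{
    assert(ns != NULL && ns->level < TOY_MAX_LEVEL);
    if (pid && ns->level <= pid->level) {
        const struct toy_upid *upid = &pid->numbers[ns->level];
        if (upid->ns == ns)
            return upid->nr;
    }
    return 0;
}

/* Models the patched loop: translate every task, keep only non-zero pids. */
static size_t toy_show_tasks(const struct toy_pid *tasks, size_t n,
                             const struct toy_ns *reader, int *out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        int pid = toy_pid_nr_ns(&tasks[i], reader);
        if (pid)
            out[count++] = pid;
    }
    return count;
}
```

In this model a reader in the init namespace gets a number for every task, while
a reader in a child namespace gets only the tasks that live in that namespace,
numbered as that namespace sees them. The translation still runs once per
matching task, which is the cost discussed above.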

I did test this and can confirm that when bind mounting /sys/fs/resctrl into the container,
the container's view of /sys/fs/resctrl/tasks only shows its own tasks with the pids as seen
by it. Without this patch both the container and the host show the same data, which is the
pids from the host namespace.
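
A check along these lines could be scripted with a small userspace helper. The
sketch below is not the test that was actually run; pid_listed_in() is a
hypothetical name, and it assumes only that the tasks file holds one decimal pid
per line, as the seq_printf() in the patch produces.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if pid is listed in the file at path
 * (one decimal pid per line), 0 if it is not, and -1 if the file cannot
 * be opened.
 */
static int pid_listed_in(const char *path, long pid)
{
    FILE *f;
    long seen;
    int found = 0;

    assert(path != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;

    /* Stop at the first match, at end of file, or at unparsable input. */
    while (fscanf(f, "%ld", &seen) == 1) {
        if (seen == pid) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Called from inside the container as
pid_listed_in("/sys/fs/resctrl/tasks", (long)getpid()), with getpid() from
<unistd.h>, this would be expected to return 1 with the patch applied. Without
the patch the file holds host pids, so a match would only happen by coincidence.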

Tested-by: Reinette Chatre <reinette.chatre@...el.com>
Acked-by: Reinette Chatre <reinette.chatre@...el.com>

When you no longer expect any more feedback I'd recommend that you resubmit this
patch with the new tags to make it easier for the next level maintainers to notice
it and pick it up. To ensure accurate references to discussions you can add a
"Link:" to this email.

Thank you very much

Reinette
