linux-kernel - Re: [PATCH 5/7] devcg: device cgroup's extension for RDMA resource.

Message-ID: <55EEE793.9020105@mellanox.com>
Date:	Tue, 8 Sep 2015 16:50:11 +0300
From:	Haggai Eran <haggaie@...lanox.com>
To:	Parav Pandit <pandit.parav@...il.com>
CC:	<cgroups@...r.kernel.org>, <linux-doc@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <linux-rdma@...r.kernel.org>,
	<tj@...nel.org>, <lizefan@...wei.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Doug Ledford <dledford@...hat.com>,
	Jonathan Corbet <corbet@....net>, <james.l.morris@...cle.com>,
	<serge@...lyn.com>, Or Gerlitz <ogerlitz@...lanox.com>,
	Matan Barak <matanb@...lanox.com>, <raindel@...lanox.com>,
	<akpm@...ux-foundation.org>,
	<linux-security-module@...r.kernel.org>
Subject: Re: [PATCH 5/7] devcg: device cgroup's extension for RDMA resource.

On 08/09/2015 13:18, Parav Pandit wrote:
>>> + * RDMA resource limits are hierarchical, so the highest configured limit of
>>> + * the hierarchy is enforced. Allowing resource limit configuration to default
>>> + * cgroup allows fair share to kernel space ULPs as well.
>> In what way is the highest configured limit of the hierarchy enforced? I
>> would expect all the limits along the hierarchy to be enforced.
>>
> In a hierarchy of, say, 3 cgroups, the smallest limit in the hierarchy
> is applied.
> 
> Let's take an example to clarify.
> Say cg_A, cg_B, cg_C
> Role          Name                    Limit
> Parent        cg_A                    100
> Child_level1  cg_B (child of cg_A)    20
> Child_level2  cg_C (child of cg_B)    50
> 
> If the process allocating the rdma resource belongs to cg_C, the lowest
> limit in the hierarchy is applied during the charge() stage.
> If the cg_A limit happens to be 10, then since 10 is the lowest, its limit
> would be applicable, as you expected.

Looking at the code, usage is charged at every level of the hierarchy,
which is what I would expect. I just think the comment is a bit misleading.
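
To illustrate why charging every level ends up enforcing the lowest limit
along the path, here is a minimal userspace sketch. It is not the patch's
code: struct cg_node, try_charge() and max_chargeable() are hypothetical
names, and locking is left out. It assumes a charge succeeds only if every
ancestor still has room.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of one cgroup level; not the patch's struct. */
struct cg_node {
	const char *name;
	int limit;               /* configured limit at this level */
	int usage;               /* resources currently charged here */
	struct cg_node *parent;  /* NULL for the root */
};

/*
 * Try to charge num resources at every level from cg up to the root.
 * Returns 0 on success. Returns -1 if any level would exceed its limit,
 * in which case no level is charged.
 */
static int try_charge(struct cg_node *cg, int num)
{
	struct cg_node *p;

	assert(num > 0);

	/* First pass: check that every level has room. */
	for (p = cg; p; p = p->parent)
		if (p->usage + num > p->limit)
			return -1;

	/* Second pass: commit the charge at every level. */
	for (p = cg; p; p = p->parent)
		p->usage += num;

	return 0;
}

/* Count how many single-unit charges succeed starting from cg. */
static int max_chargeable(struct cg_node *cg)
{
	int n = 0;

	while (try_charge(cg, 1) == 0)
		n++;
	return n;
}
```

With the limits from the example above (cg_A = 100, cg_B = 20, cg_C = 50),
a task in cg_C is stopped by cg_B's limit of 20 even though cg_C's own
limit is 50. In this model, every limit on the path is enforced and the
smallest one is the one that bites first.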

>>> +int devcgroup_rdma_get_max_resource(struct seq_file *sf, void *v)
>>> +{
>>> +     struct dev_cgroup *dev_cg = css_to_devcgroup(seq_css(sf));
>>> +     int type = seq_cft(sf)->private;
>>> +     u32 usage;
>>> +
>>> +     if (dev_cg->rdma.tracker[type].limit == DEVCG_RDMA_MAX_RESOURCES) {
>>> +             seq_printf(sf, "%s\n", DEVCG_RDMA_MAX_RESOURCE_STR);
>> I'm not sure hiding the actual number is good, especially in the
>> show_usage case.
> 
> This follows what other controllers do, such as the newly added PID
> subsystem, when showing the max limit.

Okay.

>>> +void devcgroup_rdma_uncharge_resource(struct ib_ucontext *ucontext,
>>> +                                   enum devcgroup_rdma_rt type, int num)
>>> +{
>>> +     struct dev_cgroup *dev_cg, *p;
>>> +     struct task_struct *ctx_task;
>>> +
>>> +     if (!num)
>>> +             return;
>>> +
>>> +     /* get cgroup of ib_ucontext it belong to, to uncharge
>>> +      * so that when its called from any worker tasks or any
>>> +      * other tasks to which this resource doesn't belong to,
>>> +      * it can be uncharged correctly.
>>> +      */
>>> +     if (ucontext)
>>> +             ctx_task = get_pid_task(ucontext->tgid, PIDTYPE_PID);
>>> +     else
>>> +             ctx_task = current;
>>> +     dev_cg = task_devcgroup(ctx_task);
>>> +
>>> +     spin_lock(&ctx_task->rdma_res_counter->lock);
>> Don't you need an rcu read lock and rcu_dereference to access
>> rdma_res_counter?
> 
> I believe it's not required, because uncharge() can happen only
> from 3 contexts:
> (a) from the caller task context that made the allocation call, so no
> synchronization is needed.
> (b) from the dealloc resource context; again this is the same
> task context that allocated, so it is single threaded and there is no
> need to synchronize.

I don't think it is true. You can access uverbs from multiple threads.
What may help your case here I think is the fact that only when the last
ucontext is released you can change the rdma_res_counter field, and
ucontext release takes the ib_uverbs_file->mutex.

Still, I think it would be best to use rcu_dereference(), if only for
documentation and sparse.
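
For reference, the access pattern suggested here looks roughly like the
sketch below. It is a userspace model only: the rcu_* macros are
hypothetical stand-ins built on C11 atomics so the example compiles outside
the kernel (the real ones come from <linux/rcupdate.h>), struct task and
struct res_counter are made up, and the spinlock from the patch is omitted.
The quoted patch does not show whether rdma_res_counter is annotated
__rcu, so that part is an assumption.

```c
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-ins for the kernel RCU primitives.
 * They only model the ordering of the pointer load and store; they do
 * not provide grace periods or deferred freeing.
 */
#define rcu_read_lock()		((void)0)
#define rcu_read_unlock()	((void)0)
#define rcu_dereference(p) \
	atomic_load_explicit(&(p), memory_order_acquire)
#define rcu_assign_pointer(p, v) \
	atomic_store_explicit(&(p), (v), memory_order_release)

/* Hypothetical counter; the patch's version also carries a spinlock. */
struct res_counter {
	int usage;
};

/* Hypothetical task; the atomic pointer models an __rcu-annotated field. */
struct task {
	struct res_counter *_Atomic counter;
};

/*
 * Uncharge num resources from the task's counter.
 * Returns 0 on success, -1 if the task has no counter attached.
 */
static int uncharge(struct task *t, int num)
{
	struct res_counter *c;
	int ret = -1;

	rcu_read_lock();
	c = rcu_dereference(t->counter);	/* marked read of the pointer */
	if (c) {
		c->usage -= num;
		ret = 0;
	}
	rcu_read_unlock();

	return ret;
}
```

In kernel code, the benefit is that the dereference is explicitly marked,
so sparse can check it against the __rcu annotation and the reader sees
which lifetime rule protects the pointer.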

> (c) from the fput() context when the process is terminated abruptly or as
> part of deferred cleanup; when this is happening there cannot be an
> allocator task anyway.
