linux-kernel - Re: [RFC][PATCH 2/4] cgroup ID

Message-ID: <6599ad830812050311m3728ab69v465ed5d032792973@mail.gmail.com>
Date:	Fri, 5 Dec 2008 03:11:23 -0800
From:	Paul Menage <menage@...gle.com>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc:	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"nishimura@....nes.nec.co.jp" <nishimura@....nes.nec.co.jp>,
	"balbir@...ux.vnet.ibm.com" <balbir@...ux.vnet.ibm.com>,
	"lizf@...fujitsu.com" <lizf@...fujitsu.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFC][PATCH 2/4] cgroup ID

Hi Kamezawa,

I definitely agree with the idea of being able to traverse the cgroup
hierarchy without taking cgroup_lock(), and I've included some
comments below. Having said that, maybe there's a simpler solution?

A while ago I posted some patches that added a per-hierarchy lock
which could be taken to prevent creation or destruction of cgroups in
a given hierarchy; it was lighter-weight than the full cgroup_lock().
Is that sufficient to avoid the deadlock that you mentioned in your
patch description?

The idea of having a short id for each cgroup to save space in the
swap cgroup sounds sensible - but I'm not sure that we need the RCU
support to make the id persist beyond the lifetime of the cgroup
itself.

On Fri, Dec 5, 2008 at 12:29 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@...fujitsu.com> wrote:
>
> +/*
> + * Cgroup ID for *internal* identification and lookup. For user-land,"path"
> + * of cgroup works well.
> + */

This comment seems misplaced and possibly unnecessary. Should it be
with the struct cgroup_id definition in cgroup.c?

>
> +/*
> + * For supporting cgroup lookup and hierarchy management.
> + */

A lot more commenting would be useful here.

> +/* An interface for usual lookup */
> +struct cgroup *cgroup_lookup(int id);
> +/* get next cgroup under tree (for scan) */
> +struct cgroup *
> +cgroup_get_next(int id, int rootid, int depth, int *foundid);
> +/* get id and depth of cgroup */
> +int cgroup_id(struct cgroup *cgroup);
> +int cgroup_depth(struct cgroup *cgroup);
> +/* For delayed freeing of IDs */
> +int cgroup_id_tryget(int id);
> +void cgroup_id_put(int id);
> +
>  #else /* !CONFIG_CGROUPS */
>
>  /*
> + * CGROUP ID
> + */

More comments needed about the exact semantics of these fields.

> +struct cgroup_id {
> +       struct cgroup *myself;

Can you call this "cgroup" for consistency with other struct cgroup pointers?

> +       unsigned int  id;
> +       unsigned int  depth;
> +       atomic_t      refcnt;
> +       struct rcu_head rcu_head;
> +       unsigned int  hierarchy_code[MAX_CGROUP_DEPTH];

How about "stack" for this array?

> +};
> +
> +void free_cgroupid_cb(struct rcu_head *head)
> +{
> +       struct cgroup_id *id;
> +
> +       id = container_of(head, struct cgroup_id, rcu_head);
> +       kfree(id);
> +}
> +
> +void free_cgroupid(struct cgroup_id *id)
> +{
> +       call_rcu(&id->rcu_head, free_cgroupid_cb);
> +}
> +

Rather than having a separate RCU callback for the cgroup_id
structure, how about marking it as "dead" when you unlink the cgroup
from the tree, and freeing it in the cgroup_diput() callback at the
same time the struct cgroup is freed? Or is the issue that you need
the id to persist longer than the cgroup itself, to prevent re-use?
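
To make the "dead" marking suggestion concrete, here is a minimal
userspace sketch. It is an illustration only: the *_sketch function
names and the "dead" field are hypothetical, calloc()/free() stand in
for the kernel allocators, and concurrency and RCU are deliberately
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct cgroup;

/* Simplified stand-in for the patch's struct cgroup_id. */
struct cgroup_id {
	unsigned int id;
	bool dead;		/* hypothetical: set when the cgroup is unlinked */
	struct cgroup *cgroup;	/* back-pointer ("myself" in the patch) */
};

struct cgroup {
	struct cgroup_id *id;
};

/* Allocate a cgroup together with its id structure. */
static struct cgroup *cgroup_create_sketch(unsigned int id)
{
	struct cgroup *cgrp = calloc(1, sizeof(*cgrp));
	struct cgroup_id *cgid = calloc(1, sizeof(*cgid));

	if (!cgrp || !cgid) {
		free(cgrp);
		free(cgid);
		return NULL;
	}
	cgid->id = id;
	cgid->cgroup = cgrp;
	cgrp->id = cgid;
	return cgrp;
}

/* Unlinking from the tree only marks the id as dead. */
static void cgroup_unlink_sketch(struct cgroup *cgrp)
{
	cgrp->id->dead = true;
}

/* Lookups refuse to hand out a cgroup whose id is dead. */
static struct cgroup *cgroup_lookup_sketch(const struct cgroup_id *cgid)
{
	return cgid->dead ? NULL : cgid->cgroup;
}

/* The id is freed together with the cgroup, with no separate callback. */
static void cgroup_release_sketch(struct cgroup *cgrp)
{
	assert(cgrp->id->dead);	/* must have been unlinked first */
	free(cgrp->id);
	free(cgrp);
}
```

With this shape the id never outlives the cgroup, so it only works if
nothing needs the id to persist after the cgroup is gone.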

> +static DEFINE_IDR(cgroup_idr);
> +DEFINE_SPINLOCK(cgroup_idr_lock);

Any reason to not have a separate idr and idr_lock per hierarchy?

> +
> +static int cgrouproot_setup_idr(struct cgroupfs_root *root)
> +{
> +       struct cgroup_id *newid;
> +       int err = -ENOMEM;
> +       int myid;
> +
> +       newid = kzalloc(sizeof(*newid), GFP_KERNEL);
> +       if (!newid)
> +               goto out;
> +       if (!idr_pre_get(&cgroup_idr, GFP_KERNEL))
> +               goto free_out;
> +
> +       spin_lock_irq(&cgroup_idr_lock);
> +       err = idr_get_new_above(&cgroup_idr, newid, 1, &myid);
> +       spin_unlock_irq(&cgroup_idr_lock);
> +
> +       /* This one is new idr....*/
> +       BUG_ON(err);

There's really no way this can fail?

> +/*
> + * should be called while "cgrp" is valid.
> + */

Can you be more specific here? Clearly calling a function with a
pointer to an object that might have been freed is a bad idea; if
that's all you mean then I don't think it needs to be called out in a
comment.

> +static int cgroup_prepare_id(struct cgroup *parent, struct cgroup_id **id)
> +{
> +       struct cgroup_id *newid;
> +       int myid, error;
> +
> +       /* check depth */
> +       if (parent->id->depth + 1 >= MAX_CGROUP_DEPTH)
> +               return -ENOSPC;
> +       newid = kzalloc(sizeof(*newid), GFP_KERNEL);
> +       if (!newid)
> +               return -ENOMEM;
> +       /* get id */
> +       if (unlikely(!idr_pre_get(&cgroup_idr, GFP_KERNEL))) {
> +               error = -ENOMEM;
> +               goto err_out;
> +       }
> +       spin_lock_irq(&cgroup_idr_lock);
> +       /* Don't use 0 */
> +       error = idr_get_new_above(&cgroup_idr, newid, 1, &myid);
> +       spin_unlock_irq(&cgroup_idr_lock);
> +       if (error)
> +               goto err_out;

This code is pretty similar to a big chunk of cgrouproot_setup_idr() -
can they share the common code?
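
One possible shape for the shared code, as a userspace sketch rather
than kernel code: a fixed-size table stands in for cgroup_idr, calloc()
for kzalloc(), and locking is omitted. The *_sketch names, MAX_IDS and
the value of MAX_CGROUP_DEPTH are assumptions made for illustration.
The root path also returns the error instead of hitting BUG_ON().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_CGROUP_DEPTH 10	/* assumed; the real value is not in the excerpt */
#define MAX_IDS 64		/* arbitrary size of the toy ID table */

struct cgroup_id {
	unsigned int id;
	unsigned int depth;
};

/* Toy replacement for cgroup_idr: slot i holds the structure owning ID i. */
static struct cgroup_id *id_table[MAX_IDS];

/* Common part: allocate a zeroed structure, reserve the lowest free ID >= 1. */
static int alloc_cgroup_id_sketch(struct cgroup_id **out)
{
	struct cgroup_id *newid;
	unsigned int i;

	assert(out != NULL);
	newid = calloc(1, sizeof(*newid));
	if (!newid)
		return -ENOMEM;
	for (i = 1; i < MAX_IDS; i++) {	/* ID 0 is never handed out */
		if (!id_table[i]) {
			id_table[i] = newid;
			newid->id = i;
			*out = newid;
			return 0;
		}
	}
	free(newid);
	return -ENOSPC;
}

/* Root setup: propagate failure to the caller instead of BUG_ON(). */
static int setup_root_id_sketch(struct cgroup_id **out)
{
	return alloc_cgroup_id_sketch(out);
}

/* Child setup: only the depth check is specific to this path. */
static int prepare_child_id_sketch(const struct cgroup_id *parent,
				   struct cgroup_id **out)
{
	if (parent->depth + 1 >= MAX_CGROUP_DEPTH)
		return -ENOSPC;
	return alloc_cgroup_id_sketch(out);
}
```

Both callers then differ only in the checks they do before calling the
helper.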

> +static void cgroup_id_attach(struct cgroup_id *cgid,
> +                            struct cgroup *cg, struct cgroup *parent)
> +{
> +       struct cgroup_id *parent_id = rcu_dereference(parent->id);

It doesn't seem as though it should be necessary to rcu_dereference()
parent->id - parent can't be going away in this case.

> +       int i;
> +
> +       cgid->depth = parent_id->depth + 1;
> +       /* Inherit hierarchy code from parent */
> +       for (i = 0; i < cgid->depth; i++) {
> +               cgid->hierarchy_code[i] =
> +                       parent_id->hierarchy_code[i];
> +               cgid->hierarchy_code[cgid->depth] = cgid->id;

I think this line is supposed to be outside the for() loop.
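
Roughly what I would expect the loop to look like, written as a
standalone userspace sketch: the struct is cut down to the fields used
here, the function name and simplified signature are hypothetical, and
the value of MAX_CGROUP_DEPTH is assumed since it is not in the quoted
hunk.

```c
#include <assert.h>

#define MAX_CGROUP_DEPTH 10	/* assumed; the real value is not in the excerpt */

/* Cut-down stand-in for the patch's struct cgroup_id. */
struct cgroup_id {
	unsigned int id;
	unsigned int depth;
	unsigned int hierarchy_code[MAX_CGROUP_DEPTH];
};

/* Copy the parent's chain of IDs, then append our own ID once at the end. */
static void cgid_attach_sketch(struct cgroup_id *cgid,
			       const struct cgroup_id *parent_id)
{
	unsigned int i;

	assert(parent_id->depth + 1 < MAX_CGROUP_DEPTH);
	cgid->depth = parent_id->depth + 1;
	/* Inherit hierarchy code from parent */
	for (i = 0; i < cgid->depth; i++)
		cgid->hierarchy_code[i] = parent_id->hierarchy_code[i];
	/* Outside the loop: our own ID goes into the slot for our depth */
	cgid->hierarchy_code[cgid->depth] = cgid->id;
}
```

With the assignment inside the loop the result is the same whenever the
loop runs at least once, just repeated needlessly on every iteration.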

Paul