Message-ID: <530bba0f-9e13-5308-fc93-d0dab0c56fcc@fb.com>
Date: Thu, 23 May 2019 05:33:45 +0000
From: Yonghong Song <yhs@...com>
To: Roman Gushchin <guro@...com>, Alexei Starovoitov <ast@...nel.org>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>
CC: Daniel Borkmann <daniel@...earbox.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Tejun Heo <tj@...nel.org>, Kernel Team <Kernel-team@...com>,
"cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
Stanislav Fomichev <sdf@...ichev.me>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"jolsa@...hat.com" <jolsa@...hat.com>
Subject: Re: [PATCH v2 bpf-next 1/4] bpf: decouple the lifetime of cgroup_bpf
from cgroup itself
On 5/22/19 4:20 PM, Roman Gushchin wrote:
> Currently the lifetime of bpf programs attached to a cgroup is bound
> to the lifetime of the cgroup itself. This means that if a user
> forgets to detach a bpf program (or intentionally avoids doing so)
> before removing the cgroup, it stays attached until the cgroup is
> released. Since the cgroup can stay in the dying state (the state
> between being rmdir()'ed and being released) for a very long time,
> this wastes memory. It also blocks the possibility of implementing
> memcg-based memory accounting for bpf objects, because a circular
> reference dependency would occur: charged memory pages pin the
> corresponding memory cgroup, and if the memory cgroup pins the
> attached bpf program, nothing will ever be released.
>
> A dying cgroup cannot contain any processes, so the only chance for
> an attached bpf program to be executed is a live socket associated
> with the cgroup. So, in order to release all bpf data early, let's
> count associated sockets using a new percpu refcounter. On cgroup
> removal the counter is switched to atomic mode, and as soon as it
> reaches 0, all bpf programs are detached.
>
> The reference counter is not socket-specific and can be used for any
> other type of program that can be executed from a cgroup-bpf hook
> outside of process context, should such a need arise in the future.
>
> Signed-off-by: Roman Gushchin <guro@...com>
> Cc: jolsa@...hat.com
The logic looks sound to me. With one nit below,
Acked-by: Yonghong Song <yhs@...com>
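
For anyone following the refcounting scheme, the percpu_ref pattern the
patch builds on works roughly like the sketch below. This is
illustrative only; my_obj, my_obj_release etc. are made-up names and
not part of this patch.

#include <linux/kernel.h>
#include <linux/percpu-refcount.h>

struct my_obj {
	struct percpu_ref refcnt;
	/* ... attached programs, effective arrays, etc. ... */
};

/* Called once the ref has been killed and the last reference dropped. */
static void my_obj_release(struct percpu_ref *ref)
{
	struct my_obj *obj = container_of(ref, struct my_obj, refcnt);

	/* detach/free everything held by obj here */
	percpu_ref_exit(&obj->refcnt);
}

static int my_obj_init(struct my_obj *obj)
{
	/* starts in fast percpu mode, holding one initial reference */
	return percpu_ref_init(&obj->refcnt, my_obj_release, 0, GFP_KERNEL);
}

/* e.g. taken/dropped for every socket associated with the object */
static void my_obj_use(struct my_obj *obj)
{
	percpu_ref_get(&obj->refcnt);
	/* ... run bpf programs, etc. ... */
	percpu_ref_put(&obj->refcnt);
}

/*
 * On rmdir: switch to atomic mode and drop the initial reference,
 * so my_obj_release() runs once all remaining users go away.
 */
static void my_obj_offline(struct my_obj *obj)
{
	percpu_ref_kill(&obj->refcnt);
}
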
> ---
> include/linux/bpf-cgroup.h | 8 ++++++--
> include/linux/cgroup.h | 18 ++++++++++++++++++
> kernel/bpf/cgroup.c | 25 ++++++++++++++++++++++---
> kernel/cgroup/cgroup.c | 11 ++++++++---
> 4 files changed, 54 insertions(+), 8 deletions(-)
>
[...]
> @@ -167,7 +178,12 @@ int cgroup_bpf_inherit(struct cgroup *cgrp)
> */
> #define NR ARRAY_SIZE(cgrp->bpf.effective)
> struct bpf_prog_array __rcu *arrays[NR] = {};
> - int i;
> + int ret, i;
> +
> + ret = percpu_ref_init(&cgrp->bpf.refcnt, cgroup_bpf_release, 0,
> + GFP_KERNEL);
> + if (ret)
> + return -ENOMEM;
Maybe return "ret" here instead of -ENOMEM. Currently, the only error
code percpu_ref_init() returns is -ENOMEM, but that could change in
the future.
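
That is, just spelling out the nit (propagating whatever error
percpu_ref_init() returns):

	ret = percpu_ref_init(&cgrp->bpf.refcnt, cgroup_bpf_release, 0,
			      GFP_KERNEL);
	if (ret)
		return ret;
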
>
> for (i = 0; i < NR; i++)
> INIT_LIST_HEAD(&cgrp->bpf.progs[i]);
> @@ -183,6 +199,9 @@ int cgroup_bpf_inherit(struct cgroup *cgrp)
> cleanup:
> for (i = 0; i < NR; i++)
> bpf_prog_array_free(arrays[i]);
> +
> + percpu_ref_exit(&cgrp->bpf.refcnt);
> +
> return -ENOMEM;
> }
>
[...]