lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <B3C675EC-C458-4A15-B91D-5B17E3D6319C@fb.com>
Date:   Wed, 14 Mar 2018 21:30:53 +0000
From:   Song Liu <songliubraving@...com>
To:     LKML <linux-kernel@...r.kernel.org>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "jolsa@...hat.com" <jolsa@...hat.com>
CC:     Kernel Team <Kernel-team@...com>, Ephraim Park <ephiepark@...com>
Subject: Re: [PATCH v2] perf: update perf_cgroup time for ancestor cgroup(s)

Dear Peter, 

Could you please share your comments/suggestions on this patch? We would
like to fix this issue in our kernel, as we are using perf events with 
nested cgroups. 

Thanks,
Song

> On Mar 12, 2018, at 9:59 AM, Song Liu <songliubraving@...com> wrote:
> 
> When a perf_event is attached to parent cgroup, it should count events
> for all children cgroups:
> 
> parent_group   <---- perf_event
>   \
>    - child_group  <---- process(es)
> 
> However, in our tests, we found this perf_event cannot report reliable
> results. Here is an example case:
> 
>  # create cgroups
>  mkdir -p /sys/fs/cgroup/p/c
>  # start perf for parent group
>  perf stat -e instructions -G "p"
> 
>  # on another console, run test process in child cgroup:
>  stressapptest -s 2 -M 1000 & echo $! > /sys/fs/cgroup/p/c/cgroup.procs
> 
>  # after the test process is done, stop perf in the first console shows
> 
>       <not counted>      instructions              p
> 
> The instruction should not be "not counted" as the process runs in the
> child cgroup.
> 
> We found this is because perf_event->cgrp and cpuctx->cgrp are not
> identical, thus perf_event->cgrp are not updated properly.
> 
> This patch fixes this by updating perf_cgroup properly for ancestor
> cgroup(s).
> 
> Signed-off-by: Song Liu <songliubraving@...com>
> Reported-by: Ephraim Park <ephiepark@...com>
> ---
> kernel/events/core.c | 20 +++++++++++++++-----
> 1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 5789810..6f015ff 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -724,9 +724,14 @@ static inline void __update_cgrp_time(struct perf_cgroup *cgrp)
> 
> static inline void update_cgrp_time_from_cpuctx(struct perf_cpu_context *cpuctx)
> {
> -	struct perf_cgroup *cgrp_out = cpuctx->cgrp;
> -	if (cgrp_out)
> -		__update_cgrp_time(cgrp_out);
> +	struct perf_cgroup *cgrp = cpuctx->cgrp;
> +	struct cgroup_subsys_state *css;
> +
> +	if (cgrp)
> +		for (css = &cgrp->css; css; css = css->parent) {
> +			cgrp = container_of(css, struct perf_cgroup, css);
> +			__update_cgrp_time(cgrp);
> +		}
> }
> 
> static inline void update_cgrp_time_from_event(struct perf_event *event)
> @@ -754,6 +759,7 @@ perf_cgroup_set_timestamp(struct task_struct *task,
> {
> 	struct perf_cgroup *cgrp;
> 	struct perf_cgroup_info *info;
> +	struct cgroup_subsys_state *css;
> 
> 	/*
> 	 * ctx->lock held by caller
> @@ -764,8 +770,12 @@ perf_cgroup_set_timestamp(struct task_struct *task,
> 		return;
> 
> 	cgrp = perf_cgroup_from_task(task, ctx);
> -	info = this_cpu_ptr(cgrp->info);
> -	info->timestamp = ctx->timestamp;
> +
> +	for (css = &cgrp->css; css; css = css->parent) {
> +		cgrp = container_of(css, struct perf_cgroup, css);
> +		info = this_cpu_ptr(cgrp->info);
> +		info->timestamp = ctx->timestamp;
> +	}
> }
> 
> static DEFINE_PER_CPU(struct list_head, cgrp_cpuctx_list);
> -- 
> 2.9.5
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ