Message-ID: <20180306115010.GC25201@hirez.programming.kicks-ass.net>
Date: Tue, 6 Mar 2018 12:50:10 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: linxiulei@...il.com
Cc: jolsa@...hat.com, mingo@...hat.com, acme@...nel.org,
alexander.shishkin@...ux.intel.com, linux-kernel@...r.kernel.org,
tglx@...utronix.de, eranian@...il.com,
torvalds@...ux-foundation.org, linux-perf-users@...r.kernel.org,
brendan.d.gregg@...il.com, yang_oliver@...mail.com,
"leilei.lin" <leilei.lin@...baba-inc.com>
Subject: Re: [PATCH RESEND v5] perf/core: Fix installing arbitrary cgroup
event into cpu
On Tue, Mar 06, 2018 at 05:36:37PM +0800, linxiulei@...il.com wrote:
> From: "leilei.lin" <leilei.lin@...baba-inc.com>
>
> Do not install cgroup event into the CPU context and schedule it
> if the cgroup is not running on this CPU
OK, so far so good, this explains the bit in
__perf_install_in_context().
> While there is no task of cgroup running specified CPU, current
> kernel still install cgroup event into CPU context that causes
> another cgroup event can't be installed into this CPU.
>
> This patch prevent scheduling events at __perf_install_in_context()
> and installing events at list_update_cgroup_event() if cgroup isn't
> running on specified CPU.
This bit doesn't make sense; you don't in fact avoid anything in
list_update_cgroup_event(), you do more, not less.
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 4df5b69..f3ffa70 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -933,31 +933,45 @@ list_update_cgroup_event(struct perf_event *event,
> {
> struct perf_cpu_context *cpuctx;
> struct list_head *cpuctx_entry;
> + struct perf_cgroup *cgrp;
>
> if (!is_cgroup_event(event))
> return;
>
> /*
> * Because cgroup events are always per-cpu events,
> * this will always be called from the right CPU.
> */
> cpuctx = __get_cpu_context(ctx);
> + cgrp = perf_cgroup_from_task(current, ctx);
> +
> + /*
> + * if only the cgroup is running on this cpu
> + * and cpuctx->cgrp == NULL (otherwise it would've
> + * been set with running cgroup), we put this cgroup
> + * into cpu context. Or it would case mismatch in
> + * following cgroup events at event_filter_match()
> + */
This is utterly incomprehensible, what?
> + if (add && !cpuctx->cgrp &&
> + cgroup_is_descendant(cgrp->css.cgroup,
> + event->cgrp->css.cgroup)) {
> + cpuctx->cgrp = cgrp;
> + }
And that's just horrible coding style. Maybe something like:
	if (add && cgroup_is_descendant(cgrp->css.cgroup, event->cgrp->css.cgroup)) {
		if (cpuctx->cgrp)
			WARN_ON_ONCE(cpuctx->cgrp != cgrp);
		cpuctx->cgrp = cgrp;
	}
that? But that still needs a comment to explain _why_ we do that here.
Under what condition would we fail to have cpuctx->cgrp set while
ctx->nr_cgroups is nonzero? Neither your comment nor your Changelog
explains.
> +
> + if (add && ctx->nr_cgroups++)
> + return;
> + else if (!add && --ctx->nr_cgroups)
> + return;
>
> + /* no cgroup running */
> + if (!add)
> + cpuctx->cgrp = NULL;
> +
> + cpuctx_entry = &cpuctx->cgrp_cpuctx_entry;
> + if (add)
> list_add(cpuctx_entry, this_cpu_ptr(&cgrp_cpuctx_list));
> + else
> list_del(cpuctx_entry);
> }
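FWIW, the first-add/last-remove pattern above is easier to see in
isolation. Here is a minimal userspace sketch (illustrative names, not
the real kernel structures) of how ctx->nr_cgroups gates the list
add/del: only the first add and the last remove fall through to touch
the list.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the refcount-style gating in list_update_cgroup_event():
 * the context joins the (here: boolean) cgrp_cpuctx_list on its first
 * cgroup event and leaves it on its last.  Field names are illustrative.
 */
struct toy_ctx {
	int nr_cgroups;		/* models ctx->nr_cgroups */
	bool on_list;		/* models cgrp_cpuctx_list membership */
};

static void toy_update_cgroup_event(struct toy_ctx *ctx, bool add)
{
	/* Not the first add: count it, nothing else to do. */
	if (add && ctx->nr_cgroups++)
		return;
	/* Not the last remove: count it, nothing else to do. */
	if (!add && --ctx->nr_cgroups)
		return;

	if (add)
		ctx->on_list = true;	/* first event: join the list */
	else
		ctx->on_list = false;	/* last event: leave the list */
}
```

Note the asymmetry: the add path tests the old value (post-increment),
the remove path the new one (pre-decrement), which is exactly what makes
only the 0->1 and 1->0 transitions fall through.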
>
> #else /* !CONFIG_CGROUP_PERF */
> @@ -2311,6 +2325,20 @@ static int __perf_install_in_context(void *info)
> raw_spin_lock(&task_ctx->lock);
> }
>
> +#ifdef CONFIG_CGROUP_PERF
> + if (is_cgroup_event(event)) {
> + /*
> + * Only care about cgroup events.
> + *
That bit is entirely spurious; since it is right after if (is_cgroup_event()),
obviously this block is only for cgroup events.
> + * If only the task belongs to cgroup of this event,
> + * we will continue the installment
And that isn't really English. I think you meant to write something
like:
	/*
	 * If the current cgroup doesn't match the event's
	 * cgroup, we should not try to schedule it.
	 */
> + */
> + struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx);
> + reprogram = cgroup_is_descendant(cgrp->css.cgroup,
> + event->cgrp->css.cgroup);
> + }
> +#endif
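For context, the reprogram decision above hinges on
cgroup_is_descendant(). A toy userspace model of that ancestor walk
(parent pointers only, illustrative names, not the real css tree) looks
like:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Each cgroup knows its parent; the root has none. */
struct toy_cgroup {
	const struct toy_cgroup *parent;
};

/*
 * Models cgroup_is_descendant(cgrp, ancestor): walk up from cgrp and
 * report whether we hit ancestor.  A cgroup counts as a descendant of
 * itself, matching the kernel's semantics.
 */
static bool toy_is_descendant(const struct toy_cgroup *cgrp,
			      const struct toy_cgroup *ancestor)
{
	for (; cgrp; cgrp = cgrp->parent) {
		if (cgrp == ancestor)
			return true;
	}
	return false;
}
```

So with the event on cgroup A and current running in a child B of A, the
walk from B reaches A and reprogram stays true; with current in an
unrelated cgroup it never does, and the install is skipped.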
> +
> if (reprogram) {
> ctx_sched_out(ctx, cpuctx, EVENT_TIME);
> add_event_to_ctx(event, ctx);
> --
> 2.8.4.31.g9ed660f
>