linux-kernel - Re: [PATCH RESEND v5] perf/core: Fix installing arbitrary cgroup event into cpu

Date:   Wed, 7 Mar 2018 19:19:15 +0800
From:   Lin Xiulei <linxiulei@...il.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Jiri Olsa <jolsa@...hat.com>, mingo@...hat.com, acme@...nel.org,
        alexander.shishkin@...ux.intel.com, linux-kernel@...r.kernel.org,
        tglx@...utronix.de, Stephane Eranian <eranian@...il.com>,
        torvalds@...ux-foundation.org, linux-perf-users@...r.kernel.org,
        Brendan Gregg <brendan.d.gregg@...il.com>,
        yang_oliver@...mail.com, "leilei.lin" <leilei.lin@...baba-inc.com>
Subject: Re: [PATCH RESEND v5] perf/core: Fix installing arbitrary cgroup
 event into cpu

2018-03-06 19:50 GMT+08:00 Peter Zijlstra <peterz@...radead.org>:
> On Tue, Mar 06, 2018 at 05:36:37PM +0800, linxiulei@...il.com wrote:
>> From: "leilei.lin" <leilei.lin@...baba-inc.com>
>>
>> Do not install cgroup event into the CPU context and schedule it
>> if the cgroup is not running on this CPU
>
> OK, so far so good, this explains the bit in
> __perf_install_in_context().
>

Actually, the new code in __perf_install_in_context() only decides
whether the event should be scheduled on the PMU.
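
To make that decision concrete, here is a minimal userspace sketch, not
kernel code. The names model_cgroup, model_is_descendant() and
model_should_schedule() are hypothetical and exist only for illustration;
the sketch assumes the same "equal or below in the hierarchy" semantics
that cgroup_is_descendant() has.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup: only the parent link matters here. */
struct model_cgroup {
	const struct model_cgroup *parent;	/* NULL for the root */
};

/*
 * Returns true if cgrp is ancestor itself or lies below it in the
 * hierarchy. This models the semantics of cgroup_is_descendant().
 */
static bool model_is_descendant(const struct model_cgroup *cgrp,
				const struct model_cgroup *ancestor)
{
	assert(ancestor != NULL);

	for (; cgrp != NULL; cgrp = cgrp->parent) {
		if (cgrp == ancestor)
			return true;
	}
	return false;
}

/*
 * Models the "reprogram" decision: schedule the event only if the cgroup
 * of the task currently running on this CPU is inside the event's cgroup.
 */
static bool model_should_schedule(const struct model_cgroup *current_cgrp,
				  const struct model_cgroup *event_cgrp)
{
	return model_is_descendant(current_cgrp, event_cgrp);
}
```

With a root cgroup that has two children A and B, an event attached to A
would be scheduled while a task from A (or from a child of A) is current,
and would not be scheduled while a task from B is current.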

>> While there is no task of cgroup running specified CPU, current
>> kernel still install cgroup event into CPU context that causes
>> another cgroup event can't be installed into this CPU.
>>
>> This patch prevent scheduling events at __perf_install_in_context()
>> and installing events at list_update_cgroup_event() if cgroup isn't
>> running on specified CPU.
>
> This bit doesn't make sense, you don't in fact avoid anything in
> list_update_cgroup_event(), you do more, not less.
>

And the new code in list_update_cgroup_event() makes sure cpuctx->cgrp
is not set arbitrarily. The extra logic you mentioned was added to keep
cpuctx->cgrp consistent with the cgroup actually running on the CPU.

>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 4df5b69..f3ffa70 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -933,31 +933,45 @@ list_update_cgroup_event(struct perf_event *event,
>>  {
>>       struct perf_cpu_context *cpuctx;
>>       struct list_head *cpuctx_entry;
>> +     struct perf_cgroup *cgrp;
>>
>>       if (!is_cgroup_event(event))
>>               return;
>>
>>       /*
>>        * Because cgroup events are always per-cpu events,
>>        * this will always be called from the right CPU.
>>        */
>>       cpuctx = __get_cpu_context(ctx);
>> +     cgrp = perf_cgroup_from_task(current, ctx);
>> +
>> +     /*
>> +      * if only the cgroup is running on this cpu
>> +      * and cpuctx->cgrp == NULL (otherwise it would've
>> +      * been set with running cgroup), we put this cgroup
>> +      * into cpu context. Or it would case mismatch in
>> +      * following cgroup events at event_filter_match()
>> +      */
>
> This is utterly incomprehensible, what?

Yes, this is a bit messy; I should have made it clearer. The comment was
supposed to explain why I modified the if statement below.

The logic is:

1) When cpuctx->cgrp is NULL, we __must__ take care to set it
appropriately, which means we __have to__ check whether the cgroup is
running on this CPU.

2) When cpuctx->cgrp is __NOT__ NULL, it has already been set
appropriately by cgroup_switch() or list_update_cgroup_event().
Therefore we do __nothing__ here.
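
The two cases can be sketched in userspace as follows. This is only a
model under stated assumptions, not the patch itself: model_cgroup,
model_cpuctx, model_is_descendant() and model_update_cpuctx() are
hypothetical names, and the descendant check is assumed to behave like
cgroup_is_descendant() (true for the cgroup itself and anything below it).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup: only the parent link matters here. */
struct model_cgroup {
	const struct model_cgroup *parent;	/* NULL for the root */
};

/* Hypothetical stand-in for the per-CPU context. */
struct model_cpuctx {
	const struct model_cgroup *cgrp;	/* NULL: nothing recorded yet */
};

/* True if cgrp is ancestor itself or lies below it in the hierarchy. */
static bool model_is_descendant(const struct model_cgroup *cgrp,
				const struct model_cgroup *ancestor)
{
	for (; cgrp != NULL; cgrp = cgrp->parent) {
		if (cgrp == ancestor)
			return true;
	}
	return false;
}

static void model_update_cpuctx(struct model_cpuctx *cpuctx,
				const struct model_cgroup *current_cgrp,
				const struct model_cgroup *event_cgrp,
				bool add)
{
	assert(cpuctx != NULL);

	/* Case 2: already set by an earlier switch or update; do nothing. */
	if (cpuctx->cgrp != NULL)
		return;

	/*
	 * Case 1: nothing recorded yet. Record the current cgroup only if
	 * it is actually running inside the event's cgroup on this CPU.
	 */
	if (add && model_is_descendant(current_cgrp, event_cgrp))
		cpuctx->cgrp = current_cgrp;
}
```

In this model, adding an event for cgroup A while a task from an
unrelated cgroup B is current leaves cpuctx->cgrp at NULL, so a later
event whose cgroup does match the current task can still set it.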

>
>> +     if (add && !cpuctx->cgrp &&
>> +                     cgroup_is_descendant(cgrp->css.cgroup,
>> +                     event->cgrp->css.cgroup)) {
>> +             cpuctx->cgrp = cgrp;
>> +     }
>
> And that's just horrible coding style. Maybe something like:
>
>         if (add && cgroup_is_descendant(cgrp->css.cgroup, event->cgrp->css.cgroup)) {
>                 if (cpuctx->cgrp)
>                         WARN_ON_ONCE(cpuctx->cgrp != cgrp);
>                 cpuctx->cgrp = cgrp;
>         }
>
> that? But that still needs a comment to explain _why_ we do that here.
> Under what condition would we fail to have cpuctx->cgrp set while
> ctx->nr_cgroups. Your comment doesn't explain nor does your Changelog.
>

        /* As I said above, we only take care of this case. */
        if (cpuctx->cgrp == NULL) {
                if (add && cgroup_is_descendant(cgrp->css.cgroup,
                                                event->cgrp->css.cgroup)) {
                        /* only when this cgroup is running */
                        cpuctx->cgrp = cgrp;
                }
        }

>> +
>> +     if (add && ctx->nr_cgroups++)
>> +             return;
>> +     else if (!add && --ctx->nr_cgroups)
>> +             return;
>>
>> +     /* no cgroup running */
>> +     if (!add)
>> +             cpuctx->cgrp = NULL;
>> +
>> +     cpuctx_entry = &cpuctx->cgrp_cpuctx_entry;
>> +     if (add)
>>               list_add(cpuctx_entry, this_cpu_ptr(&cgrp_cpuctx_list));
>> +     else
>>               list_del(cpuctx_entry);
>>  }
>>
>>  #else /* !CONFIG_CGROUP_PERF */
>> @@ -2311,6 +2325,20 @@ static int  __perf_install_in_context(void *info)
>>               raw_spin_lock(&task_ctx->lock);
>>       }
>>
>> +#ifdef CONFIG_CGROUP_PERF
>> +     if (is_cgroup_event(event)) {
>> +             /*
>> +              * Only care about cgroup events.
>> +              *
>
> That bit is entirely spurious, if it right after if (is_cgroup_event()),
> obviously this block is only for cgroup events.
>

Agreed, I will drop it. :)

>> +              * If only the task belongs to cgroup of this event,
>> +              * we will continue the installment
>
> And that isn't really english. I think you meant to write something
> like:
>
>                 /*
>                  * If the current cgroup doesn't match the event's
>                  * cgroup, we should not try to schedule it.
>                  */
>

Agreed again, I will use your wording. :) Thanks

>> +              */
>> +             struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx);
>> +             reprogram = cgroup_is_descendant(cgrp->css.cgroup,
>> +                                     event->cgrp->css.cgroup);
>> +     }
>> +#endif
>> +
>>       if (reprogram) {
>>               ctx_sched_out(ctx, cpuctx, EVENT_TIME);
>>               add_event_to_ctx(event, ctx);
>> --
>> 2.8.4.31.g9ed660f
>>