linux-kernel - Re: [PATCH] perf/core: Fix cgroup events tracking

Message-ID: <5646805c-972d-d1b2-eb81-54cd015735ad@bytedance.com>
Date:   Wed, 7 Dec 2022 19:19:45 +0800
From:   Chengming Zhou <zhouchengming@...edance.com>
To:     Ravi Bangoria <ravi.bangoria@....com>
Cc:     linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
        peterz@...radead.org, mingo@...hat.com, acme@...nel.org,
        mark.rutland@....com, alexander.shishkin@...ux.intel.com,
        jolsa@...nel.org, namhyung@...nel.org
Subject: Re: [PATCH] perf/core: Fix cgroup events tracking

On 2022/12/7 18:41, Ravi Bangoria wrote:
> On 06-Dec-22 8:20 AM, Chengming Zhou wrote:
>> We encounter perf warnings when using cgroup events like:
>> ```
>> cd /sys/fs/cgroup
>> mkdir test
>> perf stat -e cycles -a -G test
>> ```
>>
>> WARNING: CPU: 0 PID: 690 at kernel/events/core.c:849 perf_cgroup_switch+0xb2/0xc0
>> [   91.393417] Call Trace:
>> [   91.393772]  <TASK>
>> [   91.394080]  __schedule+0x4ae/0x9f0
>> [   91.394535]  ? _raw_spin_unlock_irqrestore+0x23/0x40
>> [   91.395145]  ? __cond_resched+0x18/0x20
>> [   91.395622]  preempt_schedule_common+0x2d/0x70
>> [   91.396163]  __cond_resched+0x18/0x20
>> [   91.396621]  wait_for_completion+0x2f/0x160
>> [   91.397137]  ? cpu_stop_queue_work+0x9e/0x130
>> [   91.397665]  affine_move_task+0x18a/0x4f0
> 
> nit: These timestamps can be removed in commit log.

Ok, will remove.

> 
>>
>> WARNING: CPU: 0 PID: 690 at kernel/events/core.c:829 ctx_sched_in+0x1cf/0x1e0
>> [   91.430151] Call Trace:
>> [   91.430490]  <TASK>
>> [   91.430793]  ? ctx_sched_out+0xb7/0x1b0
>> [   91.431274]  perf_cgroup_switch+0x88/0xc0
>> [   91.431778]  __schedule+0x4ae/0x9f0
>> [   91.432215]  ? _raw_spin_unlock_irqrestore+0x23/0x40
>> [   91.432825]  ? __cond_resched+0x18/0x20
>> [   91.433299]  preempt_schedule_common+0x2d/0x70
>> [   91.433839]  __cond_resched+0x18/0x20
>> [   91.434298]  wait_for_completion+0x2f/0x160
>> [   91.434808]  ? cpu_stop_queue_work+0x9e/0x130
>> [   91.435334]  affine_move_task+0x18a/0x4f0
>>
>> The two warnings above are incomplete because I removed other
>> unimportant information. The problem is caused by how perf tracks
>> cgroup events:
>>
>> CPU0					CPU1
>> perf_event_open()
>>   perf_event_alloc()
>>     account_event()
>>       account_event_cpu()
>>         atomic_inc(perf_cgroup_events)
>> 					__perf_event_task_sched_out()
>> 					  if (atomic_read(perf_cgroup_events))
>> 					    perf_cgroup_switch()
>> 					      // kernel/events/core.c:849
>> 					      WARN_ON_ONCE(cpuctx->ctx.nr_cgroups == 0)
>> 					      if (READ_ONCE(cpuctx->cgrp) == cgrp) // false
>> 					        return
>> 					      perf_ctx_lock()
>> 					      ctx_sched_out()
>> 					      cpuctx->cgrp = cgrp
>> 					      ctx_sched_in()
>> 					        perf_cgroup_set_timestamp()
>> 					          // kernel/events/core.c:829
>> 					          WARN_ON_ONCE(!ctx->nr_cgroups)
>> 					      perf_ctx_unlock()
>>   perf_install_in_context()
>>     add_event_to_ctx()
>>       list_add_event()
>>         perf_cgroup_event_enable()
>>           ctx->nr_cgroups++
>>           cpuctx->cgrp = X
> 
> IIUC, since it's a cgroup event, perf_install_in_context() will do:
> cpu_function_call(cpu, __perf_install_in_context, event). And thus,
> callchain starting with add_event_to_ctx() will be executed on CPU1,
> not on CPU0.

Right, will fix it in the next version.

> 
>> We can see from the above that we wrongly use the percpu atomic
>> perf_cgroup_events to check whether we need to call perf_cgroup_switch(),
>> which should only be called when we know this CPU has cgroup events enabled.
>>
>> Commit bd2756811766 ("perf: Rewrite core context handling") changed
>> perf to have only one context per CPU, so we can just use cpuctx->cgrp
>> to check whether this CPU has cgroup events enabled.
>>
>> So percpu atomic perf_cgroup_events is not needed.
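
The window described above can be illustrated with a minimal userspace sketch. This is not kernel code: every `model_*` name and the `MODEL_NR_CPUS` constant are hypothetical stand-ins, and the only fields modeled are the ones named in this thread (the percpu `perf_cgroup_events` counter, `nr_cgroups`, and `cpuctx->cgrp`). It assumes, as in the diagram, that accounting happens before the event is installed on the target CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 2

/* Hypothetical stand-in for the per-CPU state named in the thread. */
struct model_cpuctx {
	int events_accounted;	/* models the percpu perf_cgroup_events counter */
	int nr_cgroups;		/* models cpuctx->ctx.nr_cgroups */
	const void *cgrp;	/* models cpuctx->cgrp; NULL until an event is installed */
};

static struct model_cpuctx model_cpu[MODEL_NR_CPUS];

/* Models account_event_cpu(): runs early in perf_event_open(). */
static void model_account_event(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_cpu[cpu].events_accounted++;
}

/* Models perf_cgroup_event_enable(): runs later, at install time. */
static void model_install_event(int cpu, const void *cgrp)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_cpu[cpu].nr_cgroups++;
	model_cpu[cpu].cgrp = cgrp;
}

/* Old gate: take the switch path whenever the counter is non-zero. */
static bool model_old_gate(int cpu)
{
	return model_cpu[cpu].events_accounted > 0;
}

/* Proposed gate: take the switch path only once cpuctx->cgrp is set. */
static bool model_new_gate(int cpu)
{
	return model_cpu[cpu].cgrp != NULL;
}

/*
 * True when the switch path would run while no cgroup event is installed
 * on this CPU, which is the condition the WARN_ON_ONCE() checks catch.
 */
static bool model_would_warn(bool gate_open, int cpu)
{
	return gate_open && model_cpu[cpu].nr_cgroups == 0;
}
```

Calling `model_account_event(1)` and then evaluating `model_would_warn(model_old_gate(1), 1)` before `model_install_event()` has run reproduces the window in the model. The `model_new_gate()` variant stays closed until installation, because `cgrp` and `nr_cgroups` are updated together.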
>>
>> Signed-off-by: Chengming Zhou <zhouchengming@...edance.com>
> 
> Fixes: bd2756811766 ("perf: Rewrite core context handling")
> 
> Otherwise looks good.
> Tested-by: Ravi Bangoria <ravi.bangoria@....com>

OK, will add the Fixes tag in the next version.

Thanks!

> 
> Thanks,
> Ravi