linux-kernel - Re: [External] Re: [PATCH v2 2/6] perf/core: Introduce percpu perf

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c4fea458-b0e1-9842-91ea-5a1c6f7e8387@bytedance.com>
Date:   Wed, 23 Mar 2022 21:37:16 +0800
From:   Chengming Zhou <zhouchengming@...edance.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     mingo@...hat.com, acme@...nel.org, mark.rutland@....com,
        alexander.shishkin@...ux.intel.com, jolsa@...nel.org,
        namhyung@...nel.org, eranian@...gle.com,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
        duanxiongchun@...edance.com, songmuchun@...edance.com
Subject: Re: [External] Re: [PATCH v2 2/6] perf/core: Introduce percpu
 perf_cgroup

On 2022/3/23 9:17 下午, Peter Zijlstra wrote:
> On Wed, Mar 23, 2022 at 09:07:01PM +0800, Chengming Zhou wrote:
>> On 2022/3/23 8:51 下午, Peter Zijlstra wrote:
>>> On Tue, Mar 22, 2022 at 08:08:30PM +0800, Chengming Zhou wrote:
>>>
>>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>>> index 8b5cf2aedfe6..848a3bfa9513 100644
>>>> --- a/kernel/events/core.c
>>>> +++ b/kernel/events/core.c
>>>
>>>> @@ -843,11 +845,21 @@ static void perf_cgroup_switch(struct task_struct *task)
>>>>  	 */
>>>>  	local_irq_save(flags);
>>>>  
>>>> +	cgrp = perf_cgroup_from_task(task, NULL);
>>>> +	if (cgrp == __this_cpu_read(cpu_perf_cgroup))
>>>> +		goto out;
> 
> So this compares the cpu thing against the task thing, if matching, we
> bail.
> 
>>>> +
>>>> +	__this_cpu_write(cpu_perf_cgroup, cgrp);
> 
> Then we set cpu thing.
> 
>>>> +
>>>>  	list = this_cpu_ptr(&cgrp_cpuctx_list);
>>>>  	list_for_each_entry_safe(cpuctx, tmp, list, cgrp_cpuctx_entry) {
>>>>  		WARN_ON_ONCE(cpuctx->ctx.nr_cgroups == 0);
>>>>  
>>>>  		perf_ctx_lock(cpuctx, cpuctx->task_ctx);
>>>> +
>>>> +		if (cpuctx->cgrp == cgrp)
>>>> +			continue;
>>>> +
>>>>  		perf_pmu_disable(cpuctx->ctx.pmu);
>>>>  
>>>>  		cpu_ctx_sched_out(cpuctx, EVENT_ALL);
> 
>>>> +		cpuctx->cgrp = cgrp
> 
> But here we already have exactly the same pattern, we match cpuctx thing
> against task thing and skip/set etc.

Yes, cpu_perf_cgroup is just "cache" which cgrp assigned to cpuctx->cgrp.

> 
>>> Also, I really don't see the point of cpu_perf_cgroup, cpuctx->cgrp
>>> should be able to do this.
>>
>> But the problem is that we have two cpuctx on the percpu list, do you
>> mean we should use perf_cgroup of the first cpuctx to compare with
>> the next task's perf_cgroup ?
>>
>> Or we should delete the cgrp in cpuctx, and use this new percpu cpu_perf_cgroup?
> 
> I'm a bit confused, per the above, you already do exactly what the new
> cpu_perf_cgroup does on the cpuctx->cgrp variable. AFAICT the only think
> the new per-cpu variable does is avoid a lock, howveer:

we have cgrp in cpuctx make me confused at first too.

perf_cgroup_event_enable()
	if (ctx->is_active && !cpuctx->cgrp)
		if ...
			cpuctx->cgrp = cgrp;   -->  set cpuctx->cgrp when enable event

	list_add(&cpuctx->cgrp_cpuctx_entry,
			per_cpu_ptr(&cgrp_cpuctx_list, event->cpu))  --> add cpuctx on percpu list

But we have two (hw and sw) cpuctx, so these two cpuctx->cgrp maybe different...

This is the reason why I want to create a percpu cpu_perf_cgroup,
just "cache" cgrp last scheduled, to compare with next task to decide
whether perf_cgroup_switch() can be skipped.

But you are right, having cpuctx->cgrp and cpu_perf_cgroup make things confused..
maybe we can delete cpuctx->cgrp, change to use percpu cpu_perf_cgroup?

> 
> 
>>> --- a/kernel/events/core.c
>>> +++ b/kernel/events/core.c
>>> @@ -833,6 +833,7 @@ static DEFINE_PER_CPU(struct list_head,
>>>   */
>>>  static void perf_cgroup_switch(struct task_struct *task)
>>>  {
>>> +	struct perf_cgroup *cgrp;
>>>  	struct perf_cpu_context *cpuctx, *tmp;
>>>  	struct list_head *list;
>>>  	unsigned long flags;
>>> @@ -843,11 +844,20 @@ static void perf_cgroup_switch(struct ta
>>>  	 */
>>>  	local_irq_save(flags);
>>>  
>>> +	cgrp = perf_cgroup_from_task(task, NULL);
>>> +
>>>  	list = this_cpu_ptr(&cgrp_cpuctx_list);
>>>  	list_for_each_entry_safe(cpuctx, tmp, list, cgrp_cpuctx_entry) {
>>>  		WARN_ON_ONCE(cpuctx->ctx.nr_cgroups == 0);
>>>  
>>> +		if (READ_ONCE(cpuctx->cgrp == cgrp))
>>> +			continue
> 
> I think we can avoid that by doing an early check, hmm?

perf_cgroup_switch() can be called from context_switch(), or __perf_cgroup_move() from IPI.
Say if in context_switch() already set cpuctx->cgrp to the new task->cgroups, then context_switch()
finish, handle IPI in __perf_cgroup_move(), we don't need to do sched_out/in again.

Thanks.

> 
>>> +
>>>  		perf_ctx_lock(cpuctx, cpuctx->task_ctx);
>>> +
>>> +		if (cpuctx->cgrp == cgrp)
>>> +			goto next;
>>> +
>>>  		perf_pmu_disable(cpuctx->ctx.pmu);
>>>  
>>>  		cpu_ctx_sched_out(cpuctx, EVENT_ALL);
>>> @@ -855,50 +865,22 @@ static void perf_cgroup_switch(struct ta
>>>  		 * must not be done before ctxswout due
>>>  		 * to event_filter_match() in event_sched_out()
>>>  		 */
>>> -		cpuctx->cgrp = perf_cgroup_from_task(task,
>>> -						     &cpuctx->ctx);
>>> +		WRITE_ONCE(cpuctx->cgrp, cgrp);