Message-ID: <cf2cdf28-8678-8e61-9992-a460e61d3ce2@amd.com>
Date:   Wed, 24 Aug 2022 10:37:36 +0530
From:   Ravi Bangoria <ravi.bangoria@....com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     acme@...nel.org, alexander.shishkin@...ux.intel.com,
        jolsa@...hat.com, namhyung@...nel.org, songliubraving@...com,
        eranian@...gle.com, alexey.budankov@...ux.intel.com,
        ak@...ux.intel.com, mark.rutland@....com, megha.dey@...el.com,
        frederic@...nel.org, maddy@...ux.ibm.com, irogers@...gle.com,
        kim.phillips@....com, linux-kernel@...r.kernel.org,
        santosh.shukla@....com, ravi.bangoria@....com
Subject: Re: [RFC v2] perf: Rewrite core context handling

On 23-Aug-22 2:27 PM, Peter Zijlstra wrote:
> On Tue, Aug 02, 2022 at 11:46:32AM +0530, Ravi Bangoria wrote:
>> On 13-Jun-22 8:13 PM, Peter Zijlstra wrote:
>>> On Mon, Jun 13, 2022 at 04:35:11PM +0200, Peter Zijlstra wrote:
> 
>>>> +static void ctx_pinned_sched_in(struct perf_event_context *ctx, struct pmu *pmu)
>>>>  {
>>>> +	struct perf_event_pmu_context *pmu_ctx;
>>>>  	int can_add_hw = 1;
>>>>  
>>>> -	if (ctx != &cpuctx->ctx)
>>>> -		cpuctx = NULL;
>>>> -
>>>> -	visit_groups_merge(cpuctx, &ctx->pinned_groups,
>>>> -			   smp_processor_id(),
>>>> -			   merge_sched_in, &can_add_hw);
>>>> +	if (pmu) {
>>>> +		visit_groups_merge(ctx, &ctx->pinned_groups,
>>>> +				   smp_processor_id(), pmu,
>>>> +				   merge_sched_in, &can_add_hw);
>>>> +	} else {
>>>> +		/*
>>>> +		 * XXX: This can be optimized for per-task context by calling
>>>> +		 * visit_groups_merge() only once with:
>>>> +		 *   1) pmu=NULL
>>>> +		 *   2) Ignoring pmu in perf_event_groups_cmp() when it's NULL
>>>> +		 *   3) Making can_add_hw a per-pmu variable
>>>> +		 *
>>>> +		 * Though, it cannot be optimized for per-cpu context because
>>>> +		 * the per-cpu rb-tree consists of pmu-subtrees and pmu-subtrees
>>>> +		 * consist of cgroup-subtrees, i.e. cgroup events of the same
>>>> +		 * cgroup but different pmus are separated out into their
>>>> +		 * respective pmu-subtrees.
>>>> +		 */
>>>> +		list_for_each_entry(pmu_ctx, &ctx->pmu_ctx_list, pmu_ctx_entry) {
>>>> +			can_add_hw = 1;
>>>> +			visit_groups_merge(ctx, &ctx->pinned_groups,
>>>> +					   smp_processor_id(), pmu_ctx->pmu,
>>>> +					   merge_sched_in, &can_add_hw);
>>>> +		}
>>>> +	}
>>>>  }
>>>
>>> I'm not sure I follow.. task context can have multiple PMUs just the
>>> same as CPU context can, that's more or less the entire point of the
>>> patch.
>>
>> The current rbtree key is {cpu, cgroup_id, group_idx}. However, the effective
>> key for a task-specific context is {cpu, group_idx} because cgroup_id is
>> always 0, and the effective key for a cpu-specific context is {cgroup_id,
>> group_idx} because cpu is the same for the entire rbtree.
>>
>> With the new design, the rbtree key will be {cpu, pmu, cgroup_id, group_idx}.
>> But, as explained above, the effective key for a task-specific context will
>> be {cpu, pmu, group_idx}. Thus, we can handle pmu=NULL in visit_groups_merge(),
>> same as you did in the very first RFC[1]. (This may make things more
>> complicated, though, because we might also need to increase the heap size to
>> accommodate all pmu events in a single heap. The current heap size is 2 for a
>> task-specific context, which is sufficient if we iterate over all pmus.)
>>
>> The same optimization won't work for a cpu-specific context because its
>> effective key would be {pmu, cgroup_id, group_idx}, i.e. each pmu subtree is
>> made up of cgroup subtrees.
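The sketch below is a user-space model, not kernel code. It shows the key ordering {cpu, pmu, cgroup_id, group_idx} using a hypothetical flattened struct `ev_key`, with a sorted array standing in for the rbtree. The struct, the integer pmu ids and the helper names are assumptions for illustration. The real perf_event_groups_cmp() works on struct perf_event and cgroup data.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flattened stand-in for a perf event's rbtree key.
 * pmu is modelled as a small integer id instead of a struct pmu pointer;
 * cpu == -1 means "any cpu", as used by per-task contexts. */
struct ev_key {
	int      cpu;
	int      pmu;        /* hypothetical pmu id */
	uint64_t cgroup_id;  /* always 0 for per-task events */
	uint64_t group_idx;  /* insertion order; lower is scheduled first */
};

/* Lexicographic compare on {cpu, pmu, cgroup_id, group_idx}. */
static int ev_key_cmp(const struct ev_key *a, const struct ev_key *b)
{
	if (a->cpu != b->cpu)
		return a->cpu < b->cpu ? -1 : 1;
	if (a->pmu != b->pmu)
		return a->pmu < b->pmu ? -1 : 1;
	if (a->cgroup_id != b->cgroup_id)
		return a->cgroup_id < b->cgroup_id ? -1 : 1;
	if (a->group_idx != b->group_idx)
		return a->group_idx < b->group_idx ? -1 : 1;
	return 0;
}

/* qsort() adapter so a sorted array can model the rbtree order. */
static int ev_key_qsort_cmp(const void *a, const void *b)
{
	return ev_key_cmp(a, b);
}
```

In a per-task context every cgroup_id is 0, so this order collapses to {cpu, pmu, group_idx}. That is the effective key described above. Each {cpu, pmu} pair then forms one contiguous run of the sorted array.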
> 
> Agreed, new order is: {cpu, pmu, cgroup_id, group_idx}
> 
> Event scheduling looks at the {cpu, pmu, cgroup_id} subtree to find the
> leftmost group_idx event to schedule next.
> 
> However, since cgroup events are per-cpu events, per-task events will
> always have cgroup=NULL. Resulting in the subtrees:
> 
>   {-1, pmu, NULL} and {cpu, pmu, NULL}
> 
> Which is what the code does, it iterates ctx->pmu_ctx_list to find all
> @pmu values and then for each does the schedule dance.
> 
> Now, I suppose making that:
> 
>   {-1, NULL, NULL}, {cpu, NULL, NULL}
> 
> could work, but wouldn't iterating the tree be more expensive than
> just finding the sub-trees as we do now?
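The per-pmu "schedule dance" described above can be sketched in user space. All names here are hypothetical: `struct ev`, `subtree_first()`, `sched_pmu()` and `sched_all_pmus()`, plus the integer pmu ids. For each pmu, the code finds the leftmost event of the {-1, pmu} and {cpu, pmu} subtrees with a lower-bound search. It then does a 2-way merge on group_idx, which matches the heap size of 2 mentioned for task contexts. `can_add_hw` is reset for every pmu, as in the loop over ctx->pmu_ctx_list. Two parts of the model are simplifications and not the actual logic of merge_sched_in(): the `fits` flag, and the rule that a failed group stops the remaining groups on that pmu.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task event: cgroup is always NULL, so it is omitted.
 * The array is assumed sorted by {cpu, pmu, group_idx}, modelling the rbtree. */
struct ev {
	int      cpu;        /* -1 = any cpu */
	int      pmu;        /* hypothetical pmu id */
	uint64_t group_idx;
	bool     fits;       /* pretend outcome of putting the group on the PMU */
};

/* Lower bound: index of the leftmost event with {cpu, pmu} >= the given pair. */
static size_t subtree_first(const struct ev *evs, size_t n, int cpu, int pmu)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		bool less = evs[mid].cpu < cpu ||
			    (evs[mid].cpu == cpu && evs[mid].pmu < pmu);
		if (less)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

static bool in_subtree(const struct ev *evs, size_t n, size_t i, int cpu, int pmu)
{
	return i < n && evs[i].cpu == cpu && evs[i].pmu == pmu;
}

/* One pmu: 2-way merge of {-1, pmu} and {cpu, pmu} by group_idx.
 * Writes scheduled group_idx values to out[] and returns how many. */
static size_t sched_pmu(const struct ev *evs, size_t n, int cpu, int pmu,
			uint64_t *out, size_t out_len)
{
	size_t a = subtree_first(evs, n, -1, pmu);
	size_t b = subtree_first(evs, n, cpu, pmu);
	int can_add_hw = 1;		/* fresh for every pmu */
	size_t cnt = 0;

	for (;;) {
		bool ha = in_subtree(evs, n, a, -1, pmu);
		bool hb = in_subtree(evs, n, b, cpu, pmu);
		const struct ev *e;

		if (!ha && !hb)
			break;
		/* Take the lower group_idx of the two subtree heads. */
		if (ha && (!hb || evs[a].group_idx < evs[b].group_idx))
			e = &evs[a++];
		else
			e = &evs[b++];

		if (can_add_hw && e->fits) {
			if (cnt < out_len)
				out[cnt++] = e->group_idx;
		} else {
			can_add_hw = 0;	/* later groups on this pmu stay off */
		}
	}
	return cnt;
}

/* Analogue of iterating ctx->pmu_ctx_list: one schedule pass per pmu. */
static size_t sched_all_pmus(const struct ev *evs, size_t n, int cpu,
			     const int *pmus, size_t npmus,
			     uint64_t *out, size_t out_len)
{
	size_t cnt = 0;

	for (size_t i = 0; i < npmus; i++)
		cnt += sched_pmu(evs, n, cpu, pmus[i], out + cnt, out_len - cnt);
	return cnt;
}
```

If a group fails on one pmu, only that pmu's later groups are skipped. Every other pmu's pass starts again with can_add_hw = 1.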

pmu=NULL can be used while scheduling the entire context: we can just
traverse all pmu events of both cpu subtrees.
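One possible shape for the pmu=NULL variant is a single 2-way merge over every event of the {-1} and {cpu} subtrees, with can_add_hw kept per pmu (item 3 in the XXX comment of the patch). This is a hypothetical sketch with made-up names: `MAX_PMUS`, `cpu_subtree_first()`, `ev_before()` and `sched_all_one_pass()`. It assumes the merge orders events by {pmu, group_idx}. Comparing group_idx alone would not be enough: in the {cpu} subtree, a low group_idx for one pmu can sit behind another pmu's events, and would then be visited after a higher group_idx of the same pmu from the {-1} subtree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PMUS 8	/* hypothetical bound for the per-pmu can_add_hw array */

/* Hypothetical per-task event; array assumed sorted by {cpu, pmu, group_idx}. */
struct ev {
	int      cpu;        /* -1 = any cpu */
	int      pmu;        /* hypothetical pmu id in [0, MAX_PMUS) */
	uint64_t group_idx;
	bool     fits;       /* pretend outcome of putting the group on the PMU */
};

/* Start of the {cpu, *, NULL} subtree: first event whose cpu is >= cpu. */
static size_t cpu_subtree_first(const struct ev *evs, size_t n, int cpu)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (evs[mid].cpu < cpu)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Merge order assumed here: pmu first, then group_idx. */
static bool ev_before(const struct ev *x, const struct ev *y)
{
	if (x->pmu != y->pmu)
		return x->pmu < y->pmu;
	return x->group_idx < y->group_idx;
}

/* pmu == NULL analogue: one pass over both cpu subtrees (cpu assumed >= 0). */
static size_t sched_all_one_pass(const struct ev *evs, size_t n, int cpu,
				 uint64_t *out, size_t out_len)
{
	size_t a = cpu_subtree_first(evs, n, -1);
	size_t b = cpu_subtree_first(evs, n, cpu);
	int can_add_hw[MAX_PMUS];
	size_t cnt = 0;

	for (int p = 0; p < MAX_PMUS; p++)
		can_add_hw[p] = 1;

	for (;;) {
		bool ha = a < n && evs[a].cpu == -1;
		bool hb = b < n && evs[b].cpu == cpu;
		const struct ev *e;

		if (!ha && !hb)
			break;
		if (ha && (!hb || ev_before(&evs[a], &evs[b])))
			e = &evs[a++];
		else
			e = &evs[b++];

		if (can_add_hw[e->pmu] && e->fits) {
			if (cnt < out_len)
				out[cnt++] = e->group_idx;
		} else {
			can_add_hw[e->pmu] = 0;	/* only this pmu stops adding */
		}
	}
	return cnt;
}
```

Given the same input, this pass should schedule the same groups per pmu as the per-pmu loop. The difference is that each cpu subtree is located once instead of once per pmu.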

> 
> You also talk about extending the heap, which I read as
> doing the heap-merge over:
> 
>  {-1, pmu0, NULL}, {-1, pmu1, NULL}, ...
>  {cpu, pmu0, NULL}, ...
> 
> But that doesn't make sense, the schedule dance is per-pmu.
> 
> Or am I just still not getting it?

Ok. Let's not complicate the design. We can go with the current approach of
iterating over all pmus in the first phase and think about optimizing it
later.

Thanks,
Ravi
