Message-ID: <5c6a53dc-99f2-cb82-5aee-05d9b5061b42@amd.com>
Date: Mon, 31 Jan 2022 10:13:09 +0530
From: Ravi Bangoria <ravi.bangoria@....com>
To: peterz@...radead.org
Cc: acme@...nel.org, alexander.shishkin@...ux.intel.com,
jolsa@...hat.com, namhyung@...nel.org, songliubraving@...com,
eranian@...gle.com, alexey.budankov@...ux.intel.com,
ak@...ux.intel.com, mark.rutland@....com, megha.dey@...el.com,
frederic@...nel.org, maddy@...ux.ibm.com, irogers@...gle.com,
kim.phillips@....com, linux-kernel@...r.kernel.org,
santosh.shukla@....com
Subject: Re: [RFC v2] perf: Rewrite core context handling
On 13-Jan-22 7:17 PM, Ravi Bangoria wrote:
> From: Peter Zijlstra <peterz@...radead.org>
>
> This is the 2nd version of the RFC originally posted by Peter[1].
>
> There have been various issues and limitations with the way perf uses
> (task) contexts to track events. Most notable is the single hardware
> PMU task context, which has resulted in a number of yucky things (both
> proposed and merged).
>
> Notably:
> - HW breakpoint PMU
> - ARM big.little PMU / Intel ADL PMU
> - Intel Branch Monitoring PMU
> - AMD IBS
>
> Current design:
> ---------------
> Currently we have a per task and per cpu perf_event_contexts:
>
>   task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context
>        ^                                 |    ^     |
>        `---------------------------------'    |     `--> pmu
>                                               v           ^
>                                          perf_event ------'
>
> Each task has an array of pointers to a perf_event_context. Each
> perf_event_context has a direct relation to a PMU and a group of
> events for that PMU. The task-related perf_event_contexts have a
> pointer back to that task.
>
> Each PMU has a per-cpu pointer to a per-cpu perf_cpu_context, which
> includes a perf_event_context, which again has a direct relation to
> that PMU, and a group of events for that PMU.
>
> The perf_cpu_context also tracks which task context is currently
> associated with that CPU and includes a few other things like the
> hrtimer for rotation etc.
>
> Each perf_event is then associated with its PMU and one
> perf_event_context.
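>
> For illustration, the above relations can be sketched in much
> simplified C. All names below are purely illustrative, not the
> actual kernel definitions:
>
>   struct pmu;
>   struct task_struct;
>
>   /* One context per task (per context type); also embedded per cpu. */
>   struct perf_event_context {
>           struct pmu *pmu;            /* events of exactly one pmu */
>           struct task_struct *task;   /* back-pointer; NULL if per-cpu */
>           /* pinned/flexible groups of events for that pmu live here */
>   };
>
>   struct perf_cpu_context {
>           struct perf_event_context ctx;       /* embedded cpu context */
>           struct perf_event_context *task_ctx; /* task ctx on this cpu */
>           /* hrtimer for rotation, etc. */
>   };
>
>   struct perf_event {
>           struct pmu *pmu;
>           struct perf_event_context *ctx;      /* exactly one context */
>   };
>
>   struct task_struct {
>           /* one slot per context type (hw/sw) */
>           struct perf_event_context *perf_event_ctxp[2];
>   };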
>
> Proposed design:
> ----------------
> The new design proposed by this patch reduces this to a single task
> context and a single CPU context, but adds some intermediate
> data-structures:
>
>   task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context
>        ^                                 | ^ ^
>        `---------------------------------' | |
>                                            | |    perf_cpu_pmu_context
>                                            | `----.      ^
>                                            |      |      |
>                                            |      v      v
>                                            | ,--> perf_event_pmu_context
>                                            | |           ^
>                                            | |           |
>                                            v v           v
>                                       perf_event ---> pmu
>
> With the new design, a perf_event_context holds events of all pmus in
> the respective (pinned/flexible) rbtrees. This is achieved by adding
> the pmu to the rbtree key:
>
> {cpu, pmu, cgroup_id, group_index}
>
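> To illustrate the ordering such a key gives (hypothetical comparator,
> not the actual kernel code), all events of one pmu, for a given cpu
> and cgroup, become one contiguous sub-tree:
>
>   struct pmu;
>
>   /* Illustrative only: the four components of the new key. */
>   struct group_key {
>           int cpu;
>           struct pmu *pmu;                /* compared by address */
>           unsigned long long cgroup_id;
>           unsigned long long group_index;
>   };
>
>   static int group_key_cmp(const struct group_key *l,
>                            const struct group_key *r)
>   {
>           if (l->cpu != r->cpu)
>                   return l->cpu < r->cpu ? -1 : 1;
>           if (l->pmu != r->pmu)
>                   return l->pmu < r->pmu ? -1 : 1;
>           if (l->cgroup_id != r->cgroup_id)
>                   return l->cgroup_id < r->cgroup_id ? -1 : 1;
>           if (l->group_index != r->group_index)
>                   return l->group_index < r->group_index ? -1 : 1;
>           return 0;
>   }
>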
> Each perf_event_context carries a list of perf_event_pmu_contexts
> which are used to hold per-pmu-per-context state. For example, it
> keeps track of the currently active events for that pmu, a pmu
> specific task_ctx_data, a flag to tell whether rotation is required
> or not, etc.
>
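> Roughly, as a sketch (field names are illustrative guesses, not the
> final structure):
>
>   struct perf_event_pmu_context {
>           struct pmu *pmu;
>           struct perf_event_context *ctx;  /* back-pointer */
>           struct list_head pmu_ctx_entry;  /* node in the ctx's list */
>           void *task_ctx_data;             /* pmu specific state */
>           int rotate_necessary;            /* multiplexing needed? */
>           /* lists of currently active events for this pmu ... */
>   };
>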
> Similarly, perf_cpu_pmu_context is used to hold per-pmu-per-cpu state
> like the hrtimer details to drive the event rotation, a pointer to
> the perf_event_pmu_context of the currently running task and some
> other ancillary information.
>
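> Again as a sketch (illustrative names only):
>
>   struct perf_cpu_pmu_context {
>           struct perf_event_pmu_context epc;        /* cpu's own state */
>           struct perf_event_pmu_context *task_epc;  /* running task's */
>           struct hrtimer hrtimer;                   /* event rotation */
>           /* ... */
>   };
>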
> Each perf_event is associated with its pmu, a perf_event_context and
> a perf_event_pmu_context.
>
> Original RFC -> RFC v2:
> -----------------------
> In addition to porting the patch to the latest (v5.16-rc6) kernel,
> here are some of the major changes between the two revisions:
>
> - There have been quite a few fundamental changes since the original
>   patch. Most notably, the rbtree key has changed from
>   {cpu,group_index} to {cpu,cgroup_id,group_index}. Adding a pmu key
>   in between, as proposed in the original patch, is not
>   straightforward as it would break the cgroup specific optimization.
>   Hence we need to iterate over all pmu_ctx for a given ctx and call
>   visit_groups_merge() one by one (see the sketch below this list).
> - Enabled cgroup support (CGROUP_PERF).
> - Some changes w.r.t. multiplexing events, as with the new design the
>   rotation happens at the cgroup subtree rather than at the pmu
>   subtree as in the original patch.
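>
> The per-pmu iteration from the first point would look something like
> the sketch below; the visit_groups_merge() signature is simplified
> here and the list member names are illustrative:
>
>   static void ctx_sched_in_all_pmus(struct perf_event_context *ctx)
>   {
>           struct perf_event_pmu_context *pmu_ctx;
>
>           /*
>            * One merge-iteration per pmu, instead of a pmu component
>            * in the rbtree key, keeps the cgroup optimization intact.
>            */
>           list_for_each_entry(pmu_ctx, &ctx->pmu_ctx_list, pmu_ctx_entry)
>                   visit_groups_merge(ctx, pmu_ctx);
>   }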
>
> Because of the additional complexity the above changes bring in, I
> thought to get an initial review of the overall approach before
> starting to make it upstream ready. Hence this patch just provides an
> idea of the direction we will head in. There are many loose ends in
> the patch right now. For example, I've not paid much attention to
> synchronization related aspects. Similarly, some of the issues marked
> in the original patch (XXX) haven't been fixed.
>
> A simple perf stat/record/top survives with the patch, but the
> machine crashes on the first run of perf test (a stale cpc->task_epc
> causing the crash). Lockdep is also screaming a lot :)
Hi Peter, can you please review this?
Thanks,
Ravi