Message-ID: <5c6a53dc-99f2-cb82-5aee-05d9b5061b42@amd.com>
Date:   Mon, 31 Jan 2022 10:13:09 +0530
From:   Ravi Bangoria <ravi.bangoria@....com>
To:     peterz@...radead.org
Cc:     acme@...nel.org, alexander.shishkin@...ux.intel.com,
        jolsa@...hat.com, namhyung@...nel.org, songliubraving@...com,
        eranian@...gle.com, alexey.budankov@...ux.intel.com,
        ak@...ux.intel.com, mark.rutland@....com, megha.dey@...el.com,
        frederic@...nel.org, maddy@...ux.ibm.com, irogers@...gle.com,
        kim.phillips@....com, linux-kernel@...r.kernel.org,
        santosh.shukla@....com
Subject: Re: [RFC v2] perf: Rewrite core context handling



On 13-Jan-22 7:17 PM, Ravi Bangoria wrote:
> From: Peter Zijlstra <peterz@...radead.org>
> 
> This is the 2nd version of the RFC originally posted by Peter[1].
> 
> There have been various issues and limitations with the way perf uses
> (task) contexts to track events. Most notable is the single hardware
> PMU task context, which has resulted in a number of yucky things (both
> proposed and merged).
> 
> Notably:
>  - HW breakpoint PMU
>  - ARM big.little PMU / Intel ADL PMU
>  - Intel Branch Monitoring PMU
>  - AMD IBS
> 
> Current design:
> ---------------
> Currently we have per-task and per-CPU perf_event_contexts:
> 
>   task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context
>        ^                                 |    ^     |
>        `---------------------------------'    |     `--> pmu
>                                               v           ^
>                                          perf_event ------'
> 
> Each task has an array of pointers to a perf_event_context. Each
> perf_event_context has a direct relation to a PMU and a group of
> events for that PMU. The task-related perf_event_contexts have a
> pointer back to that task.
> 
> Each PMU has a per-cpu pointer to a per-cpu perf_cpu_context, which
> includes a perf_event_context, which again has a direct relation to
> that PMU, and a group of events for that PMU.
> 
> The perf_cpu_context also tracks which task context is currently
> associated with that CPU and includes a few other things, such as the
> hrtimer for rotation.
> 
> Each perf_event is then associated with its PMU and one
> perf_event_context.
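> 
> A heavily simplified, illustrative sketch of that layout (field names are
> approximated from the description above; the real definitions in
> include/linux/perf_event.h carry many more fields):
> 
>   struct perf_event_context {
>       struct pmu *pmu;                  /* events here belong to this pmu */
>       struct task_struct *task;         /* back-pointer for task contexts */
>       struct perf_event_groups pinned_groups;   /* rbtree of pinned groups */
>       struct perf_event_groups flexible_groups; /* rbtree of flexible groups */
>   };
> 
>   struct perf_cpu_context {             /* one instance per pmu, per cpu */
>       struct perf_event_context ctx;    /* the cpu context itself */
>       struct perf_event_context *task_ctx; /* task context active on this cpu */
>       struct hrtimer hrtimer;           /* drives event rotation */
>   };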
> 
> Proposed design:
> ----------------
> The new design proposed by this patch reduces to a single task context and
> a single CPU context, but adds some intermediate data structures:
> 
>   task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context
>        ^                                 |    ^ ^
>        `---------------------------------'    | |
>                                               | |    perf_cpu_pmu_context
>                                               | `----.    ^
>                                               |      |    |
>                                               |      v    v
>                                               | ,--> perf_event_pmu_context
>                                               | |         ^
>                                               | |         |
>                                               v v         v
>                                          perf_event ---> pmu
> 
> With the new design, perf_event_context will hold all pmu events in the
> respective (pinned/flexible) rbtrees. This is achieved by adding the pmu
> to the rbtree key:
> 
>   {cpu, pmu, cgroup_id, group_index}
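> 
> A hypothetical comparator sketching how such a key orders events (struct
> group_key and group_key_cmp() are made up for illustration; the real
> comparator works directly on struct perf_event fields):
> 
>   struct group_key {
>       int cpu;
>       struct pmu *pmu;         /* new: keeps each pmu's events together */
>       u64 cgroup_id;
>       u64 group_index;
>   };
> 
>   static int group_key_cmp(const struct group_key *a, const struct group_key *b)
>   {
>       if (a->cpu != b->cpu)
>           return a->cpu < b->cpu ? -1 : 1;
>       if (a->pmu != b->pmu)
>           return a->pmu < b->pmu ? -1 : 1;
>       if (a->cgroup_id != b->cgroup_id)
>           return a->cgroup_id < b->cgroup_id ? -1 : 1;
>       if (a->group_index != b->group_index)
>           return a->group_index < b->group_index ? -1 : 1;
>       return 0;
>   }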
> 
> Each perf_event_context carries a list of perf_event_pmu_contexts, which
> are used to hold per-pmu-per-context state. For example, a
> perf_event_pmu_context keeps track of the currently active events for that
> pmu, a pmu-specific task_ctx_data, a flag telling whether rotation is
> required, etc.
> 
> Similarly, perf_cpu_pmu_context is used to hold per-pmu-per-cpu state such
> as the hrtimer details that drive event rotation, a pointer to the
> perf_event_pmu_context of the currently running task, and some other
> ancillary information.
> 
> Each perf_event is associated with its pmu, perf_event_context and
> perf_event_pmu_context.
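> 
> An illustrative, simplified view of the two intermediate structures (field
> names here follow the description above and may differ from the actual
> patch):
> 
>   struct perf_event_pmu_context {      /* per-pmu, per-context state */
>       struct pmu *pmu;
>       struct perf_event_context *ctx;  /* owning context */
>       void *task_ctx_data;             /* pmu-specific task data */
>       int rotate_necessary;            /* does this pmu need rotation? */
>       struct list_head pmu_ctx_entry;  /* on the ctx's pmu_ctx list */
>   };
> 
>   struct perf_cpu_pmu_context {        /* per-pmu, per-cpu state */
>       struct perf_event_pmu_context epc;       /* this cpu's pmu context */
>       struct perf_event_pmu_context *task_epc; /* running task's pmu context */
>       struct hrtimer hrtimer;                  /* drives event rotation */
>   };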
> 
> Original RFC -> RFC v2:
> -----------------------
> In addition to porting the patch to the latest (v5.16-rc6) kernel, here
> are some of the major changes between the two revisions:
> 
> - There have been quite a few fundamental changes since the original patch.
>   Most notably, the rbtree key has changed from {cpu,group_index} to
>   {cpu,cgroup_id,group_index}. Adding a pmu key in between, as proposed in
>   the original patch, is not straightforward as it would break the
>   cgroup-specific optimization. Hence we need to iterate over all pmu_ctx
>   for a given ctx and call visit_groups_merge() one by one (see the sketch
>   after this list).
> - Enabled cgroup support (CGROUP_PERF).
> - Some changes w.r.t. multiplexing events: with the new design, rotation
>   happens at the cgroup subtree rather than at the pmu subtree as in the
>   original patch.
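> 
> A rough sketch of the per-pmu_ctx iteration mentioned in the first item
> above (the function name, the list member names and the
> visit_groups_merge() arguments are approximations for illustration only):
> 
>   static void ctx_sched_in_all_pmus(struct perf_event_context *ctx,
>                                     enum event_type_t event_type)
>   {
>       struct perf_event_pmu_context *pmu_ctx;
> 
>       /*
>        * A single ordered walk over the whole rbtree would defeat the
>        * cgroup optimization, so visit each pmu's groups separately.
>        */
>       list_for_each_entry(pmu_ctx, &ctx->pmu_ctx_list, pmu_ctx_entry)
>           visit_groups_merge(ctx, pmu_ctx, event_type);
>   }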
> 
> Because of the additional complexity the above changes bring in, I thought
> I would get an initial review of the overall approach before starting to
> make it upstream ready. Hence this patch just provides an idea of the
> direction we will head toward. There are many loose ends in the patch right
> now. For instance, I've not paid much attention to synchronization-related
> aspects. Similarly, some of the issues marked in the original patch (XXX)
> haven't been fixed.
> 
> A simple perf stat/record/top survives with the patch, but the machine
> crashes on the first run of perf test (a stale cpc->task_epc causes the
> crash). Lockdep is also screaming a lot :)

Hi Peter, can you please review this?

Thanks,
Ravi
