[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <AF0FCD08-E014-4D29-B023-EA780056DBB5@fb.com>
Date: Mon, 15 Oct 2018 22:09:05 +0000
From: Song Liu <songliubraving@...com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Alexey Budankov <alexey.budankov@...ux.intel.com>,
Ingo Molnar <mingo@...nel.org>,
lkml <linux-kernel@...r.kernel.org>,
"acme@...nel.org" <acme@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...hat.com>,
Stephane Eranian <eranian@...gle.com>,
"Thomas Gleixner" <tglx@...utronix.de>,
"mark.rutland@....com" <mark.rutland@....com>,
"megha.dey@...el.com" <megha.dey@...el.com>,
"frederic@...nel.org" <frederic@...nel.org>
Subject: Re: [RFC][PATCH] perf: Rewrite core context handling
> On Oct 15, 2018, at 1:34 AM, Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Mon, Oct 15, 2018 at 10:26:06AM +0300, Alexey Budankov wrote:
>> Hi,
>>
>> On 10.10.2018 13:45, Peter Zijlstra wrote:
>>> Hi all,
>>>
>>> There have been various issues and limitations with the way perf uses
>>> (task) contexts to track events. Most notable is the single hardware PMU
>>> task context, which has resulted in a number of yucky things (both
>>> proposed and merged).
>>>
>>> Notably:
>>>
>>> - HW breakpoint PMU
>>> - ARM big.little PMU
>>> - Intel Branch Monitoring PMU
>>>
>>> Since we now track the events in RB trees, we can 'simply' add a pmu
>>> order to them and have them grouped that way, reducing to a single
>>> context. Of course, reality never quite works out that simple, and below
>>> ends up adding an intermediate data structure to bridge the context ->
>>> pmu mapping.
>>>
>>> Something a little like:
>>>
>>> ,------------------------[1:n]---------------------.
>>> V V
>>> perf_event_context <-[1:n]-> perf_event_pmu_context <--- perf_event
>>> ^ ^ | |
>>> `--------[1:n]---------' `-[n:1]-> pmu <-[1:n]-'
>>>
>>> This patch builds (provided you disable CGROUP_PERF), boots and survives
>>> perf-top without the machine catching fire.
>>>
>>> There's still a fair bit of loose ends (look for XXX), but I think this
>>> is the direction we should be going.
>>>
>>> Comments?
>>>
>>> Not-Quite-Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
>>> ---
>>> arch/powerpc/perf/core-book3s.c | 4
>>> arch/x86/events/core.c | 4
>>> arch/x86/events/intel/core.c | 6
>>> arch/x86/events/intel/ds.c | 6
>>> arch/x86/events/intel/lbr.c | 16
>>> arch/x86/events/perf_event.h | 6
>>> include/linux/perf_event.h | 80 +-
>>> include/linux/sched.h | 2
>>> kernel/events/core.c | 1412 ++++++++++++++++++++--------------------
>>> 9 files changed, 815 insertions(+), 721 deletions(-)
>>
>> Rewrite is impressive however it doesn't result in code base reduction as it is.
>
> Yeah.. that seems to be nature of these things ..
>
>> Nonetheless there is a clear demand for per pmu events groups tracking and rotation
>> in single cpu context (HW breakpoints, ARM big.little, Intel LBRs) and there is
>> a supply thru groups ordering on RB-tree.
>>
>> This might be driven into the kernel by some new Perf features that would base on
>> that RB-tree groups ordering or by refactoring of existing code but in the way it
>> would result in overall code base reduction thus lowering support cost.
>
> If you have a concrete suggestion on how to reduce complexity? I tried,
> but couldn't find any (without breaking something).
>
> The active lists and pmu_ctx_list could arguably be replaced with
> (slower) iteratons over the RB tree, but you'll still need the per pmu
> nr_events/nr_active counts to determine if rotation is required at all.
>
> And like you know, performance is quite important here too. I'd love to
> reduce complexity while maintaining or improve performance, but that
> rarely if ever happens :/
How about this:
1. Keep multiple perf_cpu_context per CPU, just like before this patch.
2. For perf_event_context, add PMU as an order for the RB tree.
3. (hw) pmu->perf_cpu_context->ctx only has events for this PMU (and sw
events moved to this context).
4. task->perf_event_ctxp has events for all PMUs.
With this path, we keep the existing perf_cpu_context/perf_event_context
logic as-is, which I think is simp.ler than the new logic (with extra
*_pmu_context). And it should also solve the problem.
Does this make sense? If this doesn't look too broken, I am happy to
draft RFC for it.
Thanks,
Song
Powered by blists - more mailing lists