Message-ID: <fa9ffa1f-21d0-b918-d66f-b0a20af00eab@linux.intel.com>
Date: Tue, 22 Oct 2019 15:49:09 +0300
From: Alexey Budankov <alexey.budankov@...ux.intel.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>,
Ingo Molnar <mingo@...hat.com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...hat.com>,
Namhyung Kim <namhyung@...nel.org>,
Andi Kleen <ak@...ux.intel.com>,
Kan Liang <kan.liang@...ux.intel.com>,
Stephane Eranian <eranian@...gle.com>,
Ian Rogers <irogers@...gle.com>,
Song Liu <songliubraving@...com>,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 4/4] perf/core,x86: synchronize PMU task contexts on
optimized context switches
On 22.10.2019 12:43, Peter Zijlstra wrote:
> On Tue, Oct 22, 2019 at 09:01:11AM +0300, Alexey Budankov wrote:
>
>> swap(ctx->task_ctx_data, next_ctx->task_ctx_data);
>>
>> + /*
>> + * PMU specific parts of task perf context can require
>> + * additional synchronization which makes sense only if
>> + * both next_ctx->task_ctx_data and ctx->task_ctx_data
>> + * pointers are allocated. As an example of such
>> + * synchronization see implementation details of Intel
>> + * LBR call stack data profiling;
>> + */
>> + if (ctx->task_ctx_data && next_ctx->task_ctx_data)
>> + pmu->sync_task_ctx(next_ctx->task_ctx_data,
>> + ctx->task_ctx_data);
>
> This still does not check if pmu->sync_task_ctx is set. If any other
> arch ever uses task_ctx_data without also supplying this method,
> things will go *bang*.
>
> Also, I think I prefer the variant I gave you yesterday:
>
> https://lkml.kernel.org/r/20191021103745.GF1800@hirez.programming.kicks-ass.net
>
> if (pmu->swap_task_ctx)
> pmu->swap_task_ctx(ctx, next_ctx);
> else
> swap(ctx->task_ctx_data, next_ctx->task_ctx_data);
>
> That also unconfuses the argument order in your above patch (where you
> have to undo the swap).
>
> Alternatively, since there currently is no other arch using
> task_ctx_data, we can make the pmu::swap_task_ctx() thing mandatory when
> having it and completely replace the swap(), write it like so:
>
>
> - swap(ctx->task_ctx_data, next_ctx->task_ctx_data);
It still has to be swapped unconditionally, so the swap becomes part of the
architecture-specific implementation:
void intel_pmu_lbr_sync_task_ctx(struct x86_perf_task_context **prev,
				 struct x86_perf_task_context **next)
{
	if (*prev && *next) {
		/* sync PMU specific parts, e.g. LBR call stack users */
		swap((*prev)->lbr_callstack_users, (*next)->lbr_callstack_users);
		...
	}
	/* the task_ctx_data pointers themselves are swapped unconditionally */
	swap(*prev, *next);
}
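
Under that variant the generic code would pass the addresses of the
task_ctx_data fields into the hook. A rough sketch of the call site (the
double-pointer style of the hook signature here is my assumption, not what
the current patch does):

	if (pmu->swap_task_ctx)
		pmu->swap_task_ctx(&ctx->task_ctx_data,
				   &next_ctx->task_ctx_data);
	else
		swap(ctx->task_ctx_data, next_ctx->task_ctx_data);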
> + if (pmu->swap_task_ctx)
> + pmu->swap_task_ctx(ctx, next_ctx);
>
> Hmm?
The option above looks attractive because it pushes the complexity down into
the architecture-specific implementation. However, in order to keep the
existing performance at the same level, the
if (ctx->task_ctx_data && next_ctx->task_ctx_data) check has to be kept as
close to the top layer as possible. So the fastest version appears to look
like this:
	swap(ctx->task_ctx_data, next_ctx->task_ctx_data);

	if (ctx->task_ctx_data && next_ctx->task_ctx_data && pmu->sync_task_ctx)
		pmu->sync_task_ctx(next_ctx->task_ctx_data, ctx->task_ctx_data);
If some architecture needs specific synchronization, it is enough to
implement sync_task_ctx() without changing the core.
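
To make the wiring concrete, a rough sketch of the optional hook in the
generic layer (the exact placement and comment are my illustration, not the
actual patch text):

	/* include/linux/perf_event.h */
	struct pmu {
		...
		/*
		 * Optional: synchronize PMU specific parts of two task perf
		 * contexts; in this variant it is called only when both
		 * task_ctx_data pointers are allocated, after the swap.
		 */
		void (*sync_task_ctx)(void *prev, void *next);
	};

Intel would point it at its LBR call stack implementation (taking the
task_ctx_data pointers directly in this variant), while every other PMU
simply leaves the callback NULL.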
~Alexey