Date:   Tue, 22 Oct 2019 15:49:09 +0300
From:   Alexey Budankov <alexey.budankov@...ux.intel.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Arnaldo Carvalho de Melo <acme@...nel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...hat.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Andi Kleen <ak@...ux.intel.com>,
        Kan Liang <kan.liang@...ux.intel.com>,
        Stephane Eranian <eranian@...gle.com>,
        Ian Rogers <irogers@...gle.com>,
        Song Liu <songliubraving@...com>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 4/4] perf/core,x86: synchronize PMU task contexts on
 optimized context switches


On 22.10.2019 12:43, Peter Zijlstra wrote:
> On Tue, Oct 22, 2019 at 09:01:11AM +0300, Alexey Budankov wrote:
> 
>>  			swap(ctx->task_ctx_data, next_ctx->task_ctx_data);
>>  
>> +			/*
>> +			 * PMU specific parts of task perf context can require
>> +			 * additional synchronization which makes sense only if
>> +			 * both next_ctx->task_ctx_data and ctx->task_ctx_data
>> +			 * pointers are allocated. As an example of such
>> +			 * synchronization see implementation details of Intel
>> +			 * LBR call stack data profiling;
>> +			 */
>> +			if (ctx->task_ctx_data && next_ctx->task_ctx_data)
>> +				pmu->sync_task_ctx(next_ctx->task_ctx_data,
>> +						   ctx->task_ctx_data);
> 
> This still does not check if pmu->sync_task_ctx is set. If any other
> arch ever uses task_ctx_data without also supplying this method,
> things will go *bang*.
> 
> Also, I think I prefer the variant I gave you yesterday:
> 
>   https://lkml.kernel.org/r/20191021103745.GF1800@hirez.programming.kicks-ass.net
> 
> 	if (pmu->swap_task_ctx)
> 		pmu->swap_task_ctx(ctx, next_ctx);
> 	else
> 		swap(ctx->task_ctx_data, next_ctx->task_ctx_data);
> 
> That also unconfuses the argument order in your above patch (where you
> have to undo the swap).
> 
> Alternatively, since there currently is no other arch using
> task_ctx_data, we can make the pmu::swap_task_ctx() thing mandatory when
> having it and completely replace the swap(), write it like so:
> 
> 
> -	swap(ctx->task_ctx_data, next_ctx->task_ctx_data);

It still has to be swapped unconditionally, so the swap becomes part of
the architecture-specific implementation:

void intel_pmu_lbr_sync_task_ctx(struct x86_perf_task_context **prev,
				 struct x86_perf_task_context **next)
{
	if (*prev && *next) {
		swap((*prev)->lbr_callstack_users, (*next)->lbr_callstack_users);
		...
	}
	swap(*prev, *next);
}


> +	if (pmu->swap_task_ctx)
> +		pmu->swap_task_ctx(ctx, next_ctx);
> 
> Hmm?

The option above looks attractive because it pushes the complexity down
into the architecture-specific implementation.

However, to keep the existing performance at the same level, the
if (ctx->task_ctx_data && next_ctx->task_ctx_data) check has to be
kept as close to the top layer as possible. So the fastest version
appears to look like this:

swap(ctx->task_ctx_data, next_ctx->task_ctx_data);
if (ctx->task_ctx_data && next_ctx->task_ctx_data && pmu->sync_task_ctx)
	pmu->sync_task_ctx(next_ctx->task_ctx_data, ctx->task_ctx_data);

If some architecture needs specific synchronization, it is enough
to implement sync_task_ctx() without changing the core.
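As a standalone illustration (outside the kernel, with toy structs and
names like toy_ctx/toy_pmu/optimized_switch standing in for the real
perf context types), the proposed core-layer sequence could look like
this:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; all names here are illustrative. */
struct task_ctx_data { int lbr_callstack_users; };

struct toy_ctx { struct task_ctx_data *task_ctx_data; };

struct toy_pmu {
	void (*sync_task_ctx)(struct task_ctx_data *one,
			      struct task_ctx_data *prev);
};

/* Userspace stand-in for the kernel's swap() macro. */
#define swap(a, b) \
	do { __typeof__(a) __t = (a); (a) = (b); (b) = __t; } while (0)

/* Example PMU-specific sync, in the spirit of Intel LBR call stack. */
static void toy_sync_task_ctx(struct task_ctx_data *one,
			      struct task_ctx_data *prev)
{
	swap(one->lbr_callstack_users, prev->lbr_callstack_users);
}

/*
 * Proposed core-layer logic: swap unconditionally, then let the PMU
 * synchronize only when both sides have task_ctx_data allocated and
 * the PMU actually implements the callback.
 */
static void optimized_switch(struct toy_pmu *pmu, struct toy_ctx *ctx,
			     struct toy_ctx *next_ctx)
{
	swap(ctx->task_ctx_data, next_ctx->task_ctx_data);
	if (ctx->task_ctx_data && next_ctx->task_ctx_data &&
	    pmu->sync_task_ctx)
		pmu->sync_task_ctx(next_ctx->task_ctx_data,
				   ctx->task_ctx_data);
}
```

The NULL checks guarantee the callback never dereferences a missing
task_ctx_data, and the pmu->sync_task_ctx check keeps PMUs that use
task_ctx_data without the callback from going *bang*.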

~Alexey
