linux-kernel - Fwd: Re: [RFC 0/6] optimize ctx switch with rb-tree

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <c53a14b2-4745-eb2c-b5e3-582479ae5501@linux.intel.com>
Date:   Thu, 11 May 2017 15:13:03 +0300
From:   Alexey Budankov <alexey.budankov@...ux.intel.com>
To:     kan.liang@...el.com, linux-kernel@...r.kernel.org, x86@...nel.org,
        mingo@...hat.com, tglx@...utronix.de, ak@...ux.intel.com,
        peterz@...radead.org, bp@...e.de,
        srinivas.pandruvada@...ux.intel.com, dave.hansen@...ux.intel.com,
        vikas.shivappa@...ux.intel.com, mark.rutland@....com,
        acme@...nel.org, vince@...ter.net, pjt@...gle.com,
        eranian@...gle.com, Dmitry.Prohorov@...el.com,
        Valery.Cherepennikov@...el.com
Subject: Fwd: Re: [RFC 0/6] optimize ctx switch with rb-tree




-------- Forwarded Message --------
Subject: Re: [RFC 0/6] optimize ctx switch with rb-tree
Date: Thu, 11 May 2017 14:53:48 +0300
From: Alexey Budankov <alexey.budankov@...ux.intel.com>
Organization: Intel Corp.
To: davidcc@...gle.com
CC: alexey.budankov@...ux.intel.com

On 02.05.2017 23:59, Budankov, Alexey wrote:

> Subject: Re: [RFC 0/6] optimize ctx switch with rb-tree
> 
> On Wed, Apr 26, 2017 at 3:34 AM, Budankov, Alexey <alexey.budankov@...el.com> wrote:
>> Hi David,
>>
>> I would like to take over on the patches development relying on your help with reviews.
> 
> Sounds good.

Hi,

Sorry for long reply due to Russian holidays joined with vacations.

So, I see that event_sched_out() function (4.11.0-rc6+) additionally to 
disabling an active (PERF_EVENT_STATE_ACTIVE) event in HW also performs 
updates of tstamp fields for inactive (PERF_EVENT_STATE_INACTIVE) events 
assigned to "the other" cpus (different from the one that is executing 
the function):

static void
event_sched_out(struct perf_event *event,
		  struct perf_cpu_context *cpuctx,
		  struct perf_event_context *ctx)
{
	u64 tstamp = perf_event_time(event);
	u64 delta;

	WARN_ON_ONCE(event->ctx != ctx);
	lockdep_assert_held(&ctx->lock);

	/*
	 * An event which could not be activated because of
	 * filter mismatch still needs to have its timings
	 * maintained, otherwise bogus information is return
	 * via read() for time_enabled, time_running:
	 */
->	if (event->state == PERF_EVENT_STATE_INACTIVE &&
->	    !event_filter_match(event)) {
->		delta = tstamp - event->tstamp_stopped;
->		event->tstamp_running += delta;
->		event->tstamp_stopped = tstamp;
->	}
->
->	if (event->state != PERF_EVENT_STATE_ACTIVE)
->		return;

I suggest moving this updating work to the context of 
perf_event_task_tick() callback. It goes per-cpu with 1ms interval by 
default so the inactive events will get their tstamp fields updated 
aside of this critical path:

event_sched_out()
     group_sched_out()
         ctx_sched_out()
             perf_rotate_context()
                 perf_mux_hrtimer_handler()

This change will shorten performance critical processing to active 
events only as the amount of the events is always HW-limited.

> 
>> Could you provide me with the cumulative patch set to expedite the ramp up?
> 
> This RFC is my latest version. I did not have a good solution on how to solve the problem of handling failure of PMUs that share contexts, and to activate/inactivate them.
> 
> Some things to keep in mind when dealing with task-contexts are:
>    1. The number of PMUs is large and growing, iterating over all PMUs may be expensive (see https://lkml.org/lkml/2017/1/18/859 ).
>    2. event_filter_match in this RFC is only used because I did not find a better ways to filter out events with the rb-tree. It would be nice if we wouldn't have to check event->cpu != -1 && event->cpu ==
> smp_processor_id() and cgroup stuff for every event in task contexts.

Checking an event for cpu affinity will be avoided, at least for the 
critical path mentioned above.

>    3. I used the inactive events list in this RFC as a cheaper alternative to threading the rb-tree but it has the problem that events that are removed due to conflict would be placed at the end of the list even if didn't run. I cannot recall if that ever happens. > Using this list also causes problem (2.) maybe threading the tree isa better alternative?

The simplest RB-tree key for the change being suggested could be like 
{event->cpu, event->id} so events could be organized into per-cpu 
sub-trees for fast search and enumeration. All software event could be 
grouped under event->cpu == -1.

>    4. Making the key in task-events to be {PMU,CPU,last_time_scheduled} (as opposed to {CPU,last_time_scheduled} in the RFC) may simplify sched in by helping to iterate over all events in same PMU at once, simplifying the activation/inactivation of the PMU and making it simple to move to the next PMU on pmu::add errors. The problem with this approach is to find only the PMUs with inactive events without traversing a list of all PMUs. Maybe a per-context list of active PMUs may help (see 1.).
> 
> cpu-contexts are much simpler and I think work well with what the RFC does (they are per-pmu already).
> 
> This thread has Peter and Mark's original discussion of the rb-tree (https://patchwork.kernel.org/patch/9176121/).
> 
> Thanks,
> David
> 

What do you think?

Thanks,
Alexey