[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <c53a14b2-4745-eb2c-b5e3-582479ae5501@linux.intel.com>
Date: Thu, 11 May 2017 15:13:03 +0300
From: Alexey Budankov <alexey.budankov@...ux.intel.com>
To: kan.liang@...el.com, linux-kernel@...r.kernel.org, x86@...nel.org,
mingo@...hat.com, tglx@...utronix.de, ak@...ux.intel.com,
peterz@...radead.org, bp@...e.de,
srinivas.pandruvada@...ux.intel.com, dave.hansen@...ux.intel.com,
vikas.shivappa@...ux.intel.com, mark.rutland@....com,
acme@...nel.org, vince@...ter.net, pjt@...gle.com,
eranian@...gle.com, Dmitry.Prohorov@...el.com,
Valery.Cherepennikov@...el.com
Subject: Fwd: Re: [RFC 0/6] optimize ctx switch with rb-tree
-------- Forwarded Message --------
Subject: Re: [RFC 0/6] optimize ctx switch with rb-tree
Date: Thu, 11 May 2017 14:53:48 +0300
From: Alexey Budankov <alexey.budankov@...ux.intel.com>
Organization: Intel Corp.
To: davidcc@...gle.com
CC: alexey.budankov@...ux.intel.com
On 02.05.2017 23:59, Budankov, Alexey wrote:
> Subject: Re: [RFC 0/6] optimize ctx switch with rb-tree
>
> On Wed, Apr 26, 2017 at 3:34 AM, Budankov, Alexey <alexey.budankov@...el.com> wrote:
>> Hi David,
>>
>> I would like to take over on the patches development relying on your help with reviews.
>
> Sounds good.
Hi,
Sorry for long reply due to Russian holidays joined with vacations.
So, I see that event_sched_out() function (4.11.0-rc6+) additionally to
disabling an active (PERF_EVENT_STATE_ACTIVE) event in HW also performs
updates of tstamp fields for inactive (PERF_EVENT_STATE_INACTIVE) events
assigned to "the other" cpus (different from the one that is executing
the function):
static void
event_sched_out(struct perf_event *event,
struct perf_cpu_context *cpuctx,
struct perf_event_context *ctx)
{
u64 tstamp = perf_event_time(event);
u64 delta;
WARN_ON_ONCE(event->ctx != ctx);
lockdep_assert_held(&ctx->lock);
/*
* An event which could not be activated because of
* filter mismatch still needs to have its timings
* maintained, otherwise bogus information is return
* via read() for time_enabled, time_running:
*/
-> if (event->state == PERF_EVENT_STATE_INACTIVE &&
-> !event_filter_match(event)) {
-> delta = tstamp - event->tstamp_stopped;
-> event->tstamp_running += delta;
-> event->tstamp_stopped = tstamp;
-> }
->
-> if (event->state != PERF_EVENT_STATE_ACTIVE)
-> return;
I suggest moving this updating work to the context of
perf_event_task_tick() callback. It goes per-cpu with 1ms interval by
default so the inactive events will get their tstamp fields updated
aside of this critical path:
event_sched_out()
group_sched_out()
ctx_sched_out()
perf_rotate_context()
perf_mux_hrtimer_handler()
This change will shorten performance critical processing to active
events only as the amount of the events is always HW-limited.
>
>> Could you provide me with the cumulative patch set to expedite the ramp up?
>
> This RFC is my latest version. I did not have a good solution on how to solve the problem of handling failure of PMUs that share contexts, and to activate/inactivate them.
>
> Some things to keep in mind when dealing with task-contexts are:
> 1. The number of PMUs is large and growing, iterating over all PMUs may be expensive (see https://lkml.org/lkml/2017/1/18/859 ).
> 2. event_filter_match in this RFC is only used because I did not find a better ways to filter out events with the rb-tree. It would be nice if we wouldn't have to check event->cpu != -1 && event->cpu ==
> smp_processor_id() and cgroup stuff for every event in task contexts.
Checking an event for cpu affinity will be avoided, at least for the
critical path mentioned above.
> 3. I used the inactive events list in this RFC as a cheaper alternative to threading the rb-tree but it has the problem that events that are removed due to conflict would be placed at the end of the list even if didn't run. I cannot recall if that ever happens. > Using this list also causes problem (2.) maybe threading the tree isa better alternative?
The simplest RB-tree key for the change being suggested could be like
{event->cpu, event->id} so events could be organized into per-cpu
sub-trees for fast search and enumeration. All software event could be
grouped under event->cpu == -1.
> 4. Making the key in task-events to be {PMU,CPU,last_time_scheduled} (as opposed to {CPU,last_time_scheduled} in the RFC) may simplify sched in by helping to iterate over all events in same PMU at once, simplifying the activation/inactivation of the PMU and making it simple to move to the next PMU on pmu::add errors. The problem with this approach is to find only the PMUs with inactive events without traversing a list of all PMUs. Maybe a per-context list of active PMUs may help (see 1.).
>
> cpu-contexts are much simpler and I think work well with what the RFC does (they are per-pmu already).
>
> This thread has Peter and Mark's original discussion of the rb-tree (https://patchwork.kernel.org/patch/9176121/).
>
> Thanks,
> David
>
What do you think?
Thanks,
Alexey
Powered by blists - more mailing lists