Message-ID: <CALcN6mgbT2j_sb4L-SXt9KNvZUk6pE9hRoHtPn_quVUEaixxXQ@mail.gmail.com>
Date: Wed, 31 May 2017 14:33:02 -0700
From: David Carrillo-Cisneros <davidcc@...gle.com>
To: Alexey Budankov <alexey.budankov@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Andi Kleen <ak@...ux.intel.com>,
Kan Liang <kan.liang@...el.com>,
Dmitri Prokhorov <Dmitry.Prohorov@...el.com>,
Valery Cherepennikov <valery.cherepennikov@...el.com>,
Stephane Eranian <eranian@...gle.com>,
Mark Rutland <mark.rutland@....com>,
linux-kernel <linux-kernel@...r.kernel.org>,
Arun Kalyanasundaram <arunkaly@...gle.com>
Subject: Re: [PATCH v2]: perf/core: addressing 4x slowdown during per-process
 profiling of STREAM benchmark on Intel Xeon Phi
On Sat, May 27, 2017 at 4:19 AM, Alexey Budankov
<alexey.budankov@...ux.intel.com> wrote:
> Motivation:
>
> The issue manifests as a 4x slowdown when profiling a single-threaded
> STREAM benchmark on Intel Xeon Phi running RHEL7.2 (Intel MPSS
> distribution). Perf profiling is done in per-process mode and involves
> about 30 core events. When the benchmark is OpenMP-based and runs under
> profiling in 272 threads, the overhead becomes even more dramatic:
> 512.144s against 36.192s (with this patch).
How long does it take without any perf monitoring? Could you provide
more details about the benchmark? How many CPUs are being monitored?
SNIP
> different from the one executing the handler. Additionally, for every
> filtered-out group, group_sched_out() updates tstamp values to the
> current interrupt time. This updating work is now done only once, by
> update_context_time(), called from ctx_sched_out() before the cpu
> groups iteration.
I don't see this. E.g., in your patch task_ctx_sched_out() calls
ctx_sched_out() with mux == 0, and that path does exactly the same
thing as before your patch.
I understand why you want to move the event's times to a separate
structure and keep a pointer in the event, but I don't see where you
avoid updating the times of unscheduled events.
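To make that concrete, here is a small userspace model of the path I
mean (not kernel code; the names mirror the patch and the behavior is
my reading of it): with mux == 0 the per-event tstamp update still
runs for every event in the context, exactly as before.

#include <stdio.h>

struct event { int cpu; unsigned long long tstamp_stopped; };

/* toy stand-in for ctx_sched_out(): mux gates the rotation-only path */
static void ctx_sched_out(struct event *evts, int n,
			  unsigned long long now, int mux)
{
	int i;

	for (i = 0; i < n; i++) {
		if (mux && evts[i].cpu != 0 /* "current" cpu */)
			continue;	/* rotation path can skip other cpus */
		evts[i].tstamp_stopped = now;	/* old per-event update */
	}
}

int main(void)
{
	struct event evts[3] = { { .cpu = 0 }, { .cpu = 1 }, { .cpu = 2 } };
	int i;

	/* task_ctx_sched_out() passes mux == 0, so every event is touched */
	ctx_sched_out(evts, 3, 100, 0);
	for (i = 0; i < 3; i++)
		printf("event on cpu %d: stopped = %llu\n",
		       evts[i].cpu, evts[i].tstamp_stopped);
	return 0;
}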
>
> static u64 perf_event_time(struct perf_event *event)
> @@ -1424,16 +1428,16 @@ static void update_event_times(struct perf_event *event)
>  	else if (ctx->is_active)
>  		run_end = ctx->time;
>  	else
> -		run_end = event->tstamp_stopped;
> +		run_end = event->tstamp->stopped;
>
> -	event->total_time_enabled = run_end - event->tstamp_enabled;
> +	event->total_time_enabled = run_end - event->tstamp->enabled;
>
>  	if (event->state == PERF_EVENT_STATE_INACTIVE)
> -		run_end = event->tstamp_stopped;
> +		run_end = event->tstamp->stopped;
>  	else
>  		run_end = perf_event_time(event);
>
> -	event->total_time_running = run_end - event->tstamp_running;
> +	event->total_time_running = run_end - event->tstamp->running;
FWICT, this is run for ALL events in the context with a matching CPU.
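In other words (toy model again, same caveats as above): even though
update_context_time() advances ctx->time only once per sched-out, the
enabled/running totals are still recomputed per event, so the work
stays O(nr_events):

#include <stdio.h>

struct ctx { unsigned long long time; };
struct event {
	int cpu;
	unsigned long long tstamp_enabled, total_time_enabled;
};

/* mirrors the quoted hunk: for an active ctx, run_end = ctx->time */
static void update_event_times(struct ctx *ctx, struct event *e)
{
	e->total_time_enabled = ctx->time - e->tstamp_enabled;
}

int main(void)
{
	struct ctx ctx = { .time = 100 };	/* updated once per sched-out */
	struct event evts[4] = {
		{ .cpu = 0 }, { .cpu = 0 }, { .cpu = 1 }, { .cpu = -1 },
	};
	int i, touched = 0;

	for (i = 0; i < 4; i++) {
		/* cpu filter: cpu == -1 means "any cpu" */
		if (evts[i].cpu != 0 && evts[i].cpu != -1)
			continue;
		update_event_times(&ctx, &evts[i]);	/* per-event work */
		touched++;
	}
	printf("events updated: %d of 4\n", touched);
	return 0;
}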
SNIP
> }
> @@ -3051,9 +3277,9 @@ void __perf_event_task_sched_out(struct task_struct *task,
>   * Called with IRQs disabled
>   */
>  static void cpu_ctx_sched_out(struct perf_cpu_context *cpuctx,
> -			      enum event_type_t event_type)
> +			      enum event_type_t event_type, int mux)
>  {
> -	ctx_sched_out(&cpuctx->ctx, cpuctx, event_type);
> +	ctx_sched_out(&cpuctx->ctx, cpuctx, event_type, mux);
>  }
>
>  static void
> @@ -3061,29 +3287,8 @@ ctx_pinned_sched_in(struct perf_event_context *ctx,
>  		    struct perf_cpu_context *cpuctx)
>  {
>  	struct perf_event *event;
> -
> -	list_for_each_entry(event, &ctx->pinned_groups, group_entry) {
> -		if (event->state <= PERF_EVENT_STATE_OFF)
> -			continue;
> -		if (!event_filter_match(event))
> -			continue;
Could we remove or simplify the tests in event_filter_match(), since
the rb-tree already filters by cpu?
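For reference, event_filter_match() in current mainline is, if I
remember right (worth double-checking):

static inline int
event_filter_match(struct perf_event *event)
{
	return (event->cpu == -1 || event->cpu == smp_processor_id()) &&
	       perf_cgroup_match(event) && pmu_filter_match(event);
}

If the rb-tree walk already hands us only events for this cpu (plus the
cpu == -1 ones), something like the below might be enough on this path
(untested sketch with a made-up name, just to illustrate the question):

static inline int
event_filter_match_cpu_prefiltered(struct perf_event *event)
{
	/* cpu check done by the rb-tree lookup; only cgroup/pmu remain */
	return perf_cgroup_match(event) && pmu_filter_match(event);
}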
> -
> -		/* may need to reset tstamp_enabled */
> -		if (is_cgroup_event(event))
> -			perf_cgroup_mark_enabled(event, ctx);
> -
> -		if (group_can_go_on(event, cpuctx, 1))
> -			group_sched_in(event, cpuctx, ctx);
> -
> -		/*
> -		 * If this pinned group hasn't been scheduled,
> -		 * put it in error state.
> -		 */
> -		if (event->state == PERF_EVENT_STATE_INACTIVE) {
> -			update_group_times(event);
> -			event->state = PERF_EVENT_STATE_ERROR;
> -		}
> -	}
> +	list_for_each_entry(event, &ctx->pinned_groups, group_entry)
> +		ctx_sched_in_pinned_group(ctx, cpuctx, event);
>  }
>
> static void
> @@ -3092,37 +3297,19 @@ ctx_flexible_sched_in(struct perf_event_context *ctx,
>  {
>  	struct perf_event *event;
>  	int can_add_hw = 1;
> -
> -	list_for_each_entry(event, &ctx->flexible_groups, group_entry) {
> -		/* Ignore events in OFF or ERROR state */
> -		if (event->state <= PERF_EVENT_STATE_OFF)
> -			continue;
> -		/*
> -		 * Listen to the 'cpu' scheduling filter constraint
> -		 * of events:
> -		 */
> -		if (!event_filter_match(event))
> -			continue;
Same as before.