Message-ID: <CAM9d7ci9PpT3A7AC65dmZbqPM8V1wGzvV_9hbdDKbRK=7q0j2Q@mail.gmail.com>
Date: Mon, 5 Aug 2024 23:25:50 -0700
From: Namhyung Kim <namhyung@...nel.org>
To: Mingwei Zhang <mizhang@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
Ravi Bangoria <ravi.bangoria@....com>, Kan Liang <kan.liang@...ux.intel.com>,
Stephane Eranian <eranian@...gle.com>, Ian Rogers <irogers@...gle.com>
Subject: Re: [PATCH] perf/core: Optimize event reschedule for a PMU

Hi Mingwei,

On Mon, Aug 5, 2024 at 9:57 AM Mingwei Zhang <mizhang@...gle.com> wrote:
>
> On Tue, Jul 30, 2024 at 12:19 PM Namhyung Kim <namhyung@...nel.org> wrote:
> >
> > Currently ctx_resched() reschedules all events in all PMUs in the
> > context even if it only needs to do so for a single event. This is
> > the case when opening a new event or enabling an existing one. What
> > we want is to reschedule events in that PMU only. Also,
> > perf_pmu_resched() currently calls ctx_resched() without any PMU
> > information.
> >
> > Let's add __perf_pmu_resched() to do the work for the given PMU only.
> > The context time should be updated by ctx_sched_{out,in}(EVENT_TIME)
> > outside of it. Also change __pmu_ctx_sched_in() to take arguments
> > symmetrical to __pmu_ctx_sched_out() so that it can be called easily
> > from __perf_pmu_resched().
> >
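(For reference, a rough sketch of the shape of __perf_pmu_resched(); this is
simplified, the exact arguments, locking and the cpuctx handling are in the
patch itself:)

static void __perf_pmu_resched(struct perf_event_pmu_context *pmu_ctx)
{
        /*
         * Context time (EVENT_TIME) is handled by the caller with
         * ctx_sched_{out,in}(); here only this PMU's events are
         * stopped and then rescheduled according to the group info.
         */
        __pmu_ctx_sched_out(pmu_ctx, EVENT_ALL);
        __pmu_ctx_sched_in(pmu_ctx, EVENT_ALL);
}
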
> > Note that __perf_install_in_context() should call ctx_resched() for the
> > very first event in the context in order to set ctx->is_active. Later
> > events can be handled by __perf_pmu_resched().
> >
> > Care should be taken when installing a task event for a PMU that has
> > no CPU event. __perf_pmu_resched() will ask the CPU PMU context to
> > schedule any events in it according to the group info, but since that
> > PMU context was never activated, its event context pointer was not
> > set. So I added new NULL checks in __pmu_ctx_sched_{in,out}().
> >
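(Roughly, both helpers now bail out early with a check like the following;
again a simplified sketch, the exact code is in the patch:)

        struct perf_event_context *ctx = pmu_ctx->ctx;

        /* The CPU PMU context may never have been activated. */
        if (!ctx)
                return;
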
> > With this change I get a 4x speedup (it is actually proportional to
> > the number of uncore PMU events) on a 2-socket Intel EMR machine when
> > opening and closing a perf event for the core PMU in a loop while a
> > bunch of uncore PMU events are active on the CPU. The test code
> > (stress-pmu) follows below.
> >
> > Before)
> > # ./stress-pmu
> > delta: 0.087068 sec (870 usec/op)
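(The full stress-pmu source was trimmed from the quote above. For anyone who
wants to reproduce the numbers, the loop being timed is essentially the
following; this is an illustrative stand-in, not the original file, and the
event choice and loop count are guesses based on the description:)

#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

int main(void)
{
        struct perf_event_attr attr = {
                .type = PERF_TYPE_HARDWARE,
                .size = sizeof(attr),
                .config = PERF_COUNT_HW_CPU_CYCLES,
        };
        struct timespec a, b;
        int i, fd, loops = 100;
        double delta;

        clock_gettime(CLOCK_MONOTONIC, &a);
        for (i = 0; i < loops; i++) {
                /* per-task core-PMU event for the current process */
                fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
                if (fd < 0)
                        return 1;
                close(fd);
        }
        clock_gettime(CLOCK_MONOTONIC, &b);

        delta = (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
        printf("delta: %f sec (%.0f usec/op)\n", delta, delta * 1e6 / loops);
        return 0;
}
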
>
> Hi Namhyung,
>
> I wonder how I could test the performance boost in a virtualized
> environment. I assume this will perform better by reducing the number
> of wrmsrs to the event selectors and counters?
Right.
>
> I wonder if I need to run multiple instances of stress-pmu to increase
> the number of PMU context switches?
Yep, I think it'd work. Basically anything that opens more events in
different PMUs would do. But make sure the vcpu thread is running on
the affected CPU (60 in my test).
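For example, "taskset -cp 60 <vcpu-tid>" on the host should do the pinning,
or run the open/close loop itself under "taskset -c 60"; 60 is just the CPU
I happened to use, pick whichever CPU has the uncore events in your setup.
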
Thanks,
Namhyung