Message-ID: <CAKTKpr6m7YzqJ7U2icNHq7ZwoG0pw8ws_EHcLR+-T6ZeEfe15Q@mail.gmail.com>
Date: Fri, 23 Aug 2019 16:13:46 +0530
From: Ganapatrao Kulkarni <gklkml16@...il.com>
To: Ian Rogers <irogers@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...hat.com>,
Namhyung Kim <namhyung@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Kan Liang <kan.liang@...ux.intel.com>,
Andi Kleen <ak@...ux.intel.com>,
Stephane Eranian <eranian@...gle.com>,
Ganapatrao Kulkarni <gkulkarni@...vell.com>,
Jayachandran Chandrasekharan Nair <jnair@...vell.com>
Subject: Re: [PATCH] perf cgroups: Don't rotate events for cgroups unnecessarily
Hi,
We are seeing a regression with our uncore perf driver (Marvell's
ThunderX2, an ARM64 server platform) on 5.3-rc1.
Bisecting pointed to this patch as the cause of the issue.
Test case:
Load the module and run perf with more than 4 events (we have 4
counters, so event multiplexing takes place beyond 4 events), then
unload the module.
With this sequence, the system hangs (soft lockup) after 2 or 3
iterations. The same test runs for hours on 5.2.
while true
do
    rmmod thunderx2_pmu
    modprobe thunderx2_pmu
    perf stat -a -e \
        uncore_dmc_0/cnt_cycles/,\
        uncore_dmc_0/data_transfers/,\
        uncore_dmc_0/read_txns/,\
        uncore_dmc_0/config=0xE/,\
        uncore_dmc_0/write_txns/ sleep 1
    sleep 2
done
Has this patch been tested adequately?
On Fri, Jun 28, 2019 at 3:18 AM Ian Rogers <irogers@...gle.com> wrote:
>
> On Mon, Jun 24, 2019 at 12:55 AM Peter Zijlstra
> <peterz@...radead.org> wrote:
> >
> > On Fri, Jun 21, 2019 at 11:01:29AM -0700, Ian Rogers wrote:
> > > On Fri, Jun 21, 2019 at 1:24 AM Peter Zijlstra <peterz@...radead.org> wrote:
> > > >
> > > > On Sat, Jun 01, 2019 at 01:27:22AM -0700, Ian Rogers wrote:
> > > > > @@ -3325,6 +3331,15 @@ static int flexible_sched_in(struct perf_event *event, void *data)
> > > > > sid->can_add_hw = 0;
> > > > > }
> > > > >
> > > > > + /*
> > > > > + * If the group wasn't scheduled then set that multiplexing is necessary
> > > > > + * for the context. Note, this won't be set if the event wasn't
> > > > > + * scheduled due to event_filter_match failing due to the earlier
> > > > > + * return.
> > > > > + */
> > > > > + if (event->state == PERF_EVENT_STATE_INACTIVE)
> > > > > + sid->ctx->rotate_necessary = 1;
> > > > > +
> > > > > return 0;
> > > > > }
> > > >
> > > > That looked odd, which had me look harder at this function and
> > > > resulted in the below. Should we not terminate the context iteration
> > > > the moment one flexible thingy fails to schedule?
> > >
> > > If we knew all the events were hardware events then this would be
> > > true, but as there may be software events that always schedule, the
> > > continued iteration is necessary.
> >
> > But this is the 'old' code, where this is guaranteed by the context.
> > That is, if this is a hardware context, there will only be software
> > events due to them being in a group with hardware events.
> >
> > If this is a software group, then we'll never fail to schedule and we'll
> > not get in this branch to begin with.
> >
> > Or am I now confused for having been staring at two different code-bases
> > at the same time?
>
> I believe you're right and I'd overlooked this. I think there is a
> more efficient version of this code possible that I can follow up
> with. There are 3 perf_event_contexts, potentially a sw and hw context
> within the task_struct and one per-CPU in perf_cpu_context. With this
> change I'm focussed on improving rotation of cgroup events that appear
> system wide within the per-CPU context. Without cgroup events the
> system wide events don't need to be iterated over during scheduling
> in. The branch that can set rotate_necessary is only needed for the
> task hardware events. For system-wide events, when cgroup events are
> scheduled in, the branch is needed as rotation may be required. It'd
> be possible to have variants of flexible_sched_in that behave
> differently in the task software event context versus the system-wide
> and task hardware contexts.
>
> I have a series of patches related to Kan Liang's cgroup
> perf_event_groups improvements. I'll mail these out and see if I can
> avoid the branch in the task software event context case.
>
> Thanks,
> Ian
Thanks,
Ganapat