Message-ID: <CAKTKpr6m7YzqJ7U2icNHq7ZwoG0pw8ws_EHcLR+-T6ZeEfe15Q@mail.gmail.com>
Date: Fri, 23 Aug 2019 16:13:46 +0530
From: Ganapatrao Kulkarni <gklkml16@...il.com>
To: Ian Rogers <irogers@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...hat.com>,
Namhyung Kim <namhyung@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Kan Liang <kan.liang@...ux.intel.com>,
Andi Kleen <ak@...ux.intel.com>,
Stephane Eranian <eranian@...gle.com>,
Ganapatrao Kulkarni <gkulkarni@...vell.com>,
Jayachandran Chandrasekharan Nair <jnair@...vell.com>
Subject: Re: [PATCH] perf cgroups: Don't rotate events for cgroups unnecessarily
Hi,
We are seeing a regression with our uncore perf driver (Marvell's
ThunderX2, an ARM64 server platform) on 5.3-rc1.
Bisecting pointed to this patch as the cause of the issue.
Test case:
Load the module and run perf with more than 4 events (we have 4
counters, so event multiplexing takes place beyond 4 events), then
unload the module.
With this sequence, the system hangs (soft lockup) after 2 or 3
iterations. The same test runs for hours on 5.2.
while true
do
    rmmod thunderx2_pmu
    modprobe thunderx2_pmu
    perf stat -a -e \
        uncore_dmc_0/cnt_cycles/,\
        uncore_dmc_0/data_transfers/,\
        uncore_dmc_0/read_txns/,\
        uncore_dmc_0/config=0xE/,\
        uncore_dmc_0/write_txns/ sleep 1
    sleep 2
done
Has this patch been tested adequately?
On Fri, Jun 28, 2019 at 3:18 AM Ian Rogers <irogers@...gle.com> wrote:
>
> On Mon, Jun 24, 2019 at 12:55 AM Peter Zijlstra
> <peterz@...radead.org> wrote:
> >
> > On Fri, Jun 21, 2019 at 11:01:29AM -0700, Ian Rogers wrote:
> > > On Fri, Jun 21, 2019 at 1:24 AM Peter Zijlstra <peterz@...radead.org> wrote:
> > > >
> > > > On Sat, Jun 01, 2019 at 01:27:22AM -0700, Ian Rogers wrote:
> > > > > @@ -3325,6 +3331,15 @@ static int flexible_sched_in(struct perf_event *event, void *data)
> > > > > sid->can_add_hw = 0;
> > > > > }
> > > > >
> > > > > + /*
> > > > > + * If the group wasn't scheduled then set that multiplexing is necessary
> > > > > + * for the context. Note, this won't be set if the event wasn't
> > > > > + * scheduled due to event_filter_match failing due to the earlier
> > > > > + * return.
> > > > > + */
> > > > > + if (event->state == PERF_EVENT_STATE_INACTIVE)
> > > > > + sid->ctx->rotate_necessary = 1;
> > > > > +
> > > > > return 0;
> > > > > }
> > > >
> > > > That looked odd, which had me look harder at this function and
> > > > resulted in the below. Should we not terminate the context iteration
> > > > the moment one flexible thingy fails to schedule?
> > >
> > > If we knew all the events were hardware events then this would be
> > > true, but as there may be software events that always schedule, the
> > > continued iteration is necessary.
> >
> > But this is the 'old' code, where this is guaranteed by the context.
> > That is, if this is a hardware context, there will only be software
> > events due to them being in a group with hardware events.
> >
> > If this is a software group, then we'll never fail to schedule and we'll
> > not get in this branch to begin with.
> >
> > Or am I now confused for having been staring at two different code-bases
> > at the same time?
>
> I believe you're right and I'd overlooked this. I think there is a
> more efficient version of this code possible that I can follow up
> with. There are 3 perf_event_contexts, potentially a sw and hw context
> within the task_struct and one per-CPU in perf_cpu_context. With this
> change I'm focussed on improving rotation of cgroup events that appear
> system wide within the per-CPU context. Without cgroup events the
> system wide events don't need to be iterated over during scheduling
> in. The branch that can set rotate_necessary is only needed for the
> task hardware events. For system-wide events, when cgroup events are
> scheduled in, the branch is needed as rotation may be required. It'd
> be possible to have variants of flexible_sched_in that behave
> differently in the task software event context versus the system-wide
> and task hardware contexts.
>
> I have a series of patches related to Kan Liang's cgroup
> perf_event_groups improvements. I'll mail these out and see if I can
> avoid the branch in the task software event context case.
>
> Thanks,
> Ian
Thanks,
Ganapat