linux-kernel - Re: [PATCH] perf/core: Introduce cpuctx->cgrp_ctx

Message-ID: <CAM9d7cizC0J85ByuF5fBmc_Bqi=wpNJpiVsw+3F1Avusn2aQog@mail.gmail.com>
Date:   Wed, 4 Oct 2023 09:32:24 -0700
From:   Namhyung Kim <namhyung@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Stephane Eranian <eranian@...gle.com>,
        Kan Liang <kan.liang@...ux.intel.com>,
        Ravi Bangoria <ravi.bangoria@....com>, stable@...r.kernel.org
Subject: Re: [PATCH] perf/core: Introduce cpuctx->cgrp_ctx_list

Hi Peter,

On Wed, Oct 4, 2023 at 9:02 AM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Tue, Oct 03, 2023 at 09:08:44PM -0700, Namhyung Kim wrote:
>
> > But after the change, it ended up iterating all pmus/events in the cpu
> > context if there's a cgroup event somewhere on the cpu context.
> > Unfortunately it includes uncore pmus which have much longer latency to
> > control.
>
> Can you describe the problem in more detail please?

Sure.

>
> We have cgrp as part of the tree key: {cpu, pmu, cgroup, idx},
> so it should be possible to find a specific cgroup for a cpu and or skip
> to the next cgroup on that cpu in O(log n) time.

That helps with a single (core) pmu when it has a lot of events.
But this problem is different: it's about accessing more pmus
than necessary.
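
For reference, the ordering you describe can be pictured as a plain
lexicographic compare on {cpu, pmu, cgroup, idx}.  This is only a
simplified userspace sketch: struct toy_key, its integer ids and
toy_key_cmp() are made-up names for illustration, not the kernel's
types or helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified key; the kernel's event tree uses its own types. */
struct toy_key {
	int cpu;
	int pmu;          /* stand-in id for the pmu */
	uint64_t cgroup;  /* stand-in id for the cgroup, 0 means no cgroup */
	uint64_t idx;     /* insertion order */
};

/* Lexicographic compare on {cpu, pmu, cgroup, idx}: returns <0, 0 or >0. */
int toy_key_cmp(const struct toy_key *a, const struct toy_key *b)
{
	assert(a && b);

	if (a->cpu != b->cpu)
		return a->cpu < b->cpu ? -1 : 1;
	if (a->pmu != b->pmu)
		return a->pmu < b->pmu ? -1 : 1;
	if (a->cgroup != b->cgroup)
		return a->cgroup < b->cgroup ? -1 : 1;
	if (a->idx != b->idx)
		return a->idx < b->idx ? -1 : 1;
	return 0;
}
```

With such an ordering, all events of one cgroup on one {cpu, pmu} are
adjacent in the tree, which is what makes the O(log n) lookup work
within a single pmu.  It does not say which pmus need to be visited
in the first place.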

Say we have the following events for CPU 0.

  sw: context-switches
  core: cycles, cycles-for-cgroup-A
  uncore: whatever

The cpu context has a cgroup event so it needs to call
perf_cgroup_switch() at every context switch.  But actually
it only needs to resched the 'core' pmu since that is the only
pmu with a cgroup event.  Other pmu events (like context-switches
or any uncore event) should not be bothered by that.

But perf_cgroup_switch() calls the general functions which
iterate all pmus in the (cpu) context.

  cpuctx.ctx.pmu_ctx_list:
    +-> sw -> core -> uncore  (pmu_ctx_entry)

Then it disables the pmus, schedules out the current events,
switches the cgroup pointer, schedules in the new events and
enables the pmus again.  This adds a lot of overhead when the
context has uncore pmus, since accessing MSRs for uncore pmus
has longer latency.  But uncore pmus cannot have cgroup events
in the first place.

So we need a separate list to keep pmus that have
active cgroup events.

  cpuctx.cgrp_ctx_list:
    +-> core  (cgrp_ctx_entry)

And we also need logic to do the same work for this list
only.
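
To make the difference concrete, here is a rough userspace model of
the two lists.  It is not the patch itself: struct toy_pmu_ctx,
struct toy_cpu_ctx and the toy_*() functions are hypothetical names,
and a counter increment stands in for the whole "disable, sched-out,
switch cgroup, sched-in, enable" sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-pmu context on one cpu; not the kernel struct. */
struct toy_pmu_ctx {
	const char *name;
	unsigned int resched_count;    /* times this pmu was rescheduled */
	struct toy_pmu_ctx *next_pmu;  /* models pmu_ctx_entry: every pmu */
	struct toy_pmu_ctx *next_cgrp; /* models cgrp_ctx_entry: cgroup pmus only */
};

struct toy_cpu_ctx {
	struct toy_pmu_ctx *pmu_ctx_list;  /* all pmus in the cpu context */
	struct toy_pmu_ctx *cgrp_ctx_list; /* pmus with active cgroup events */
};

/* Link a pmu into the context, and into the cgroup list only if needed. */
void toy_add_pmu(struct toy_cpu_ctx *cpuctx, struct toy_pmu_ctx *p,
		 bool has_cgrp_event)
{
	assert(cpuctx && p);

	p->resched_count = 0;
	p->next_pmu = cpuctx->pmu_ctx_list;
	cpuctx->pmu_ctx_list = p;

	p->next_cgrp = NULL;
	if (has_cgrp_event) {
		p->next_cgrp = cpuctx->cgrp_ctx_list;
		cpuctx->cgrp_ctx_list = p;
	}
}

/* Model of the current behaviour: every pmu in the context is touched. */
unsigned int toy_switch_all(struct toy_cpu_ctx *cpuctx)
{
	unsigned int touched = 0;

	for (struct toy_pmu_ctx *p = cpuctx->pmu_ctx_list; p; p = p->next_pmu) {
		p->resched_count++; /* disable, sched-out, sched-in, enable */
		touched++;
	}
	return touched;
}

/* Model of the proposed behaviour: only pmus on the cgroup list are touched. */
unsigned int toy_switch_cgrp_only(struct toy_cpu_ctx *cpuctx)
{
	unsigned int touched = 0;

	for (struct toy_pmu_ctx *p = cpuctx->cgrp_ctx_list; p; p = p->next_cgrp) {
		p->resched_count++;
		touched++;
	}
	return touched;
}
```

With the CPU 0 example above (sw, core with a cgroup event, uncore),
toy_switch_all() visits three pmus while toy_switch_cgrp_only()
visits just the core pmu, so the uncore pmu is never reprogrammed on
a cgroup switch.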

Hope this helps.

>
> > To fix the issue, I restored a linked list equivalent to cgrp_cpuctx_list
> > in the perf_cpu_context and link perf_cpu_pmu_contexts that have cgroup
> > events only.  Also add new helpers to enable/disable and does ctx sched
> > in/out for cgroups.
>
> Adding a list and duplicating the whole scheduling infrastructure seems
> 'unfortunate' at best.

Yeah, I know.. but I couldn't come up with a better solution.

Thanks,
Namhyung