linux-kernel - Re: [PATCH v3 5/7] MIPS: perf: Allocate per-core counters on demand

Date:   Wed, 16 May 2018 19:05:20 +0100
From:   James Hogan <jhogan@...nel.org>
To:     Matt Redfearn <matt.redfearn@...s.com>
Cc:     Ralf Baechle <ralf@...ux-mips.org>,
        Florian Fainelli <f.fainelli@...il.com>,
        linux-mips@...ux-mips.org, Namhyung Kim <namhyung@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        Jiri Olsa <jolsa@...hat.com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>
Subject: Re: [PATCH v3 5/7] MIPS: perf: Allocate per-core counters on demand

On Fri, Apr 20, 2018 at 11:23:07AM +0100, Matt Redfearn wrote:
> Previously, when performance counters were per-core rather than
> per-thread, the number available was divided by 2 on detection, and the
> counters used by each thread in a core were "swizzled" to ensure
> separation. However, this solution is suboptimal since it relies on a
> couple of assumptions:
> a) Always having 2 VPEs / core (number of counters was divided by 2)
> b) Always having a number of counters implemented in the core that is
>    divisible by 2. For instance if an SoC implementation had a single
>    counter and 2 VPEs per core, then this logic would fail and no
>    performance counters would be available.
> The mechanism also does not allow for one VPE in a core using more than
> its allocation of the per-core counters to count multiple events even
> though other VPEs may not be using them.
> 
> Fix this situation by instead allocating (and releasing) per-core
> performance counters when they are requested. This approach removes the
> above assumptions and fixes the shortcomings.
> 
> In order to do this:
> Add additional logic to mipsxx_pmu_alloc_counter() to detect if a
> sibling is using a per-core counter, and to allocate a per-core counter
> in all sibling CPUs.
> Similarly, add a mipsxx_pmu_free_counter() function to release a
> per-core counter in all sibling CPUs when it is finished with.
> A new spinlock, core_counters_lock, is introduced to ensure exclusivity
> when allocating and releasing per-core counters.
> Since counters are now allocated per-core on demand, rather than being
> reserved per-thread at boot, all of the "swizzling" of counters is
> removed.
> 
> The upshot is that in an SoC with 2 counters / thread, counters are
> reported as:
> Performance counters: mips/interAptiv PMU enabled, 2 32-bit counters
> available to each CPU, irq 18
> 
> Running an instance of a test program on each of 2 threads in a
> core, both threads can use their 2 counters to count 2 events:
> 
> taskset 4 perf stat -e instructions:u,branches:u ./test_prog & taskset 8
> perf stat -e instructions:u,branches:u ./test_prog
> 
>  Performance counter stats for './test_prog':
> 
>              30002      instructions:u
>              10000      branches:u
> 
>        0.005164264 seconds time elapsed
>  Performance counter stats for './test_prog':
> 
>              30002      instructions:u
>              10000      branches:u
> 
>        0.006139975 seconds time elapsed
> 
> In an SoC with 2 counters / core (which can be forced by setting
> cpu_has_mipsmt_pertccounters = 0), counters are reported as:
> Performance counters: mips/interAptiv PMU enabled, 2 32-bit counters
> available to each core, irq 18
> 
> Running an instance of a test program on each of 2 threads in a
> core, now only one thread manages to secure the performance counters to
> count 2 events. The other thread does not get any counters.
> 
> taskset 4 perf stat -e instructions:u,branches:u ./test_prog & taskset 8
> perf stat -e instructions:u,branches:u ./test_prog
> 
>  Performance counter stats for './test_prog':
> 
>      <not counted>       instructions:u
>      <not counted>       branches:u
> 
>        0.005179533 seconds time elapsed
> 
>  Performance counter stats for './test_prog':
> 
>              30002      instructions:u
>              10000      branches:u
> 
>        0.005179467 seconds time elapsed
> 
> Signed-off-by: Matt Redfearn <matt.redfearn@...s.com>

While this sounds like an improvement in practice, since single-threaded
workloads can use more counters than before, I'm a little concerned
about what would happen if a task were migrated to a different CPU and
the perf counters couldn't be obtained on the new CPU because they were
already in use. Would the reported values be incorrectly small?

Cheers
James
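
The migration scenario raised above can be sketched with a small user-space
model. This is not the patch's code: the topology constants, struct core_pmu,
and the model_* functions are all hypothetical names made up for illustration,
and the real mipsxx_pmu_alloc_counter() would run under core_counters_lock
rather than unlocked as here. The sketch only shows that on-demand per-core
allocation can fail on the destination CPU; it does not show what perf then
reports.

```c
#include <assert.h>

#define NUM_CORES      2  /* assumed topology, for illustration only */
#define VPES_PER_CORE  2  /* hardware threads (CPUs) per core */
#define NUM_COUNTERS   2  /* counters shared by all VPEs in a core */

/* Hypothetical model: one ownership slot per physical counter in a core. */
struct core_pmu {
    int owner[NUM_COUNTERS];  /* CPU id holding the counter, or -1 if free */
};

static struct core_pmu cores[NUM_CORES];

static void model_pmu_init(void)
{
    for (int c = 0; c < NUM_CORES; c++)
        for (int i = 0; i < NUM_COUNTERS; i++)
            cores[c].owner[i] = -1;
}

static struct core_pmu *cpu_core(int cpu)
{
    assert(cpu >= 0 && cpu < NUM_CORES * VPES_PER_CORE);
    return &cores[cpu / VPES_PER_CORE];
}

/* Claim the first free counter on cpu's core; return its index or -1. */
static int model_alloc_counter(int cpu)
{
    struct core_pmu *core = cpu_core(cpu);

    for (int i = 0; i < NUM_COUNTERS; i++) {
        if (core->owner[i] < 0) {
            core->owner[i] = cpu;
            return i;
        }
    }
    return -1;  /* every counter in this core is already in use */
}

/* Release a counter, but only if this CPU is the one holding it. */
static void model_free_counter(int cpu, int idx)
{
    struct core_pmu *core = cpu_core(cpu);

    assert(idx >= 0 && idx < NUM_COUNTERS);
    if (core->owner[idx] == cpu)
        core->owner[idx] = -1;
}

/*
 * A task counting one event on CPU 0 (core 0) migrates to CPU 2 (core 1)
 * while sibling CPU 3 already holds both of core 1's counters.
 * Returns the counter index obtained on the new CPU, or -1 on failure.
 */
static int demo_migration(void)
{
    model_pmu_init();

    int idx = model_alloc_counter(0);   /* task starts counting on CPU 0 */
    (void)model_alloc_counter(3);       /* sibling takes core 1, counter 0 */
    (void)model_alloc_counter(3);       /* sibling takes core 1, counter 1 */

    model_free_counter(0, idx);         /* task is scheduled out of CPU 0 */
    return model_alloc_counter(2);      /* and tries again on CPU 2 */
}
```

In this model demo_migration() returns -1: the task held a counter on its
original core but cannot get one after moving, so any events that occur on
the new CPU would go uncounted unless the perf core accounts for that.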
