Message-ID: <ZR0TtjhGT+Em+/ti@gmail.com>
Date: Wed, 4 Oct 2023 09:26:46 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Stephane Eranian <eranian@...gle.com>,
Kan Liang <kan.liang@...ux.intel.com>,
Ravi Bangoria <ravi.bangoria@....com>, stable@...r.kernel.org
Subject: Re: [PATCH] perf/core: Introduce cpuctx->cgrp_ctx_list

* Namhyung Kim <namhyung@...nel.org> wrote:
> AFAIK we don't have a tool to measure context-switch overhead
> directly. (I think I should add one to perf ftrace latency.) But I
> can see it with a simple perf bench command like this.
>
> $ perf bench sched pipe -l 100000
> # Running 'sched/pipe' benchmark:
> # Executed 100000 pipe operations between two processes
>
> Total time: 0.650 [sec]
>
> 6.505740 usecs/op
> 153710 ops/sec
>
> It runs two tasks that communicate with each other over a pipe, so it
> should stress the context-switch code. These are the normal numbers
> on my system. But after I run these two perf stat commands in the
> background, the numbers vary a lot.
>
> $ sudo perf stat -a -e cycles -G user.slice -- sleep 100000 &
> $ sudo perf stat -a -e uncore_imc/cas_count_read/ -- sleep 10000 &
>
> I will show the last two lines of perf bench sched pipe output for
> three runs.
>
> 58.597060 usecs/op # run 1
> 17065 ops/sec
>
> 11.329240 usecs/op # run 2
> 88267 ops/sec
>
> 88.481920 usecs/op # run 3
> 11301 ops/sec
>
> I think the deviation comes from the fact that uncore events are
> managed by only a certain number of CPUs. If the target process runs
> on a CPU that manages an uncore PMU, the context switch takes longer.
> Otherwise it doesn't affect the performance much.
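
(Side note: which CPU manages a given uncore PMU can be read from its
sysfs cpumask, e.g. with something like:

  $ cat /sys/bus/event_source/devices/uncore_imc*/cpumask

the exact PMU name depends on the system. On a single-socket machine
this typically prints 0, i.e. per your observation only context switches
on that CPU would pay the extra uncore scheduling cost.)
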
The pipe-messaging context-switch numbers will also vary a lot depending
on CPU migration patterns.

The best way to measure context-switch overhead is to pin the task to a
single CPU with something like:

$ taskset 1 perf stat --null --repeat 10 perf bench sched pipe -l 10000 >/dev/null

 Performance counter stats for 'perf bench sched pipe -l 10000' (10 runs):

      0.049798 +- 0.000102 seconds time elapsed  ( +- 0.21% )

As you can see, the 0.21% stddev is pretty low.
If we allow 2 CPUs, both the runtime and the stddev are much higher:

$ taskset 3 perf stat --null --repeat 10 perf bench sched pipe -l 10000 >/dev/null

 Performance counter stats for 'perf bench sched pipe -l 10000' (10 runs):

      1.4835 +- 0.0383 seconds time elapsed  ( +- 2.58% )
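
(For reference: taskset takes a CPU affinity bitmask, so 1 pins to CPU 0
and 3 allows CPUs 0 and 1. The equivalent list form would be:

  $ taskset -c 0 perf stat --null --repeat 10 perf bench sched pipe -l 10000 >/dev/null

and taskset -c 0,1 for the two-CPU case.)
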
Thanks,
Ingo