Message-ID: <66cdc5f9-a09a-4841-8fca-252b7c78114b@intel.com>
Date: Fri, 19 Jan 2024 09:36:06 +0200
From: Adrian Hunter <adrian.hunter@...el.com>
To: Luo Gengkun <luogengkun2@...wei.com>
Cc: mingo@...hat.com, acme@...nel.org, mark.rutland@....com,
alexander.shishkin@...ux.intel.com, jolsa@...nel.org, namhyung@...nel.org,
irogers@...gle.com, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org, peterz@...radead.org
Subject: Re: [PATCH] perf/core: Fix small negative period being ignored
On 16/01/24 10:39, Luo Gengkun wrote:
> In perf_adjust_period(), we first calculate the period and then use it
> to calculate delta. However, when delta is negative there is a
> deviation compared to when delta is greater than or equal to 0. For
> example, for any delta in the range [-14,-1], delta + 7 falls in
> [-7,6], so (delta + 7) / 8 truncates to 0. Therefore the impact of
> small negative deltas such as -1 and -2 is ignored entirely. This is
> unacceptable when the target period is very short, because we lose a
> lot of samples.
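>
> The truncation can be reproduced with a trivial userspace sketch (this
> is not the kernel code, just plain C, whose integer division truncates
> toward zero like the expression above):
>
> 	#include <stdio.h>
>
> 	int main(void)
> 	{
> 		/* every delta in [-14,-1] is filtered down to 0 */
> 		for (int delta = -14; delta <= -1; delta++)
> 			printf("delta %3d -> %d\n", delta, (delta + 7) / 8);
> 		return 0;
> 	}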
>
> Here are some tests and analyses:
> before:
> # perf record -e cs -F 1000 ./a.out
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.022 MB perf.data (518 samples) ]
>
> # perf script
> ...
> a.out 396 257.956048: 23 cs: ffffffff81f4eeec schedul>
> a.out 396 257.957891: 23 cs: ffffffff81f4eeec schedul>
> a.out 396 257.959730: 23 cs: ffffffff81f4eeec schedul>
> a.out 396 257.961545: 23 cs: ffffffff81f4eeec schedul>
> a.out 396 257.963355: 23 cs: ffffffff81f4eeec schedul>
> a.out 396 257.965163: 23 cs: ffffffff81f4eeec schedul>
> a.out 396 257.966973: 23 cs: ffffffff81f4eeec schedul>
> a.out 396 257.968785: 23 cs: ffffffff81f4eeec schedul>
> a.out 396 257.970593: 23 cs: ffffffff81f4eeec schedul>
> ...
>
> after:
> # perf record -e cs -F 1000 ./a.out
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.058 MB perf.data (1466 samples) ]
>
> # perf script
> ...
> a.out 395 59.338813: 11 cs: ffffffff81f4eeec schedul>
> a.out 395 59.339707: 12 cs: ffffffff81f4eeec schedul>
> a.out 395 59.340682: 13 cs: ffffffff81f4eeec schedul>
> a.out 395 59.341751: 13 cs: ffffffff81f4eeec schedul>
> a.out 395 59.342799: 12 cs: ffffffff81f4eeec schedul>
> a.out 395 59.343765: 11 cs: ffffffff81f4eeec schedul>
> a.out 395 59.344651: 11 cs: ffffffff81f4eeec schedul>
> a.out 395 59.345539: 12 cs: ffffffff81f4eeec schedul>
> a.out 395 59.346502: 13 cs: ffffffff81f4eeec schedul>
> ...
>
> test.c
>
> #include <unistd.h>
>
> int main() {
> 	for (int i = 0; i < 20000; i++)
> 		usleep(10);
>
> 	return 0;
> }
>
> # time ./a.out
> real 0m1.583s
> user 0m0.040s
> sys 0m0.298s
>
> The above results were tested on x86-64 qemu with KVM enabled, using
> test.c as the test program. Since the program runs for about 1.5s and
> perf samples at -F 1000, we should ideally get around 1500 samples,
> but the previous algorithm produced only about 500, whereas the
> modified algorithm now produces about 1400. Furthermore, the new
> version shows 1 sample per 0.001s, while the previous one shows 1
> sample per 0.002s. This indicates that the new algorithm is more
> sensitive to small negative values than the old one.
>
> Fixes: bd2b5b12849a ("perf_counter: More aggressive frequency adjustment")
> Signed-off-by: Luo Gengkun <luogengkun2@...wei.com>
It seems better, and the maths makes sense, so:
Reviewed-by: Adrian Hunter <adrian.hunter@...el.com>
But the test case still seems to give unexpected results. Usually:
# time taskset --cpu 1 ./test
real 0m 1.25s
user 0m 0.03s
sys 0m 0.00
# taskset --cpu 0 perf record -F 1000 -e cs -- taskset --cpu 1 ./test
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.051 MB perf.data (1290 samples) ]
But occasionally:
# taskset --cpu 0 perf record -F 1000 -e cs -- taskset --cpu 1 ./test
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.010 MB perf.data (204 samples) ]
# perf script
...
test 865 265.377846: 16 cs: ffffffff832e927b schedule+0x2b
test 865 265.378900: 15 cs: ffffffff832e927b schedule+0x2b
test 865 265.379845: 14 cs: ffffffff832e927b schedule+0x2b
test 865 265.380770: 14 cs: ffffffff832e927b schedule+0x2b
test 865 265.381647: 15 cs: ffffffff832e927b schedule+0x2b
test 865 265.382638: 16 cs: ffffffff832e927b schedule+0x2b
test 865 265.383647: 16 cs: ffffffff832e927b schedule+0x2b
test 865 265.384704: 15 cs: ffffffff832e927b schedule+0x2b
test 865 265.385649: 14 cs: ffffffff832e927b schedule+0x2b
test 865 265.386578: 152 cs: ffffffff832e927b schedule+0x2b
test 865 265.396383: 154 cs: ffffffff832e927b schedule+0x2b
test 865 265.406183: 154 cs: ffffffff832e927b schedule+0x2b
test 865 265.415839: 154 cs: ffffffff832e927b schedule+0x2b
test 865 265.425445: 154 cs: ffffffff832e927b schedule+0x2b
test 865 265.435052: 154 cs: ffffffff832e927b schedule+0x2b
test 865 265.444708: 154 cs: ffffffff832e927b schedule+0x2b
test 865 265.454314: 154 cs: ffffffff832e927b schedule+0x2b
test 865 265.463970: 154 cs: ffffffff832e927b schedule+0x2b
test 865 265.473577: 154 cs: ffffffff832e927b schedule+0x2b
...
> ---
> kernel/events/core.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 683dc086ef10..cad50d3439f1 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4078,7 +4078,11 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo
> period = perf_calculate_period(event, nsec, count);
>
> delta = (s64)(period - hwc->sample_period);
> - delta = (delta + 7) / 8; /* low pass filter */
> + if (delta >= 0)
> + delta += 7;
> + else
> + delta -= 7;
> + delta /= 8; /* low pass filter */
>
> sample_period = hwc->sample_period + delta;
>
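For what it's worth, the effect can also be seen with a quick userspace
mock of the adjustment loop. It is only a sketch: it assumes the
calculated period stays constant at a made-up target every tick, and
the starting sample_period is made up to roughly match the numbers in
the traces above.

	#include <stdio.h>

	/* old filter: truncation toward zero swallows deltas in [-14,-1] */
	static long long lpf_old(long long delta)
	{
		return (delta + 7) / 8;
	}

	/* patched filter: bias away from zero, so a small negative delta
	 * still adjusts the period by at least -1 */
	static long long lpf_new(long long delta)
	{
		if (delta >= 0)
			delta += 7;
		else
			delta -= 7;
		return delta / 8;
	}

	int main(void)
	{
		const long long target = 11;		/* assumed ideal period */
		long long old_sp = 23, new_sp = 23;	/* assumed starting sample_period */

		for (int tick = 0; tick < 10; tick++) {
			printf("tick %2d: old %3lld new %3lld\n", tick, old_sp, new_sp);
			old_sp += lpf_old(target - old_sp);	/* stuck: delta -12 filters to 0 */
			new_sp += lpf_new(target - new_sp);	/* steps down towards the target */
		}
		return 0;
	}

With the old filter the period never moves, which matches the "before"
trace where every sample shows a period of 23; with the patched filter
it walks down to the target.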