linux-kernel - Re: [PATCH v3 2/2] perf/core: Fix incorrected time diff in tick adjust period

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <f9442d81-d92d-4b91-87cf-4beb4a41de22@intel.com>
Date: Wed, 21 Aug 2024 14:32:40 +0300
From: Adrian Hunter <adrian.hunter@...el.com>
To: Luo Gengkun <luogengkun@...weicloud.com>, peterz@...radead.org
Cc: mingo@...hat.com, acme@...nel.org, mark.rutland@....com,
 alexander.shishkin@...ux.intel.com, jolsa@...nel.org, namhyung@...nel.org,
 irogers@...gle.com, linux-perf-users@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 2/2] perf/core: Fix incorrected time diff in tick
 adjust period

On 10/08/24 13:24, Luo Gengkun wrote:
> Adrian found that there is a probability that the number of samples
> is small, which is caused by the unreasonable large sampling period.

Subject: incorrected -> incorrect

Note, the patch now needs to be re-based.

Also maybe tidy up the commit message e.g.

perf events has the notion of sampling frequency which is implemented in
software by dynamically adjusting the counter period so that samples occur
at approximately the target frequency.  Period adjustment is done in 2
places:
 - when the counter overflows (and a sample is recorded)
 - each timer tick, when the event is active
The later case is slightly flawed because it assumes that the time since
the last timer-tick period adjustment is 1 tick, whereas the event may not
have been active (e.g. for a task that is sleeping).

Fix by using jiffies to determine the elapsed time in that case.

> 
>  # taskset --cpu 0 perf record -F 1000 -e cs -- taskset --cpu 1 ./test
>  [ perf record: Woken up 1 times to write data ]
>  [ perf record: Captured and wrote 0.010 MB perf.data (204 samples) ]
>  # perf script
>  ...
>  test   865   265.377846:         16 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.378900:         15 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.379845:         14 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.380770:         14 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.381647:         15 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.382638:         16 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.383647:         16 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.384704:         15 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.385649:         14 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.386578:        152 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.396383:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.406183:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.415839:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.425445:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.435052:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.444708:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.454314:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.463970:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.473577:        154 cs:  ffffffff832e927b schedule+0x2b
>  ...
> 
> And the reason is perf_adjust_freq_unthr_events() calculates a value that is too
> big because it incorrectly assumes the count has accumulated only since the last
> tick, whereas it can have been much longer. To fix this problem, perf can calculate
> the tick interval by itself. For perf_adjust_freq_unthr_events we can use jiffies
> to calculate the tick interval more efficiently, as sugguested by Adrian.
> 
> Signed-off-by: Luo Gengkun <luogengkun@...weicloud.com>
> ---
>  include/linux/perf_event.h |  1 +
>  kernel/events/core.c       | 16 +++++++++++++---
>  2 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index afb028c54f33..2708f1d0692c 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -265,6 +265,7 @@ struct hw_perf_event {
>  	 * State for freq target events, see __perf_event_overflow() and
>  	 * perf_adjust_freq_unthr_context().
>  	 */
> +	u64				freq_tick_stamp;
>  	u64				freq_time_stamp;
>  	u64				freq_count_stamp;
>  #endif
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index cad50d3439f1..309af5520f52 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -55,6 +55,7 @@
>  #include <linux/pgtable.h>
>  #include <linux/buildid.h>
>  #include <linux/task_work.h>
> +#include <linux/jiffies.h>
>  
>  #include "internal.h"
>  
> @@ -4112,7 +4113,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
>  {
>  	struct perf_event *event;
>  	struct hw_perf_event *hwc;
> -	u64 now, period = TICK_NSEC;
> +	u64 now, period, tick_stamp;
>  	s64 delta;
>  
>  	/*
> @@ -4151,6 +4152,10 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
>  		 */
>  		event->pmu->stop(event, PERF_EF_UPDATE);
>  
> +		tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
> +		period = tick_stamp - hwc->freq_tick_stamp;
> +		hwc->freq_tick_stamp = tick_stamp;
> +
>  		now = local64_read(&event->count);
>  		delta = now - hwc->freq_count_stamp;
>  		hwc->freq_count_stamp = now;
> @@ -4162,8 +4167,13 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
>  		 * to perf_adjust_period() to avoid stopping it
>  		 * twice.
>  		 */
> -		if (delta > 0)
> -			perf_adjust_period(event, period, delta, false);
> +		if (delta > 0) {
> +			/*
> +			 * we skip first tick adjust period
> +			 */

Could be a single line comment.

> +			if (likely(period != tick_stamp))

Kernel style is to combine if-statements if possible i.e.

		/* Skip if no delta or it is the first tick adjust period */
		if (delta > 0 && likely(period != tick_stamp))

> +				perf_adjust_period(event, period, delta, false);
> +		}
>  
>  		event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
>  	next: