linux-kernel - Re: [PATCH RESEND v2 2/2] perf/core: Fix incorrected time diff in tick adjust period

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7c667516-b5a2-48e1-afd8-e45906e69053@intel.com>
Date: Mon, 17 Jun 2024 16:42:13 +0300
From: Adrian Hunter <adrian.hunter@...el.com>
To: Luo Gengkun <luogengkun@...weicloud.com>, peterz@...radead.org
Cc: mingo@...hat.com, acme@...nel.org, mark.rutland@....com,
 alexander.shishkin@...ux.intel.com, jolsa@...nel.org, namhyung@...nel.org,
 irogers@...gle.com, linux-perf-users@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH RESEND v2 2/2] perf/core: Fix incorrected time diff in
 tick adjust period

On 17/04/24 14:54, Luo Gengkun wrote:
> Adrian found that there is a probability that the number of samples
> is small, which is caused by the unreasonable large sampling period.
> 
>  # taskset --cpu 0 perf record -F 1000 -e cs -- taskset --cpu 1 ./test
>  [ perf record: Woken up 1 times to write data ]
>  [ perf record: Captured and wrote 0.010 MB perf.data (204 samples) ]
>  # perf script
>  ...
>  test   865   265.377846:         16 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.378900:         15 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.379845:         14 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.380770:         14 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.381647:         15 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.382638:         16 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.383647:         16 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.384704:         15 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.385649:         14 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.386578:        152 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.396383:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.406183:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.415839:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.425445:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.435052:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.444708:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.454314:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.463970:        154 cs:  ffffffff832e927b schedule+0x2b
>  test   865   265.473577:        154 cs:  ffffffff832e927b schedule+0x2b
>  ...
> 
> It seems that the Timer Interrupts is not coming every TICK_NSEC when

No, the period is not adjusted unless the event is active i.e. scheduled in.
So an event in a task context where the task sleeps a lot will
likely not be adjusted every tick.

> system is idle. For example, counter increase n during 2 * TICK_NSEC,
> and it call perf_adjust_period using n and TICK_NSEC, so the final period
> we calculated will be bigger than expected one. What's more, if the
> the overflow time is larger than 2 * TICK_NSEC we cannot tune the period
> using __perf_event_account_interrupt. To fix this problem, perf can
> calculate the tick interval by itself.

Yes, the period can get stuck being too big:

	perf_adjust_freq_unthr_events() calculates a value that is
	too big because it incorrectly assumes the count has
	accumulated only since the last tick, whereas it can have
	been much longer.

	__perf_event_account_interrupt() has an unexplained limit
	(2*TICK_NSEC) on the count delta, and won't adjust the
	period if that is exceeded.

> 
> Signed-off-by: Luo Gengkun <luogengkun@...weicloud.com>
> ---
>  include/linux/perf_event.h |  1 +
>  kernel/events/core.c       | 15 ++++++++++++---
>  2 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index afb028c54f33..2708f1d0692c 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -265,6 +265,7 @@ struct hw_perf_event {
>  	 * State for freq target events, see __perf_event_overflow() and
>  	 * perf_adjust_freq_unthr_context().
>  	 */
> +	u64				freq_tick_stamp;
>  	u64				freq_time_stamp;
>  	u64				freq_count_stamp;
>  #endif
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index cad50d3439f1..0f2025d631aa 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4112,7 +4112,7 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
>  {
>  	struct perf_event *event;
>  	struct hw_perf_event *hwc;
> -	u64 now, period = TICK_NSEC;
> +	u64 now, period, tick_stamp;
>  	s64 delta;
>  
>  	/*
> @@ -4151,6 +4151,10 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
>  		 */
>  		event->pmu->stop(event, PERF_EF_UPDATE);
>  
> +		tick_stamp = perf_clock();

Perhaps jiffies would work just as well, but be
more efficient.

> +		period = tick_stamp - hwc->freq_tick_stamp;
> +		hwc->freq_tick_stamp = tick_stamp;
> +
>  		now = local64_read(&event->count);
>  		delta = now - hwc->freq_count_stamp;
>  		hwc->freq_count_stamp = now;
> @@ -4162,8 +4166,13 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
>  		 * to perf_adjust_period() to avoid stopping it
>  		 * twice.
>  		 */
> -		if (delta > 0)
> -			perf_adjust_period(event, period, delta, false);
> +		if (delta > 0) {
> +			/*
> +			 * we skip first tick adjust period
> +			 */
> +			if (likely(period != tick_stamp))
> +				perf_adjust_period(event, period, delta, false);
> +		}
>  
>  		event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
>  	next: