Message-ID: <eb37b77d-58ed-4b79-a942-7c249cb5050b@huaweicloud.com>
Date: Thu, 29 Aug 2024 22:19:45 +0800
From: Luo Gengkun <luogengkun@...weicloud.com>
To: "Liang, Kan" <kan.liang@...ux.intel.com>,
Adrian Hunter <adrian.hunter@...el.com>, peterz@...radead.org
Cc: mingo@...hat.com, acme@...nel.org, namhyung@...nel.org,
mark.rutland@....com, alexander.shishkin@...ux.intel.com, jolsa@...nel.org,
irogers@...gle.com, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 2/2] perf/core: Fix incorrect time diff in tick adjust period
On 2024/8/29 21:46, Liang, Kan wrote:
>
> On 2024-08-27 9:10 p.m., Adrian Hunter wrote:
>> On 27/08/24 23:06, Liang, Kan wrote:
>>>
>>> On 2024-08-27 1:16 p.m., Adrian Hunter wrote:
>>>> On 27/08/24 19:42, Liang, Kan wrote:
>>>>>
>>>>> On 2024-08-21 9:42 a.m., Luo Gengkun wrote:
>>>>>> Perf events has the notion of sampling frequency which is implemented in
>>>>>> software by dynamically adjusting the counter period so that samples occur
>>>>>> at approximately the target frequency. Period adjustment is done in 2
>>>>>> places:
>>>>>> - when the counter overflows (and a sample is recorded)
>>>>>> - each timer tick, when the event is active
>>>>>> The latter case is slightly flawed because it assumes that the time since
>>>>>> the last timer-tick period adjustment is 1 tick, whereas the event may not
>>>>>> have been active (e.g. for a task that is sleeping).
>>>>>>
>>>>> Do you have a real-world example to demonstrate how bad it is if the
>>>>> algorithm doesn't take sleep into account?
>>>>>
>>>>> I'm not sure if introducing such complexity in the critical path is
>>>>> worth it.
>>>>>
>>>>>> Fix by using jiffies to determine the elapsed time in that case.
>>>>>>
>>>>>> Signed-off-by: Luo Gengkun <luogengkun@...weicloud.com>
>>>>>> ---
>>>>>> include/linux/perf_event.h | 1 +
>>>>>> kernel/events/core.c | 11 ++++++++---
>>>>>> 2 files changed, 9 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>>>>>> index 1a8942277dda..d29b7cf971a1 100644
>>>>>> --- a/include/linux/perf_event.h
>>>>>> +++ b/include/linux/perf_event.h
>>>>>> @@ -265,6 +265,7 @@ struct hw_perf_event {
>>>>>> * State for freq target events, see __perf_event_overflow() and
>>>>>> * perf_adjust_freq_unthr_context().
>>>>>> */
>>>>>> + u64 freq_tick_stamp;
>>>>>> u64 freq_time_stamp;
>>>>>> u64 freq_count_stamp;
>>>>>> #endif
>>>>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>>>>> index a9395bbfd4aa..86e80e3ef6ac 100644
>>>>>> --- a/kernel/events/core.c
>>>>>> +++ b/kernel/events/core.c
>>>>>> @@ -55,6 +55,7 @@
>>>>>> #include <linux/pgtable.h>
>>>>>> #include <linux/buildid.h>
>>>>>> #include <linux/task_work.h>
>>>>>> +#include <linux/jiffies.h>
>>>>>>
>>>>>> #include "internal.h"
>>>>>>
>>>>>> @@ -4120,7 +4121,7 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>>>>>> {
>>>>>> struct perf_event *event;
>>>>>> struct hw_perf_event *hwc;
>>>>>> - u64 now, period = TICK_NSEC;
>>>>>> + u64 now, period, tick_stamp;
>>>>>> s64 delta;
>>>>>>
>>>>>> list_for_each_entry(event, event_list, active_list) {
>>>>>> @@ -4148,6 +4149,10 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>>>>>> */
>>>>>> event->pmu->stop(event, PERF_EF_UPDATE);
>>>>>>
>>>>>> + tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
>>>>> Seems it only needs to retrieve the time once at the beginning, not for
>>>>> each event.
>>>>>
>>>>> There is a perf_clock(). It's better to use it for consistency.
>>>> perf_clock() is much slower, and for statistical sampling it doesn't
>>>> have to be perfect.
>>> Because of rdtsc?
>> Yes
> OK. I'm not too worried about it as long as it's only invoked once in
> each tick.
>
>>> If it is only used here, it should be fine. What I'm worried about is
>>> that someone may use it with other timestamp in perf later. Anyway, it's
>>> not a big deal.
>>>
>>> The main concern I have is whether we really need the patch.
>> The current code is wrong.
>>
>>> It seems it can only bring us a better guess of the period for the sleep
>>> test. Then we have to do all the calculation for each tick.
>> Or any workload that sleeps periodically.
>>
>> Another option is to remove the period adjust on tick entirely.
>> Although arguably the calculation at a tick is better because
>> it probably covers a longer period.
> Or we may remove the period adjust on overflow.
>
> As I understand it, the period adjust on overflow is to handle the case
> where the overflow happens very frequently (< 2 ticks). It is mainly
> caused by the very low start period (1).
> I'm working on a patch to set a larger start period, which should
> minimize the usage of the period adjust on overflow.
I think it's hard to choose a good initial period; it may require a lot
of testing. Good luck.
>
> Anyway, based on the current code, I agree that adding a new
> freq_tick_stamp should be required. But it doesn't need to read the time
> for each event. I think reading the time once at the beginning should be
> good enough for the period adjust/estimate algorithm.
That's a good idea. Do you think it's appropriate to move this line here?
Thanks,
Gengkun
@@ -4126,6 +4126,8 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
raw_spin_lock(&ctx->lock);
+ tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
+
list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
if (event->state != PERF_EVENT_STATE_ACTIVE)
continue;
@@ -4152,7 +4154,6 @@ perf_adjust_freq_unthr_context(struct perf_event_context *ctx, bool unthrottle)
*/
event->pmu->stop(event, PERF_EF_UPDATE);
- tick_stamp = jiffies64_to_nsecs(get_jiffies_64());
period = tick_stamp - hwc->freq_tick_stamp;
hwc->freq_tick_stamp = tick_stamp;
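
To make the intent of the hunk above concrete, here is a minimal user-space sketch of the idea, not the kernel code itself. It assumes a hypothetical struct model_event standing in for the relevant hw_perf_event fields, plus hypothetical model_tick() and model_adjust_period() helpers. The caller reads the time once and passes it in, and each event derives its own elapsed time from its freq_tick_stamp. The first-tick check mirrors the "period != tick_stamp" test in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the freq-related fields of struct hw_perf_event. */
struct model_event {
	uint64_t freq_tick_stamp;  /* time of the previous tick adjustment, in ns */
	uint64_t freq_count_stamp; /* counter value at the previous adjustment */
	uint64_t count;            /* current counter value */
	uint64_t last_period_ns;   /* elapsed time last handed to the adjuster */
	unsigned int adjustments;  /* number of times the adjuster was called */
};

/* Hypothetical adjuster: it only records the interval it was given. */
static void model_adjust_period(struct model_event *ev, uint64_t period_ns,
				int64_t delta)
{
	(void)delta;
	ev->last_period_ns = period_ns;
	ev->adjustments++;
}

/*
 * One tick over the currently active events. now_ns is read once by the
 * caller, before the loop, instead of once per event.
 */
static void model_tick(struct model_event *events, size_t n, uint64_t now_ns)
{
	for (size_t i = 0; i < n; i++) {
		struct model_event *ev = &events[i];
		uint64_t period;
		int64_t delta;

		/* The time source is assumed to be monotonic. */
		assert(now_ns >= ev->freq_tick_stamp);

		/* Elapsed time since this event was last adjusted at a tick. */
		period = now_ns - ev->freq_tick_stamp;
		ev->freq_tick_stamp = now_ns;

		delta = (int64_t)(ev->count - ev->freq_count_stamp);
		ev->freq_count_stamp = ev->count;

		/*
		 * period == now_ns means the stamp was still 0, i.e. this is
		 * the first tick adjustment for the event, so there is no
		 * meaningful interval yet and the adjustment is skipped.
		 */
		if (delta > 0 && period != now_ns)
			model_adjust_period(ev, period, delta);
	}
}
```

With this model, an event that is first stamped at 4 ms and next seen active at 20 ms is handed a 16 ms interval rather than an assumed single tick, while an event seen for the first time is only stamped.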
>
> Thanks,
> Kan
>
>>> Thanks,
>>> Kan
>>>>> Thanks,
>>>>> Kan
>>>>>> + period = tick_stamp - hwc->freq_tick_stamp;
>>>>>> + hwc->freq_tick_stamp = tick_stamp;
>>>>>> +
>>>>>> now = local64_read(&event->count);
>>>>>> delta = now - hwc->freq_count_stamp;
>>>>>> hwc->freq_count_stamp = now;
>>>>>> @@ -4157,9 +4162,9 @@ static void perf_adjust_freq_unthr_events(struct list_head *event_list)
>>>>>> * reload only if value has changed
>>>>>> * we have stopped the event so tell that
>>>>>> * to perf_adjust_period() to avoid stopping it
>>>>>> - * twice.
>>>>>> + * twice. And skip if it is the first tick adjust period.
>>>>>> */
>>>>>> - if (delta > 0)
>>>>>> + if (delta > 0 && likely(period != tick_stamp))
>>>>>> perf_adjust_period(event, period, delta, false);
>>>>>> event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
>>>>
>>