[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5c6491f72aa134c0470fea2c742ed14b198f051e.camel@redhat.com>
Date: Tue, 06 Sep 2022 15:16:50 +0300
From: ypodemsk@...hat.com
To: x86@...nel.org, tglx@...utronix.de, mingo@...hat.com,
peterz@...radead.org, rafael.j.wysocki@...el.com, pauld@...hat.com,
frederic@...nel.org, ggherdovich@...e.cz,
linux-kernel@...r.kernel.org, lenb@...nel.org, vschneid@...hat.com,
jlelli@...hat.com, mtosatti@...hat.com, ppandit@...hat.com,
alougovs@...hat.com, lcapitul@...hat.com, nsaenz@...nel.org
Subject: Re: [PATCH] x86/aperfmperf: Fix arch_scale_freq_tick() on tickless
systems
Friendly ping?
On Thu, 2022-08-04 at 16:17 +0300, Yair Podemsky wrote:
> In order for the scheduler to be frequency invariant we measure the
> ratio between the maximum cpu frequency and the actual cpu frequency.
> During long tickless periods of time the calculations that keep track
> of that might overflow, in the function scale_freq_tick():
>
> if (check_shl_overflow(acnt, 2*SCHED_CAPACITY_SHIFT, &acnt))
> ยป goto error;
>
> eventually forcing the kernel to disable the feature with the
> message "Scheduler frequency invariance went wobbly, disabling!".
> Let's avoid that by detecting long tickless periods and bypassing
> the calculation for that tick.
>
> This calculation updates the value of arch_freq_scale, used by the
> capacity-aware scheduler to correct cpu duty cycles:
> task_util_freq_inv(p) = duty_cycle(p) * (curr_frequency(cpu) /
> max_frequency(cpu))
>
> However Consider a long tickless period, It takes should take 60
> minutes
> for a tickless CPU running at 5GHz to trigger the acnt overflow,
> pick 10 minutes as a staleness threshold to be on the safe side,
> In our testing it took over 30 minutes for the overflow to happen,
> but since it's frequency/platform dependent we choose a smaller value
> to be on the safe side.
>
> Fixes: e2b0d619b400 ("x86, sched: check for counters overflow in
> frequency invariant accounting")
> Signed-off-by: Yair Podemsky <ypodemsk@...hat.com>
> ---
> arch/x86/kernel/cpu/aperfmperf.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/aperfmperf.c
> b/arch/x86/kernel/cpu/aperfmperf.c
> index 1f60a2b27936..dfe356034a60 100644
> --- a/arch/x86/kernel/cpu/aperfmperf.c
> +++ b/arch/x86/kernel/cpu/aperfmperf.c
> @@ -23,6 +23,13 @@
>
> #include "cpu.h"
>
> +/*
> + * Samples older then 10 minutes should not be proccessed,
> + * This time is long enough to prevent unneeded drops of data
> + * But short enough to prevent overflows
> + */
> +#define MAX_SAMPLE_AGE_NOHZ ((unsigned long)HZ * 600)
> +
> struct aperfmperf {
> seqcount_t seq;
> unsigned long last_update;
> @@ -373,6 +380,7 @@ static inline void scale_freq_tick(u64 acnt, u64
> mcnt) { }
> void arch_scale_freq_tick(void)
> {
> struct aperfmperf *s = this_cpu_ptr(&cpu_samples);
> + unsigned long last = s->last_update;
> u64 acnt, mcnt, aperf, mperf;
>
> if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF))
> @@ -392,7 +400,12 @@ void arch_scale_freq_tick(void)
> s->mcnt = mcnt;
> raw_write_seqcount_end(&s->seq);
>
> - scale_freq_tick(acnt, mcnt);
> + /*
> + * Avoid calling scale_freq_tick() when the last update was too
> long ago,
> + * as it might overflow during calulation.
> + */
> + if ((jiffies - last) <= MAX_SAMPLE_AGE_NOHZ)
> + scale_freq_tick(acnt, mcnt);
> }
>
> /*
Powered by blists - more mailing lists