[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20220804131728.58513-1-ypodemsk@redhat.com>
Date: Thu, 4 Aug 2022 16:17:28 +0300
From: Yair Podemsky <ypodemsk@...hat.com>
To: x86@...nel.org, tglx@...utronix.de, mingo@...hat.com,
peterz@...radead.org, rafael.j.wysocki@...el.com, pauld@...hat.com,
frederic@...nel.org, ggherdovich@...e.cz,
linux-kernel@...r.kernel.org, lenb@...nel.org, vschneid@...hat.com,
jlelli@...hat.com, mtosatti@...hat.com, ppandit@...hat.com,
alougovs@...hat.com, lcapitul@...hat.com, nsaenz@...nel.org
Cc: ypodemsk@...hat.com
Subject: [PATCH] x86/aperfmperf: Fix arch_scale_freq_tick() on tickless systems
In order for the scheduler to be frequency invariant we measure the
ratio between the maximum cpu frequency and the actual cpu frequency.
During long tickless periods of time the calculations that keep track
of that might overflow, in the function scale_freq_tick():
if (check_shl_overflow(acnt, 2*SCHED_CAPACITY_SHIFT, &acnt))
ยป goto error;
eventually forcing the kernel to disable the feature with the
message "Scheduler frequency invariance went wobbly, disabling!".
Let's avoid that by detecting long tickless periods and bypassing
the calculation for that tick.
This calculation updates the value of arch_freq_scale, used by the
capacity-aware scheduler to correct cpu duty cycles:
task_util_freq_inv(p) = duty_cycle(p) * (curr_frequency(cpu) /
max_frequency(cpu))
However Consider a long tickless period, It takes should take 60 minutes
for a tickless CPU running at 5GHz to trigger the acnt overflow,
pick 10 minutes as a staleness threshold to be on the safe side,
In our testing it took over 30 minutes for the overflow to happen,
but since it's frequency/platform dependent we choose a smaller value
to be on the safe side.
Fixes: e2b0d619b400 ("x86, sched: check for counters overflow in frequency invariant accounting")
Signed-off-by: Yair Podemsky <ypodemsk@...hat.com>
---
arch/x86/kernel/cpu/aperfmperf.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
index 1f60a2b27936..dfe356034a60 100644
--- a/arch/x86/kernel/cpu/aperfmperf.c
+++ b/arch/x86/kernel/cpu/aperfmperf.c
@@ -23,6 +23,13 @@
#include "cpu.h"
+/*
+ * Samples older then 10 minutes should not be proccessed,
+ * This time is long enough to prevent unneeded drops of data
+ * But short enough to prevent overflows
+ */
+#define MAX_SAMPLE_AGE_NOHZ ((unsigned long)HZ * 600)
+
struct aperfmperf {
seqcount_t seq;
unsigned long last_update;
@@ -373,6 +380,7 @@ static inline void scale_freq_tick(u64 acnt, u64 mcnt) { }
void arch_scale_freq_tick(void)
{
struct aperfmperf *s = this_cpu_ptr(&cpu_samples);
+ unsigned long last = s->last_update;
u64 acnt, mcnt, aperf, mperf;
if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF))
@@ -392,7 +400,12 @@ void arch_scale_freq_tick(void)
s->mcnt = mcnt;
raw_write_seqcount_end(&s->seq);
- scale_freq_tick(acnt, mcnt);
+ /*
+ * Avoid calling scale_freq_tick() when the last update was too long ago,
+ * as it might overflow during calulation.
+ */
+ if ((jiffies - last) <= MAX_SAMPLE_AGE_NOHZ)
+ scale_freq_tick(acnt, mcnt);
}
/*
--
2.31.1
Powered by blists - more mailing lists