linux-kernel - [PATCH] x86/aperfmperf: Fix arch_scale_freq

lists.openwall.net
Open Source and information security mailing list archives

Message-Id: <20220804131728.58513-1-ypodemsk@redhat.com>
Date:   Thu,  4 Aug 2022 16:17:28 +0300
From:   Yair Podemsky <ypodemsk@...hat.com>
To:     x86@...nel.org, tglx@...utronix.de, mingo@...hat.com,
        peterz@...radead.org, rafael.j.wysocki@...el.com, pauld@...hat.com,
        frederic@...nel.org, ggherdovich@...e.cz,
        linux-kernel@...r.kernel.org, lenb@...nel.org, vschneid@...hat.com,
        jlelli@...hat.com, mtosatti@...hat.com, ppandit@...hat.com,
        alougovs@...hat.com, lcapitul@...hat.com, nsaenz@...nel.org
Cc:     ypodemsk@...hat.com
Subject: [PATCH] x86/aperfmperf: Fix arch_scale_freq_tick() on tickless systems

In order for the scheduler to be frequency invariant, we measure the
ratio between the actual CPU frequency and the maximum CPU frequency.
During long tickless periods, the calculation in scale_freq_tick() that
keeps track of this ratio can overflow:

if (check_shl_overflow(acnt, 2*SCHED_CAPACITY_SHIFT, &acnt))
	goto error;

eventually forcing the kernel to disable the feature with the
message "Scheduler frequency invariance went wobbly, disabling!".
Let's avoid that by detecting long tickless periods and bypassing
the calculation for that tick.

This calculation updates the value of arch_freq_scale, which the
capacity-aware scheduler uses to correct CPU duty cycles:

  task_util_freq_inv(p) = duty_cycle(p) * (curr_frequency(cpu) / max_frequency(cpu))

Consider a long tickless period: it should take about 60 minutes for a
tickless CPU running at 5 GHz to trigger the acnt overflow. In our
testing it took over 30 minutes for the overflow to happen, but since
this is frequency- and platform-dependent, we pick a smaller value,
10 minutes, as the staleness threshold to be on the safe side.

Fixes: e2b0d619b400 ("x86, sched: check for counters overflow in frequency invariant accounting")
Signed-off-by: Yair Podemsky <ypodemsk@...hat.com>
---
 arch/x86/kernel/cpu/aperfmperf.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
index 1f60a2b27936..dfe356034a60 100644
--- a/arch/x86/kernel/cpu/aperfmperf.c
+++ b/arch/x86/kernel/cpu/aperfmperf.c
@@ -23,6 +23,13 @@
 
 #include "cpu.h"
 
+/*
+ * Samples older than 10 minutes should not be processed.
+ * This is long enough to avoid needlessly dropping data,
+ * but short enough to prevent overflows.
+ */
+#define MAX_SAMPLE_AGE_NOHZ	((unsigned long)HZ * 600)
+
 struct aperfmperf {
 	seqcount_t	seq;
 	unsigned long	last_update;
@@ -373,6 +380,7 @@ static inline void scale_freq_tick(u64 acnt, u64 mcnt) { }
 void arch_scale_freq_tick(void)
 {
 	struct aperfmperf *s = this_cpu_ptr(&cpu_samples);
+	unsigned long last = s->last_update;
 	u64 acnt, mcnt, aperf, mperf;
 
 	if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF))
@@ -392,7 +400,12 @@ void arch_scale_freq_tick(void)
 	s->mcnt = mcnt;
 	raw_write_seqcount_end(&s->seq);
 
-	scale_freq_tick(acnt, mcnt);
+	/*
+	 * Avoid calling scale_freq_tick() when the last update was too long ago,
+	 * as the calculation might overflow.
+	 */
+	if ((jiffies - last) <= MAX_SAMPLE_AGE_NOHZ)
+		scale_freq_tick(acnt, mcnt);
 }
 
 /*
-- 
2.31.1