Message-Id: <20220804131728.58513-1-ypodemsk@redhat.com>
Date:   Thu,  4 Aug 2022 16:17:28 +0300
From:   Yair Podemsky <ypodemsk@...hat.com>
To:     x86@...nel.org, tglx@...utronix.de, mingo@...hat.com,
        peterz@...radead.org, rafael.j.wysocki@...el.com, pauld@...hat.com,
        frederic@...nel.org, ggherdovich@...e.cz,
        linux-kernel@...r.kernel.org, lenb@...nel.org, vschneid@...hat.com,
        jlelli@...hat.com, mtosatti@...hat.com, ppandit@...hat.com,
        alougovs@...hat.com, lcapitul@...hat.com, nsaenz@...nel.org
Cc:     ypodemsk@...hat.com
Subject: [PATCH] x86/aperfmperf: Fix arch_scale_freq_tick() on tickless systems

In order for the scheduler to be frequency invariant, we measure the
ratio between the maximum CPU frequency and the actual CPU frequency.
During long tickless periods the calculation that keeps track of that
ratio might overflow, in the function scale_freq_tick():

if (check_shl_overflow(acnt, 2*SCHED_CAPACITY_SHIFT, &acnt))
	goto error;

eventually forcing the kernel to disable the feature with the
message "Scheduler frequency invariance went wobbly, disabling!".
Let's avoid that by detecting long tickless periods and bypassing
the calculation for that tick.
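
For illustration, here is a minimal userspace sketch of how that check
trips; shl_overflows() is a stand-in for the kernel's
check_shl_overflow(), the sample value is hypothetical, and
SCHED_CAPACITY_SHIFT is assumed to be 10, its current kernel value:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SCHED_CAPACITY_SHIFT 10	/* assumed kernel value */

/* Userspace stand-in for the kernel's check_shl_overflow():
 * returns true when (val << shift) would lose high bits of a u64. */
static bool shl_overflows(uint64_t val, unsigned int shift, uint64_t *res)
{
	if (shift >= 64 || (shift && (val >> (64 - shift))))
		return true;
	*res = val << shift;
	return false;
}

int main(void)
{
	/* Hypothetical APERF delta after a very long tickless stretch. */
	uint64_t acnt = 1ULL << 45, shifted;

	if (shl_overflows(acnt, 2 * SCHED_CAPACITY_SHIFT, &shifted))
		puts("overflow: frequency invariance would be disabled");
	return 0;
}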

This calculation updates the value of arch_freq_scale, used by the
capacity-aware scheduler to correct CPU duty cycles:

task_util_freq_inv(p) = duty_cycle(p) *
                        (curr_frequency(cpu) / max_frequency(cpu))
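
To make that correction concrete, a hedged sketch with hypothetical
numbers (only the SCHED_CAPACITY_SHIFT fixed-point convention mirrors
the kernel; the values and variable names are illustrative): a task
busy 50% of the tick on a CPU running at half its maximum frequency
contributes only 25% of full capacity.

#include <stdint.h>
#include <stdio.h>

#define SCHED_CAPACITY_SHIFT 10
#define SCHED_CAPACITY_SCALE (1UL << SCHED_CAPACITY_SHIFT)

int main(void)
{
	/* Hypothetical numbers: a task busy 50% of the tick on a CPU
	 * currently at 2.5GHz whose maximum frequency is 5GHz. */
	uint64_t duty_cycle = SCHED_CAPACITY_SCALE / 2;	/* 512/1024 */
	uint64_t curr_khz = 2500000, max_khz = 5000000;

	/* arch_freq_scale analogue: curr/max in fixed point. */
	uint64_t freq_scale = (curr_khz << SCHED_CAPACITY_SHIFT) / max_khz;

	/* Invariant utilization: half the tick at half speed is a
	 * quarter of the CPU's real capacity (256 of 1024). */
	uint64_t util = (duty_cycle * freq_scale) >> SCHED_CAPACITY_SHIFT;

	printf("freq_scale=%llu util=%llu (of %lu)\n",
	       (unsigned long long)freq_scale,
	       (unsigned long long)util, SCHED_CAPACITY_SCALE);
	return 0;
}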

Consider a long tickless period: it should take about 60 minutes for
a tickless CPU running at 5GHz to trigger the acnt overflow. In our
testing the overflow took over 30 minutes to happen, but since the
exact time is frequency/platform dependent, we pick 10 minutes as the
staleness threshold to stay on the safe side.
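
The 60-minute figure follows from the shift width (again assuming
SCHED_CAPACITY_SHIFT == 10): acnt is shifted left by 20 bits, so the
u64 overflows once the APERF delta exceeds 2^44 cycles, which a 5GHz
CPU accumulates in roughly 59 minutes. A quick check:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* acnt overflows the u64 once it exceeds 2^(64 - 20) cycles. */
	uint64_t limit = 1ULL << (64 - 2 * 10);	/* 2^44 cycles */
	double hz = 5e9;			/* 5GHz CPU */

	printf("overflow after ~%.1f minutes\n", limit / hz / 60.0);
	return 0;
}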

Fixes: e2b0d619b400 ("x86, sched: check for counters overflow in frequency invariant accounting")
Signed-off-by: Yair Podemsky <ypodemsk@...hat.com>
---
 arch/x86/kernel/cpu/aperfmperf.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
index 1f60a2b27936..dfe356034a60 100644
--- a/arch/x86/kernel/cpu/aperfmperf.c
+++ b/arch/x86/kernel/cpu/aperfmperf.c
@@ -23,6 +23,13 @@
 
 #include "cpu.h"
 
+/*
+ * Samples older than 10 minutes should not be processed.
+ * This is long enough to prevent unneeded drops of data,
+ * but short enough to prevent overflows.
+ */
+#define MAX_SAMPLE_AGE_NOHZ	((unsigned long)HZ * 600)
+
 struct aperfmperf {
 	seqcount_t	seq;
 	unsigned long	last_update;
@@ -373,6 +380,7 @@ static inline void scale_freq_tick(u64 acnt, u64 mcnt) { }
 void arch_scale_freq_tick(void)
 {
 	struct aperfmperf *s = this_cpu_ptr(&cpu_samples);
+	unsigned long last = s->last_update;
 	u64 acnt, mcnt, aperf, mperf;
 
 	if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF))
@@ -392,7 +400,12 @@ void arch_scale_freq_tick(void)
 	s->mcnt = mcnt;
 	raw_write_seqcount_end(&s->seq);
 
-	scale_freq_tick(acnt, mcnt);
+	/*
+	 * Avoid calling scale_freq_tick() when the last update was too long ago,
+	 * as it might overflow during calculation.
+	 */
+	if ((jiffies - last) <= MAX_SAMPLE_AGE_NOHZ)
+		scale_freq_tick(acnt, mcnt);
 }
 
 /*
-- 
2.31.1
