linux-kernel - [BUG] bad runqueue clock/ktime value at init?

Message-ID: <CAE2eyx1mdFtv_YXCKdd4tZu1A7_3t09=Xdw6=uPwyLMSBwH_LA@mail.gmail.com>
Date:   Tue, 29 Nov 2016 17:41:42 +0100
From:   Nicolas Morey-Chaisemartin <nicolas.morey.chaisemartin@...il.com>
To:     linux-kernel@...r.kernel.org
Subject: [BUG] bad runqueue clock/ktime value at init?

Hi everyone,

After upgrading my workstation (ASUS Rampage IV GENE motherboard,
Core(TM) i7-3820) to a kernel >= 4.6, I noticed poor performance and
htop showing "0" CPU usage for all processes.
/sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq showed <unknown>
for all cores.

Bisecting pointed me to a cpufreq commit that is not the root cause of
the issue, but that exposed it: it made the <unknown> value and all
those symptoms appear.

Adding some logs showed that intel_pstate_update_util was always called
with the same time parameter (a different value for each core, but
constant on any given core).

I compiled a small module to regularly dump sched_cpu_clock() value in
dmesg, and its value was smaller than the one provided to
intel_pstate_update_util.

From what I could understand, the runqueue clock is monotonic and
computed using sched_cpu_clock().
After a while (once sched_cpu_clock() becomes greater than the
runqueue clock, which took between 10 and 30 minutes), things go back
to normal and cpufreq starts working again.

Looking into dmesg, it seems the TSC was broken on my system (BIOS
issue), so the kernel is using another source for clocks.

Is it expected in this case (no TSC) that sched_cpu_clock "rewinds"
some time after boot?

Upgrading the BIOS fixed the TSC issue and solved the bug for me. So
this is not critical, but I've seen a few posts here and there about
people that hit the same bug.

Nicolas

P.S.: I have another workstation with the former BIOS version, so I can
try out patches and give more info if needed.