Message-ID: <20170316140103.GU12825@kernel.org>
Date:   Thu, 16 Mar 2017 11:01:03 -0300
From:   Arnaldo Carvalho de Melo <acme@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>,
        Adrian Hunter <adrian.hunter@...el.com>
Cc:     Jiri Olsa <jolsa@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
        Wang Nan <wangnan0@...wei.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: 'perf test tsc' failing, bisected to "sched/clock: Provide better
 clock continuity"

Hi, this test entry has been failing for a while:

[root@...et ~]# perf test -v tsc
55: Convert perf time to TSC                   :
--- start ---
test child forked, pid 3008
mmap size 528384B
1st event perf time 93133455486631 tsc 15369449468752
rdtsc          time 93133464598760 tsc 15369473104358
2nd event perf time 93133455506961 tsc 15369449521485
test child finished with -1
---- end ----
Convert perf time to TSC: FAILED!
[root@...et ~]#
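For reference, this test reads a TSC value with rdtsc() between two perf
events and expects it, once converted, to fall between the two event
timestamps. In the log above the rdtsc-derived time (93133464598760) lands
after the 2nd event's time (93133455506961), so the bracketing check fails.
Below is a minimal sketch of that invariant, using the TSC-to-time
conversion documented for perf_event_mmap_page; it is a simplification,
not the test's exact code:

#include <stdint.h>
#include <stdio.h>

/*
 * TSC -> perf time conversion as documented for perf_event_mmap_page;
 * time_shift, time_mult and time_zero are read from the mmap'ed page.
 */
static uint64_t tsc_to_perf_time(uint64_t cyc, uint16_t time_shift,
                                 uint32_t time_mult, uint64_t time_zero)
{
        uint64_t quot = cyc >> time_shift;
        uint64_t rem  = cyc & (((uint64_t)1 << time_shift) - 1);

        return time_zero + quot * time_mult +
               ((rem * time_mult) >> time_shift);
}

int main(void)
{
        /* perf time values from the failing run above; in the real test
         * the middle one comes from converting an rdtsc() reading with
         * tsc_to_perf_time(). */
        uint64_t ev1_time  = 93133455486631ULL;  /* 1st event */
        uint64_t test_time = 93133464598760ULL;  /* rdtsc     */
        uint64_t ev2_time  = 93133455506961ULL;  /* 2nd event */

        (void)tsc_to_perf_time;  /* shown for reference only */

        if (ev1_time <= test_time && test_time <= ev2_time)
                puts("Convert perf time to TSC: Ok");
        else  /* here test_time > ev2_time, hence FAILED */
                puts("Convert perf time to TSC: FAILED!");
        return 0;
}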

I bisected it to the following kernel change. Ideas?

[acme@...icio linux]$ git bisect good
5680d8094ffa9e5cfc81afdd865027ee6417c263 is the first bad commit
commit 5680d8094ffa9e5cfc81afdd865027ee6417c263
Author: Peter Zijlstra <peterz@...radead.org>
Date:   Thu Dec 15 13:36:17 2016 +0100

    sched/clock: Provide better clock continuity
    
    When switching between the unstable and stable variants it is
    currently possible that clock discontinuities occur.
    
    And while these will mostly be 'small', attempt to do better.
    
    As observed on my IVB-EP, the sched_clock() is ~1.5s ahead of the
    ktime_get_ns() based timeline at the point of switchover
    (sched_clock_init_late()) after SMP bringup.
    
    Equally, when the TSC is later found to be unstable -- typically
    because SMM tries to hide its SMI latencies by mucking with the TSC --
    we want to avoid large jumps.
    
    Since the clocksource watchdog reports the issue after the fact we
    cannot exactly fix up time, but since SMI latencies are typically
    small (~10ns range), the discontinuity is mainly due to drift between
    sched_clock() and ktime_get_ns() (which on my desktop is ~79s over
    24days).
    
    I dislike this patch because it adds overhead to the good case in
    favour of dealing with badness. But given the widespread failure of
    TSC stability this is worth it.
    
    Note that in case the TSC makes drastic jumps after SMP bringup we're
    still hosed. There's just not much we can do in that case without
    stupid overhead.
    
    If we were to somehow expose tsc_clocksource_reliable (which is hard
    because this code is also used on ia64 and parisc) we could avoid some
    of the newly introduced overhead.
    
    Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
    Cc: Linus Torvalds <torvalds@...ux-foundation.org>
    Cc: Mike Galbraith <efault@....de>
    Cc: Peter Zijlstra <peterz@...radead.org>
    Cc: Thomas Gleixner <tglx@...utronix.de>
    Cc: linux-kernel@...r.kernel.org
    Signed-off-by: Ingo Molnar <mingo@...nel.org>

:040000 040000 152545abe3b879aaa3cf053cdd58ef998c285529 3afcd0a5bc643fdd0fc994ee11cbfd87cfe4c30f M	kernel
[acme@...icio linux]$ 
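The fix the commit message describes boils down to snapshotting the gap
between the raw sched_clock() timeline and the ktime_get_ns() timeline at
switchover and folding it into an offset, so readers never see a jump. A
rough userspace-only sketch of that idea follows; it is simplified, not
the kernel's actual code, and the ~1.5s skew is just the figure quoted in
the commit message:

#include <stdint.h>
#include <stdio.h>
#include <time.h>

static uint64_t ns_now(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

/* Stand-ins for the two timelines: the "unstable" TSC-based clock is
 * modelled as running ~1.5s ahead, as on the IVB-EP mentioned above. */
static uint64_t unstable_clock(void) { return ns_now() + 1500000000ULL; }
static uint64_t stable_clock(void)   { return ns_now(); }

static uint64_t offset;  /* gap captured at switchover */
static int stable;

static void switch_to_stable(void)
{
        /* Snapshot the discontinuity at the moment of switchover... */
        offset = unstable_clock() - stable_clock();
        stable = 1;
}

static uint64_t cont_sched_clock(void)
{
        /* ...and add it back so the reported clock stays continuous. */
        return stable ? stable_clock() + offset : unstable_clock();
}

int main(void)
{
        uint64_t before = cont_sched_clock();

        switch_to_stable();
        printf("jump across switchover: %llu ns\n",
               (unsigned long long)(cont_sched_clock() - before));
        return 0;
}

Without the offset the switchover above would show a ~1.5s jump; with it
the printed delta is only the few nanoseconds of elapsed time, which is
the continuity property the commit is after.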
