lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210430051059.GE975577@paulmck-ThinkPad-P17-Gen-1>
Date:   Thu, 29 Apr 2021 22:10:59 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Luming Yu <luming.yu@...il.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        John Stultz <john.stultz@...aro.org>, sboyd@...nel.org,
        corbet@....net, Mark.Rutland@....com, maz@...nel.org,
        kernel-team@...com, neeraju@...eaurora.org,
        Andi Kleen <ak@...ux.intel.com>, feng.tang@...el.com,
        zhengjun.xing@...el.com
Subject: Re: [PATCH V11 clocksource 0/6] Do not mark clocks unstable due to
 delays for v5.13

On Thu, Apr 29, 2021 at 07:13:40PM +0800, Luming Yu wrote:
> On Thu, Apr 29, 2021 at 9:30 AM Paul E. McKenney <paulmck@...nel.org> wrote:
> >
> > Hello!
> >
> > If there is a sufficient delay between reading the watchdog clock and the
> > clock under test, the clock under test will be marked unstable through no
> > fault of its own.  This series checks for this, doing limited retries
> > to get a good set of clock reads.  If the clock is marked unstable
> > and is marked as being per-CPU, cross-CPU synchronization is checked.
> > This series also provides delay injection, which may be enabled via
> > kernel boot parameters to test the checking for delays.
> >
> > Note that "sufficient delay" can be provided by SMIs, NMIs, and of course
> > vCPU preemption.
> >
> > 1.      Provide module parameters to inject delays in watchdog.
> >
> > 2.      Retry clock read if long delays detected.
> >
> > 3.      Check per-CPU clock synchronization when marked unstable.
> >
> > 4.      Provide a module parameter to fuzz per-CPU clock checking.
> >
> > 5.      Limit number of CPUs checked for clock synchronization.
> >
> > 6.      Reduce clocksource-skew threshold for TSC.
> >
> > Changes since v10, based on feedback from Thomas Gleixner, Peter Zijlstra,
> > Feng Tang, Andi Kleen, Luming Yu, Xing Zhengju, and the indefatigible
> > kernel test robot:
> >
> > o       Automatically compute the uncertainty margin for clocksource, and
> >         also allow them to be specified manually before that clocksource
> >         is registered.
> >
> > o       For the automatically computed uncertainty margins, bound them
> >         below by 100 microseconds (2 * WATCHDOG_MAX_SKEW).
> >
> > o       For the manually specified uncertainty margins, splat (but
> >         continue) if they are less than 100 microseconds (again 2 *
> >         WATCHDOG_MAX_SKEW).  The purpose of splatting is to discourage
> >         production use of this clock-skew-inducing debugging technique.
> >
> > o       Manually set the uncertainty margin for clocksource_jiffies
> >         (and thus refined_jiffies) to TICK_NSEC to compensate for the
> >         very low frequency of these clocks.
> >
> > o       Manually set the uncertainty margin for clocksource_tsc_early
> >         to 32 milliseconds.
> >
> > o       Apply numerous "Link:" fields to all patches.
> >
> > o       Add some acks and CCs.
> >
> > Changes since v9:
> >
> > o       Forgive tsc_early drift, based on feedback from Feng Tang; Xing,
> >         Zhengjun; and Thomas Gleixner.
> >
> > o       Improve CPU selection for clock-synchronization checking.
> >
> > Link: https://lore.kernel.org/lkml/20210419045155.GA596058@paulmck-ThinkPad-P17-Gen-1/
> >
> > Changes since v8, based on Thomas Gleixner feedback:
> >
> > o       Reduced clock-skew threshold to 200us and delay limit to 50us.
> >
> > o       Split out a cs_watchdog_read() function.
> >
> > o       Removed the pointless CLOCK_SOURCE_VERIFY_PERCPU from kvm_clock.
> >
> > o       Initialized cs_nsec_max and cs_nsec_min to avoid firsttime checks.
> >
> > Link: https://lore.kernel.org/lkml/20210414043435.GA2812539@paulmck-ThinkPad-P17-Gen-1/
> >
> > Changes since v7, based on Thomas Gleixner feedback:
> >
> > o       Fix embarrassing git-format-patch operator error.
> >
> > o       Merge pairwise clock-desynchronization checking into the checking
> >         of per-CPU clock synchronization when marked unstable.
> >
> > o       Do selective per-CPU checking rather than blindly checking all
> >         CPUs.  Provide a clocksource.verify_n_cpus kernel boot parameter
> >         to control this behavior, with the value -1 choosing the old
> >         check-all-CPUs behavior.  The default is to randomly check 8 CPUs.
> >
> > o       Fix the clock-desynchronization checking to avoid a potential
> >         use-after-free error for dynamically allocated clocksource
> >         structures.
> >
> > o       Remove redundance "wdagain_nsec < 0" from clocksource_watchdog()
> >         clocksource skew checking.
> >
> > o       Update commit logs and do code-style updates.
> >
> > Link: https://lore.kernel.org/lkml/20210106004013.GA11179@paulmck-ThinkPad-P72/
> >
> > Changes since v5:
> >
> > o       Rebased to v5.12-rc5.
> >
> > Changes since v4:
> >
> > o       Rebased to v5.12-rc1.
> >
> > Changes since v3:
> >
> > o       Rebased to v5.11.
> >
> > o       Apply Randy Dunlap feedback.
> >
> > Changes since v2:
> >
> > o       Rebased to v5.11-rc6.
> >
> > o       Updated Cc: list.
> >
> > Changes since v1:
> >
> > o       Applied feedback from Rik van Riel.
> >
> > o       Rebased to v5.11-rc3.
> >
> > o       Stripped "RFC" from the subject lines.
> >
> >                                                 Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> >  Documentation/admin-guide/kernel-parameters.txt   |   32 +++
> >  arch/x86/kernel/tsc.c                             |    1
> >  b/Documentation/admin-guide/kernel-parameters.txt |   21 ++
> >  b/arch/x86/kernel/tsc.c                           |    3
> >  b/include/linux/clocksource.h                     |    2
> >  b/kernel/time/clocksource.c                       |   23 ++
> >  b/kernel/time/jiffies.c                           |   15 -
> >  include/linux/clocksource.h                       |    3
> >  kernel/time/clocksource.c                         |  227 ++++++++++++++++++++--
> >  9 files changed, 304 insertions(+), 23 deletions(-)
> 
> Hi Paul,
> using the v11, I added a approve flag and made it work for my early
> inject test  where tsc is good
> through a cross tsc sync test. Ideally with the small tweak, we could
> get less tsc issues to debug.
>  And I'm not sure it would help in real trouble shooting cases. But we
> will see if it would help.

Thank you for the patch!

However, Thomas had me rework this code to put the error injection into
a kernel module, so this effect is now obtained in a different way.
So I am unable to make use of your patch.

							Thanx, Paul

> My hack patch snippet as below:
> 
> 
> 
>  static void __clocksource_unstable(struct clocksource *cs)
>  {
> +       if (!cs->approved)
> +               return;
> +
>         cs->flags &= ~(CLOCK_SOURCE_VALID_FOR_HRES | CLOCK_SOURCE_WATCHDOG);
>         cs->flags |= CLOCK_SOURCE_UNSTABLE;
> 
> @@ -366,9 +369,12 @@
>         if (!cpumask_empty(&cpus_behind))
>                 pr_warn("        CPUs %*pbl behind CPU %d for
> clocksource %s.\n",
>                         cpumask_pr_args(&cpus_behind), testcpu, cs->name);
> -       if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind))
> +       if (!cpumask_empty(&cpus_ahead) || !cpumask_empty(&cpus_behind)) {
>                 pr_warn("        CPU %d check durations %lldns -
> %lldns for clocksource %s.\n",
>                         testcpu, cs_nsec_min, cs_nsec_max, cs->name);
> +               cs->approved = true;
> +               __clocksource_unstable(cs);
> +       }
>  }
> 
>  static void clocksource_watchdog(struct timer_list *unused)
> @@ -396,6 +402,7 @@
> 
>                 if (!cs_watchdog_read(cs, &csnow, &wdnow)) {
>                         /* Clock readout unreliable, so give it up. */
> +                       cs->approved = false;
>                         __clocksource_unstable(cs);
>                         continue;
>                 }

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ