lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJRGBZyFbvpdrbKmV9KrXz7VkMcTMqVp2PAcFRvroZue6b9tag@mail.gmail.com>
Date:   Wed, 28 Apr 2021 22:24:42 +0800
From:   Luming Yu <luming.yu@...il.com>
To:     paulmck@...nel.org
Cc:     Andi Kleen <ak@...ux.intel.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>, john.stultz@...aro.org,
        sboyd@...nel.org, corbet@....net, Mark.Rutland@....com,
        maz@...nel.org, kernel-team@...com, neeraju@...eaurora.org,
        feng.tang@...el.com, zhengjun.xing@...el.com,
        Chris Mason <clm@...com>
Subject: Re: [PATCH v10 clocksource 1/7] clocksource: Provide module
 parameters to inject delays in watchdog

On Wed, Apr 28, 2021 at 9:57 PM Paul E. McKenney <paulmck@...nel.org> wrote:
>
> On Wed, Apr 28, 2021 at 12:49:12PM +0800, Luming Yu wrote:
> > We 'd expect to see clock_source watchdog can avoid to do wrong thing
> > due to the injected delay or
> > in real life delay by doing tsc sync-re-check by applying the patch-set.
> > However , the noise is still cause wrong actions and the patch doesn't
> > defeat the injected's delay
> > please correct me if  I'm wrong.
>
> Injecting delay is just a test.  In real life, if you got four delays
> in a row, the cause is likely that the clock read is broken and taking
> a very long time.  In which case marking that clock unstable is a
> reasonable response.
>
> Other causes include having an NMI or SMI storm, getting extremely
> unlucky with vCPU preemptions, and so on.  In these cases, you are not
> making much forward progress anyway, so marked-unstable clock is the
> least of your worries.
>
> I ran this (without injected delays) for more than a thousand hours on
> rcutorture guest OSes and didn't see any instances of even two consecutive
> bad reads.   There was the very occasional single instance of a bad read.
>
> Therefore, the code marks the clock unstable if it has four bad reads
> in a row, as it should.

The hard problem to solve is tsc is still in good shape and it can be verified
with a quick cross check with other thread/core's tsc counts in the
injected situation or in real life case
to prove if it is  truly a tsc problem or reference clock's problem of
the watchdog.

Ideally, we could factor out hard-to-debug unstable tsc problems from
clock source watchdog problems
and get less and less tsc sightings caused by clock source watchdog.

>
>                                                         Thanx, Paul
>
> > parameters]# cat *
> > 1
> > 1
> > -1
> > 3
> > 8
> >
> > [62939.809615] clocksource: clocksource_watchdog_inject_delay():
> > Injecting delay.
> > [62939.816867] clocksource: clocksource_watchdog_inject_delay():
> > Injecting delay.
> > [62939.824094] clocksource: clocksource_watchdog_inject_delay():
> > Injecting delay.
> > [62939.831314] clocksource: clocksource_watchdog_inject_delay():
> > Injecting delay.
> > [62939.838536] clocksource: timekeeping watchdog on CPU26: hpet
> > read-back delay of 7220833ns, attempt 4, marking unstable
> > [62939.849230] tsc: Marking TSC unstable due to clocksource watchdog
> > [62939.855340] TSC found unstable after boot, most likely due to
> > broken BIOS. Use 'tsc=unstable'.
> > [62939.863972] sched_clock: Marking unstable (62943398530130,
> > -3543150114)<-(62941186607503, -1331276112)
> > [62939.875104] clocksource: Checking clocksource tsc synchronization
> > from CPU 123 to CPUs 0,6,26,62,78,97-98,137.
> > [62939.886518] clocksource: Switched to clocksource hpet
> >
> > On Tue, Apr 27, 2021 at 2:27 AM Paul E. McKenney <paulmck@...nel.org> wrote:
> > >
> > > On Mon, Apr 26, 2021 at 10:56:27AM -0700, Andi Kleen wrote:
> > > > > ------------------------------------------------------------------------
> > > > >
> > > > > - module parameters
> > > > >
> > > > >   If the scope of the fault injection capability is limited to a
> > > > >   single kernel module, it is better to provide module parameters to
> > > > >   configure the fault attributes.
> > > > >
> > > > > ------------------------------------------------------------------------
> > > > >
> > > > > And in this case, the fault injection capability is in fact limited to
> > > > > kernel/clocksource.c.
> > > >
> > > >
> > > > I disagree with this recommendation because it prevents fuzzer coverage.
> > > >
> > > > Much better to have an uniform interface that can be automatically
> > > > explored.
> > >
> > > The permissions for these module parameters is 0644, so there is no
> > > reason why the fuzzers cannot use them via sysfs.
> > >
> > >                                                         Thanx, Paul

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ