Message-ID: <6DB365C6-98F6-4C27-B0BE-0833E5D4962E@akamai.com>
Date:   Fri, 16 Nov 2018 18:46:36 +0000
From:   "Zhivich, Michael" <mzhivich@...mai.com>
To:     John Stultz <john.stultz@...aro.org>
CC:     lkml <linux-kernel@...r.kernel.org>,
        "tiny.windzz@...il.com" <tiny.windzz@...il.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        "alexander.levin@...izon.com" <alexander.levin@...izon.com>,
        "frederic@...nel.org" <frederic@...nel.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Arnd Bergmann <arnd@...db.de>,
        "Ondrej Mosnacek" <omosnace@...hat.com>,
        Jason Wessel <jason.wessel@...driver.com>,
        "kreview@...mai.com" <kreview@...mai.com>
Subject: Re: [PATCH] softirq: don't push timer softirq handling to ksoftirqd

On 11/15/18, 12:17 PM, "John Stultz" <john.stultz@...aro.org> wrote:

    On Thu, Nov 15, 2018 at 9:07 AM, Michael Zhivich <mzhivich@...mai.com> wrote:
    > Require TIMER_SOFTIRQ to be handled immediately instead of delaying until
    > ksoftirqd runs, thus preventing problems with reading clocksources that
    > wrap often (e.g. acpi_pm).
    >
    > If acpi_pm is used as the clocksource watchdog, and the machine is under heavy
    > load, the time period for the watchdog check may be significantly longer
    > than the requested 0.5 seconds.  If the watchdog check is delayed by 2
    > seconds (observed behavior), then acpi_pm time delta will be
    >
    >     2.5 sec * 3579545 ticks/sec ~= 8948863 = 0x888c7f
    >
    > which will be treated as negative (since acpi_pm is only 24-bits wide) and
    > truncated to 0.  This behavior will cause tsc to be incorrectly declared
    > unstable in clocksource_watchdog(), as it no longer agrees with acpi_pm.
    > If the clocksource watchdog check is delayed by more than 4.7 sec, then the
    > acpi_pm clocksource will wrap altogether and produce incorrect time delta.
    >
    > The likely cause of this delay is that timer interrupts are serviced in
    > ksoftirqd when the machine is very busy.
    >
    > Per Linus' comment in commit 3c53776e29f8 ("Mark HI and TASKLET softirq
    > synchronous"):
    >    ...
    >    We should probably also consider the timer softirqs to be synchronous
    >    and not be delayed to ksoftirqd (since they were the issue with the
    >    earlier watchdog problems), but that should be done as a separate patch.
    >    ...
    >
    > Signed-off-by: Michael Zhivich <mzhivich@...mai.com>
    > ---
    >  kernel/softirq.c | 3 ++-
    >  1 file changed, 2 insertions(+), 1 deletion(-)
    >
    > diff --git a/kernel/softirq.c b/kernel/softirq.c
    > index d28813306b2c..6d517ce0fba8 100644
    > --- a/kernel/softirq.c
    > +++ b/kernel/softirq.c
    > @@ -82,7 +82,8 @@ static void wakeup_softirqd(void)
    >   * right now. Let ksoftirqd handle this at its own rate, to get fairness,
    >   * unless we're doing some of the synchronous softirqs.
    >   */
    > -#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))
    > +#define SOFTIRQ_NOW_MASK \
    > +       ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ) | (1 << TIMER_SOFTIRQ))
    >  static bool ksoftirqd_running(unsigned long pending)
    >  {
    >         struct task_struct *tsk = __this_cpu_read(ksoftirqd);
    
    Thanks so much for sending this along! Sorry I didn't get back to your
    mail earlier this week, I've been at Plumbers.
    
    So while this does try to attack the reliability issue w/ the
    clocksource watchdog being delayed, I worry this will have too many
    side-effects elsewhere.
    
    Would a more focused fix be to move the clocksource watchdog from a
    normal timer to a hrtimer?
    
    thanks
    -john
    
Hi John,

That's an interesting idea: it would get the clocksource watchdog out of ksoftirqd. However, the clocksource watchdog iterates over the available CPUs to check the TSC on each core (see the add_timer_on() call in clocksource_watchdog()). I'm not seeing an API to start an hrtimer on a specific CPU. Is this possible and I'm just missing it? Or would something like this have to be added to hrtimer?

Thanks,
~ Michael
