Date:   Thu, 15 Nov 2018 09:17:50 -0800
From:   John Stultz <john.stultz@...aro.org>
To:     Michael Zhivich <mzhivich@...mai.com>
Cc:     lkml <linux-kernel@...r.kernel.org>, tiny.windzz@...il.com,
        Joel Fernandes <joel@...lfernandes.org>,
        alexander.levin@...izon.com, frederic@...nel.org,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Arnd Bergmann <arnd@...db.de>,
        Ondrej Mosnacek <omosnace@...hat.com>,
        Jason Wessel <jason.wessel@...driver.com>, kreview@...mai.com
Subject: Re: [PATCH] softirq: don't push timer softirq handling to ksoftirqd

On Thu, Nov 15, 2018 at 9:07 AM, Michael Zhivich <mzhivich@...mai.com> wrote:
> Require TIMER_SOFTIRQ to be handled immediately instead of delaying until
> ksoftirqd runs, thus preventing problems with reading clocksources that
> wrap often (e.g. acpi_pm).
>
> If acpi_pm is used as the clocksource watchdog, and the machine is under heavy
> load, the time period for the watchdog check may be significantly longer
> than the requested 0.5 seconds.  If the watchdog check is delayed by 2
> seconds (observed behavior), then acpi_pm time delta will be
>
>     2.5 sec * 3579545 ticks/sec = 8948863 = 0x888c7f
>
> which will be treated as negative (since acpi_pm is only 24-bits wide) and
> truncated to 0.  This behavior will cause tsc to be incorrectly declared
> unstable in clocksource_watchdog(), as it no longer agrees with acpi_pm.
> If the clocksource watchdog check is delayed by more than 4.7 sec, then the
> acpi_pm clocksource will wrap altogether and produce incorrect time delta.
>
> The likely cause of this delay is that timer interrupts are serviced in
> ksoftirqd when the machine is very busy.
>
> Per Linus' comment in commit 3c53776e29f8 ("Mark HI and TASKLET softirq
> synchronous"):
>    ...
>    We should probably also consider the timer softirqs to be synchronous
>    and not be delayed to ksoftirqd (since they were the issue with the
>    earlier watchdog problems), but that should be done as a separate patch.
>    ...
>
> Signed-off-by: Michael Zhivich <mzhivich@...mai.com>
> ---
>  kernel/softirq.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index d28813306b2c..6d517ce0fba8 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -82,7 +82,8 @@ static void wakeup_softirqd(void)
>   * right now. Let ksoftirqd handle this at its own rate, to get fairness,
>   * unless we're doing some of the synchronous softirqs.
>   */
> -#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))
> +#define SOFTIRQ_NOW_MASK \
> +       ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ) | (1 << TIMER_SOFTIRQ))
>  static bool ksoftirqd_running(unsigned long pending)
>  {
>         struct task_struct *tsk = __this_cpu_read(ksoftirqd);

Thanks so much for sending this along! Sorry I didn't get back to your
mail earlier this week, I've been at Plumbers.

So while this does try to attack the reliability issue w/ the
clocksource watchdog being delayed, I worry this will have too many
side-effects elsewhere.

Would a more focused fix be to move the clocksource watchdog from a
normal timer to a hrtimer?

thanks
-john
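
For reference, the wrap arithmetic in the quoted description can be
checked with a small userspace model. This is only a sketch:
masked_delta() and ticks_after_ms() are hypothetical helper names, and
the clamp-to-zero rule is modeled on the behavior the description
reports (a delta with the top bit of the 24-bit mask set is treated as
negative and truncated to 0), not copied from the kernel source.

```c
#include <stdint.h>

#define ACPI_PM_HZ   3579545ULL   /* ticks per second, from the patch description */
#define ACPI_PM_MASK 0xffffffULL  /* acpi_pm is a 24-bit counter */

/*
 * Hypothetical model of a masked counter delta: compute (now - last)
 * modulo the counter width, then treat any value in the upper half of
 * the range (top mask bit set) as "negative" and clamp it to 0.
 */
static uint64_t masked_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	uint64_t ret = (now - last) & mask;

	return (ret & ~(mask >> 1)) ? 0 : ret;
}

/* Hypothetical helper: counter ticks elapsed after ms milliseconds, rounded. */
static uint64_t ticks_after_ms(uint64_t ms)
{
	return (ms * ACPI_PM_HZ + 500) / 1000;
}
```

With these helpers, a 500 ms interval gives 1789773 ticks and passes
through masked_delta() unchanged. A 2500 ms interval gives 8948863
(0x888c7f) ticks, which has bit 23 set, so the delta is clamped to 0.
A 5000 ms interval exceeds the 2^24-tick range (about 4.7 sec) and
wraps to 1120509 ticks, which looks valid but is far too small.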
