Message-ID: <alpine.DEB.2.21.1911051048190.17054@nanos.tec.linutronix.de>
Date:   Tue, 5 Nov 2019 10:53:21 +0100 (CET)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Scott Wood <swood@...hat.com>
cc:     Peter Zijlstra <peterz@...radead.org>,
        Frederic Weisbecker <frederic@...nel.org>,
        Ingo Molnar <mingo@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] timers/nohz: Update nohz load even if tick already
 stopped

On Tue, 5 Nov 2019, Scott Wood wrote:
> On Tue, 2019-11-05 at 00:43 +0100, Thomas Gleixner wrote:
> > As Peter pointed out to me privately we should rather go and analyze the
> > real thing instead of just applying duct tape.
> > 
> > /me drops the patch again.
> 
> The warning is due to kernel/sched/idle.c not updating curr->se.exec_start.
> 
> While debugging I noticed an issue with a particular load pattern.  The CPU
> goes non-nohz for a brief time at an interval very close to twice 
> tick_period.  When the tick is started, the timer expiration is more than
> tick_period in the past, so hrtimer_forward() tries to catch up by adding
> 2*tick_period to the expiration.  Then the tick is stopped before that new
> expiration, and when the tick is woken up the expiry is again advanced by
> 2*tick_period with the timer never actually running.  sched_tick_remote()
> does fire every second, but there are streaks of several seconds where it
> keeps catching the CPU in a non-nohz state, so neither the normal nor remote
> ticks are calling calc_load_nohz_remote().
> 
> Is there a reason to not just remove the hrtimer_forward() from
> tick_nohz_restart(), letting the timer fire if it's in the past, which will
> take care of doing hrtimer_forward()?

Well, no. tick_nohz_restart() can be invoked in a situation where the timer
is armed for something in the far future (or completely disabled) due to
previously entering an estimated long idle (or user space execution on
NOHZ_FULL) period.

That means if the timer is not canceled, realigned to the current tick and
forwarded to the next due tick, the tick will not fire on time, causing
another sort of trouble.
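
To illustrate the point, here is a rough userspace model of the catch-up
arithmetic, not the kernel code itself. model_forward() mimics what
hrtimer_forward() does with the expiry; model_restart(), its 'realign' flag
and the plain integer time units are assumptions made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model of the catch-up arithmetic in hrtimer_forward().
 * Times are plain integers in arbitrary units; the kernel uses ktime_t.
 * Returns the number of intervals the expiry was advanced by.
 */
static uint64_t model_forward(int64_t *expires, int64_t now, int64_t interval)
{
	int64_t delta = now - *expires;
	uint64_t overruns = 0;

	assert(interval > 0);

	if (delta < 0)
		return 0;	/* expiry is in the future: left untouched */

	if (delta >= interval) {
		/* Skip all whole intervals that have already elapsed. */
		overruns = (uint64_t)(delta / interval);
		*expires += interval * (int64_t)overruns;
	}

	/* One more step so that the expiry lands strictly after 'now'. */
	*expires += interval;
	return overruns + 1;
}

/*
 * Hypothetical helper modelling the restart path: 'armed' is where the
 * timer currently points, 'last_tick' is when the tick last ran.
 * With realign == 0 the stale expiry is kept before forwarding.
 */
static int64_t model_restart(int64_t armed, int64_t last_tick, int64_t now,
			     int64_t interval, int realign)
{
	int64_t expires = realign ? last_tick : armed;

	model_forward(&expires, now, interval);
	return expires;
}
```

With an interval of 4, the last tick at 100, the timer armed for 5000 and
now == 107, model_restart() without realigning leaves the expiry at 5000,
because forwarding a timer that is still in the future does nothing. With
realigning, the expiry becomes 108, the next due tick. The same model shows
the pattern from the report: an expiry at 100 with now == 105 is advanced by
two intervals to 108.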

> As for the warning in sched_tick_remote(), it seems like a test for time
> since the last tick on this cpu (remote or otherwise) would be better than
> relying on curr->se.exec_start, in order to detect things like this.

Care to give that a shot?
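
Something along these lines is what I would expect as a starting point. This
is only a userspace sketch of the idea, not a patch: struct cpu_tick_model,
note_tick() and tick_overdue() are hypothetical names, and the limit is left
to the caller rather than taken from the existing warning.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical per-CPU state: when a tick (local or remote) last ran. */
struct cpu_tick_model {
	int64_t last_tick_ns;
};

/* Record a tick; would be called from both the local and remote paths. */
static void note_tick(struct cpu_tick_model *cpu, int64_t now_ns)
{
	cpu->last_tick_ns = now_ns;
}

/*
 * True if more than limit_ns has passed since any tick ran on this CPU.
 * This depends only on tick bookkeeping, not on curr->se.exec_start.
 */
static bool tick_overdue(const struct cpu_tick_model *cpu, int64_t now_ns,
			 int64_t limit_ns)
{
	return now_ns - cpu->last_tick_ns > limit_ns;
}
```

The design point is that the timestamp is updated by whichever tick runs, so
a CPU that keeps missing both the normal and the remote tick shows up
regardless of which task is current.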

Thanks,

	tglx
