Message-ID: <5214F825.8010504@linaro.org>
Date:	Wed, 21 Aug 2013 10:25:57 -0700
From:	John Stultz <john.stultz@...aro.org>
To:	Frederic Weisbecker <fweisbec@...il.com>
CC:	LKML <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Don Zickus <dzickus@...hat.com>
Subject: Re: [RFC PATCH 6/6] timekeeping: Debug missing timekeeping updates

On 08/21/2013 09:42 AM, Frederic Weisbecker wrote:
> With the full dynticks feature and the tricky full system idle
> detection code that is coming soon, it becomes necessary to have
> some debug code that makes sure that the timekeeping is always
> maintained and moving forward as expected.
>
> This provides a simple detection of missing timekeeping updates
> inspired by the lockup detector's use of CPU cycles clock.
>
> The jiffies are compared to the cpu clock after several snapshots taken
> from NMIs that trigger after arbitrary CPU cycles period overflow.
>
> If the jiffies progression appears to drift too far away from the CPU
> clock's, this triggers a warning.
>
> We just make sure not to account the tiny code on irq entry that
> may have stale jiffies values before tick_check_nohz() is called
> after the CPU is woken up while the system went full idle for some
> time.
>
> Same goes for idle exit in case the tick were stopped but idle
> was polling on need_resched().

So you're using sched_clock to try to detect timekeeping
inconsistencies. Hrm.. Do you have some examples of where this debug
infrastructure helped out?

A few thoughts:

1) Why are you using jiffies as the timekeeping reference instead of
reading some of the actual timekeeping values? Jiffies usage has been
intentionally on the decline, and since the dynticks infrastructure
landed, jiffies are just derived from the timekeeping core, so it's
sort of strange to see it used for this.

2) This seems very similar to the old lost-ticks compensation code we
had prior to the clocksource infrastructure, and seems like it might
suffer from some of the issues seen there. For instance, sched_clock has
historically been looser in its correctness requirements than the
timekeeping code, so using it to validate the stricter timekeeping
code makes me worry we might see false positives.

3) I'm also curious (maybe skeptical): if sched_clock is reliable
enough to use for validating time, then we are likely using that same
hardware as the timekeeping clocksource. Thus the cases where I'd
expect to see issues w/ nohz, like clocksource counter overflows
being missed on quick-wrapping clocksources, wouldn't really apply.
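
To make the missed-overflow case in 3) concrete, here is a minimal
userspace sketch. It is not kernel code: the helper names and the
24-bit counter width used in the example are assumptions chosen for
illustration. It shows that a masked delta between two counter reads is
only right while fewer than 2^bits cycles have elapsed; once an update
is deferred past a full wrap, the lost cycles cannot be seen from the
counter alone.

```c
#include <assert.h>
#include <stdint.h>

/* Bit mask for a counter that is `bits` wide (1..64). Hypothetical helper. */
static uint64_t counter_mask(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/*
 * Cycles elapsed between two counter reads. The result is correct only
 * if fewer than 2^bits cycles really elapsed between `last` and `now`.
 */
static uint64_t masked_delta(uint64_t now, uint64_t last, unsigned int bits)
{
	return (now - last) & counter_mask(bits);
}

/*
 * Returns 1 if an update deferred by `true_elapsed` cycles would be
 * under-counted on a counter that is `bits` wide, 0 otherwise.
 */
static int wrap_missed(uint64_t true_elapsed, unsigned int bits)
{
	uint64_t last = 0;
	uint64_t now = true_elapsed & counter_mask(bits); /* what the hardware shows */

	return masked_delta(now, last, bits) != true_elapsed;
}
```

With a 24-bit counter, a single wrap between reads is still handled
(reading 5 after 0xFFFFFA gives 11 cycles), but wrap_missed() reports a
loss as soon as the true delay reaches 2^24 cycles.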


Personally, I've been thinking the timekeeping update code could use
some improvements/warnings around cases where the update delay is larger
than the clocksource max_deferment - possibly falling back to a slower
overflow-proof multiply as is done in the CLOCK_SOURCE_SUSPEND_NONSTOP
resume case. This would allow more robust behavior in cases like kvm
guests being paused for unreasonable lengths of time, and could also
provide very similar NOHZ debug warnings (assuming the clocksource
doesn't wrap quickly - but again, in those cases, I'm not confident we
can trust sched_clock either).
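
As a rough illustration of what "overflow-proof multiply" means here, a
userspace sketch follows. It is not the kernel's implementation: the
function names are hypothetical, and the mult/shift values in the
example are made up (a 1 MHz counter, 1000 ns per cycle, shift 22). The
usual cycles-to-ns conversion is (cycles * mult) >> shift, and the
intermediate product silently wraps at 64 bits once the cycle delta is
large enough. Splitting the delta into 32-bit halves avoids that, at
the cost of a second multiply.

```c
#include <assert.h>
#include <stdint.h>

/* Fast path: the product wraps modulo 2^64 when cycles * mult is too big. */
static uint64_t cyc2ns_naive(uint64_t cycles, uint32_t mult, unsigned int shift)
{
	return (cycles * mult) >> shift;
}

/*
 * Slower path (assumes shift <= 32): multiply the low and high 32-bit
 * halves of the delta separately so no intermediate product exceeds
 * 64 bits. The final result must still fit in 64 bits of nanoseconds.
 */
static uint64_t cyc2ns_wide(uint64_t cycles, uint32_t mult, unsigned int shift)
{
	uint64_t lo = cycles & UINT32_MAX;
	uint64_t hi = cycles >> 32;
	uint64_t ns;

	assert(shift <= 32);
	ns = (lo * mult) >> shift;
	if (hi)
		ns += (hi * mult) << (32 - shift);
	return ns;
}
```

With mult = 4194304000 and shift = 22, both functions agree for small
deltas, but for a delta of 10^10 cycles only cyc2ns_wide() returns the
expected 10^13 ns; the naive version returns a much smaller, wrapped
value. A check of the delta against the fast path's limit could both
select the slow path and emit a warning.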


thanks
-john
