Message-Id: <20160812191411.GA12616@linux.vnet.ibm.com>
Date: Fri, 12 Aug 2016 12:14:11 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Rik van Riel <riel@...hat.com>
Cc: Jouni Malinen <jkmalinen@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Chris Mason <clm@...com>,
Arjan van de Ven <arjan@...radead.org>, rt@...utronix.de,
George Spelvin <linux@...encehorizons.net>,
Len Brown <lenb@...nel.org>,
Josh Triplett <josh@...htriplett.org>,
Eric Dumazet <edumazet@...gle.com>
Subject: Re: [patch 4 14/22] timer: Switch to a non cascading wheel
On Fri, Aug 12, 2016 at 01:50:16PM -0400, Rik van Riel wrote:
> On Thu, 2016-08-11 at 18:21 +0300, Jouni Malinen wrote:
> > On Mon, Jul 4, 2016 at 12:50 PM, Thomas Gleixner <tglx@...utronix.de>
> > wrote:
> > > The current timer wheel has some drawbacks:
> > ...
> >
> > It looks like this change (commit
> > 500462a9de657f86edaa102f8ab6bff7f7e43fc2 in linux.git) breaks one of
> > the automated test cases I'm using to test hostapd and wpa_supplicant
> > with mac80211_hwsim from the kernel. I'm not sure what exactly causes
> > this (I did not really expect git bisect to point to timers...), but it
> > seems to be very reproducible for me under kvm (though this
> > apparently did not happen on another device, so I'm not completely
> > sure what is needed to reproduce it) with the ap_wps_er_http_proto
> > test cases failing to connect 20 TCP stream sockets to a server on
> > localhost. The client side is a python test script and the server is
> > hostapd. The failure shows up as roughly the 13th of those socket
> > connects failing while all the others (both before and after the
> > failed one) go through.
> >
> > Would you happen to have any idea why this commit makes such a
> > difference in behavior?
>
> I have a vague hypothesis, more of a question actually.
>
> How does the new timer wheel code handle lost timer ticks?
>
> If a KVM guest does not run for a while, because the host
> is scheduling something else, the guest generally only gets
> one timer tick after the guest is scheduled back in.
>
> If there are multiple lost ticks, they will remain lost.
>
> Could that cause the new timer wheel code to skip over
> timer buckets occasionally, or is this hypothesis bunk?
FWIW, I do appear to be seeing more lost wakeups on current mainline
than on v4.7, but not enough of a difference to get a reliable bisection
in reasonable time.
Thanx, Paul