Message-Id: <20160812191411.GA12616@linux.vnet.ibm.com>
Date: Fri, 12 Aug 2016 12:14:11 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Rik van Riel <riel@...hat.com>
Cc: Jouni Malinen <jkmalinen@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
LKML <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Chris Mason <clm@...com>,
Arjan van de Ven <arjan@...radead.org>, rt@...utronix.de,
George Spelvin <linux@...encehorizons.net>,
Len Brown <lenb@...nel.org>,
Josh Triplett <josh@...htriplett.org>,
Eric Dumazet <edumazet@...gle.com>
Subject: Re: [patch 4 14/22] timer: Switch to a non cascading wheel
On Fri, Aug 12, 2016 at 01:50:16PM -0400, Rik van Riel wrote:
> On Thu, 2016-08-11 at 18:21 +0300, Jouni Malinen wrote:
> > On Mon, Jul 4, 2016 at 12:50 PM, Thomas Gleixner <tglx@...utronix.de>
> > wrote:
> > > The current timer wheel has some drawbacks:
> > ...
> >
> > It looks like this change (commit
> > 500462a9de657f86edaa102f8ab6bff7f7e43fc2 in linux.git) breaks one of
> > the automated test cases I'm using to test hostapd and wpa_supplicant
> > with mac80211_hwsim from the kernel. I'm not sure what exactly causes
> > this (I did not really expect git bisect to point to timers...), but it
> > seems to be very reproducible for me under kvm (though this
> > apparently did not happen on another device, so I'm not completely
> > sure what is needed to reproduce it) with the ap_wps_er_http_proto
> > test cases failing to connect 20 TCP stream sockets to a server on
> > localhost. The client side is a python test script and the server is
> > hostapd. The failure shows up as roughly the 13th of those socket
> > connects failing while all the others (both before and after the
> > failed one) go through.
> >
> > Would you happen to have any idea why this commit makes such a
> > difference in behavior?
>
> I have a vague hypothesis, more of a question actually.
>
> How does the new timer wheel code handle lost timer ticks?
>
> If a KVM guest does not run for a while, because the host
> is scheduling something else, the guest generally only gets
> one timer tick after the guest is scheduled back in.
>
> If there are multiple lost ticks, they will remain lost.
>
> Could that cause the new timer wheel code to skip over
> timer buckets occasionally, or is this hypothesis bunk?
FWIW, I do appear to be seeing more lost wakeups on current mainline
than on v4.7, but not enough of a difference to get a reliable bisection
in reasonable time.
Thanx, Paul