Message-Id: <20160616160215.GQ3923@linux.vnet.ibm.com>
Date: Thu, 16 Jun 2016 09:02:15 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Arjan van de Ven <arjanvandeven@...il.com>,
Eric Dumazet <edumazet@...gle.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Chris Mason <clm@...com>,
Arjan van de Ven <arjan@...radead.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
George Spelvin <linux@...encehorizons.net>
Subject: Re: [patch 13/20] timer: Switch to a non cascading wheel
On Thu, Jun 16, 2016 at 05:43:36PM +0200, Thomas Gleixner wrote:
> On Wed, 15 Jun 2016, Thomas Gleixner wrote:
> > On Wed, 15 Jun 2016, Arjan van de Ven wrote:
> > > what would 1 more timer wheel do?
> >
> > Waste storage space and make the collection of expired timers more expensive.
> >
> > The selection of the timer wheel properties is a combination of:
> >
> > 1) Granularity
> >
> > 2) Storage space
> >
> > 3) Number of levels to collect
>
> So I came up with a slightly different solution for this. The problem case is
> HZ=1000, and looking at the data again, there is no reason why we need actual
> 1ms granularity for timer wheel timers. That's independent of the desired
> ms-based interfaces.
>
> We can simply run the wheel internally with 4ms base-level resolution and
> degrade the granularity from there. That gives us 6+ days of range and a
> simple cutoff at the capacity of the 7th-level wheel.
>
>  Level Offset      Granularity               Range
>    0      0        4 ms                      0 ms -       255 ms
>    1     64       32 ms                    256 ms -      2047 ms (256ms - ~2s)
>    2    128      256 ms                   2048 ms -     16383 ms (~2s - ~16s)
>    3    192     2048 ms (~2s)            16384 ms -    131071 ms (~16s - ~2m)
>    4    256    16384 ms (~16s)          131072 ms -   1048575 ms (~2m - ~17m)
>    5    320   131072 ms (~2m)          1048576 ms -   8388607 ms (~17m - ~2h)
>    6    384  1048576 ms (~17m)         8388608 ms -  67108863 ms (~2h - ~18h)
>    7    448  8388608 ms (~2h)         67108864 ms - 536870911 ms (~18h - ~6d)
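
For concreteness, here is a minimal sketch of how a relative expiry would map
to a level and bucket under this layout.  The names and constants are
illustrative assumptions chosen to reproduce the table above (64 buckets per
level, 8x coarser granularity per level, 4 ms base resolution), not
identifiers from the patch series:

	/* Illustrative only -- not the patch's actual identifiers. */
	#define LVL_BITS	6			/* 64 buckets per level */
	#define LVL_SIZE	(1UL << LVL_BITS)
	#define LVL_CLK_SHIFT	3			/* 8x coarser per level */
	#define BASE_SHIFT	2			/* 4 ms base @ 1 ms jiffies */
	#define NUM_LVLS	8

	/* Map a relative expiry in ms to a bucket index (level * 64 + slot) */
	static unsigned int calc_wheel_index(unsigned long expires_ms)
	{
		unsigned long delta = expires_ms >> BASE_SHIFT;	/* 4 ms units */
		unsigned int lvl = 0;

		/* Climb levels until the delta fits into 64 slots */
		while (delta >= LVL_SIZE && lvl < NUM_LVLS - 1) {
			delta >>= LVL_CLK_SHIFT;
			lvl++;
		}
		/* Cutoff: clamp anything past level 7's capacity (~6 days) */
		if (delta >= LVL_SIZE)
			delta = LVL_SIZE - 1;
		return lvl * LVL_SIZE + delta;
	}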
>
> That works really nicely and has the interesting side effect that we batch in
> the first-level wheel, which helps networking. I'll repost the series with the
> other review points addressed later tonight.
>
> Btw, I also thought a bit more about the millisecond interfaces. I think we
> shouldn't invent new interfaces. The correct solution IMHO is to disentangle
> the scheduler tick frequency and jiffies. If we have those completely
> separated, then we can do the following:
>
> 1) Force HZ=1000. That means jiffies and timer wheel units are 1ms. If the
>    tick frequency is != 1000, we simply increment jiffies in the tick by the
>    proper amount (4 @ 250 ticks/sec, 10 @ 100 ticks/sec).
>
> So all msecs_to_jiffies() invocations magically compile down to nothing, and
> we can remove them gradually over time.
Some of RCU's heuristics assume that if scheduling-clock ticks happen,
they happen once per jiffy. These would need to be adjusted, which would
not be a big deal, just a bit more use of HZ.
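As a rough sketch of what that decoupling would mean in code (a hypothetical
simplification; the real conversion helpers also have to handle overflow and
clamping):

	/* With jiffies pinned at 1 ms, the conversion becomes the identity
	 * and the compiler can elide it entirely: */
	static inline unsigned long msecs_to_jiffies(const unsigned int m)
	{
		return m;	/* 1 jiffy == 1 ms when HZ == 1000 */
	}

	/* The periodic tick already hands a tick count to do_timer(), so a
	 * slower tick would simply pass the ms-per-tick factor instead of 1
	 * ("jiffies_per_tick" is a hypothetical name): */
	do_timer(jiffies_per_tick);	/* 4 @ 250 ticks/sec, 10 @ 100 ticks/sec */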
> 2) When we do that right, we can make the tick frequency a command-line option
>    and just have a compiled-in default.
As long as there is something that tells RCU what the tick frequency
actually is at runtime, this should not be a problem. For example,
in rcu_implicit_dynticks_qs(), the following:
rdp->rsp->jiffies_resched += 5;
Would instead need to be something like:
rdp->rsp->jiffies_resched += 5 * jiffies_per_tick;
Changing tick frequency at runtime would be a bit more tricky, as it would
be tough to avoid some oddball false positives during the transition.
But setting it at boot time would be fine. ;-)
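For instance, something along these lines would suffice (a sketch only:
"tick_freq" is not an existing kernel parameter, and jiffies_per_tick is a
hypothetical variable):

	/* Hypothetical boot parameter: tick_freq=<ticks per second>.
	 * With 1 ms jiffies, each tick then advances jiffies by
	 * 1000 / tick_freq, which is also the factor RCU would use to
	 * scale its jiffy-based deadlines. */
	static unsigned long jiffies_per_tick __ro_after_init = 1;

	static int __init setup_tick_freq(char *str)
	{
		unsigned int freq;

		if (!kstrtouint(str, 0, &freq) && freq && !(1000 % freq))
			jiffies_per_tick = 1000 / freq;
		return 1;
	}
	__setup("tick_freq=", setup_tick_freq);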
Thanx, Paul
> Thoughts?
>
> Thanks,
>
> tglx
>