Date:	Thu, 16 Jun 2016 09:02:15 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Arjan van de Ven <arjanvandeven@...il.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Chris Mason <clm@...com>,
	Arjan van de Ven <arjan@...radead.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	George Spelvin <linux@...encehorizons.net>
Subject: Re: [patch 13/20] timer: Switch to a non cascading wheel

On Thu, Jun 16, 2016 at 05:43:36PM +0200, Thomas Gleixner wrote:
> On Wed, 15 Jun 2016, Thomas Gleixner wrote:
> > On Wed, 15 Jun 2016, Arjan van de Ven wrote:
> > > what would 1 more timer wheel do?
> > 
> > Waste storage space and make the collection of expired timers more expensive.
> > 
> > The selection of the timer wheel properties is combination of:
> > 
> >     1) Granularity 
> > 
> >     2) Storage space
> > 
> >     3) Number of levels to collect
> 
> So I came up with a slightly different solution for this. The problem case is
> HZ=1000, and looking at the data again, there is no reason why we actually
> need 1ms granularity for timer wheel timers. That's independent of the
> desired ms-based interfaces.
> 
> We can simply run the wheel internally with a 4ms base level resolution and
> degrade from there (see the sketch after the table). That gives us 6+ days
> and a simple cutoff at the capacity of the 7th level wheel.
> 
>  Lvl  Off  Granularity              Range
>  0     0        4 ms               0 ms -        255 ms
>  1    64       32 ms             256 ms -       2047 ms (256ms - ~2s)
>  2   128      256 ms            2048 ms -      16383 ms (~2s - ~16s) 
>  3   192     2048 ms (~2s)     16384 ms -     131071 ms (~16s - ~2m)
>  4   256    16384 ms (~16s)   131072 ms -    1048575 ms (~2m - ~17m)
>  5   320   131072 ms (~2m)   1048576 ms -    8388607 ms (~17m - ~2h)
>  6   384  1048576 ms (~17m)  8388608 ms -   67108863 ms (~2h - ~18h)
>  7   448  8388608 ms (~2h)  67108864 ms -  536870911 ms (~18h - ~6d)
> 
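To make the level arithmetic concrete, here is a minimal userspace sketch
(mine, not code from the patch series) of how a millisecond delta could map
to a level and bucket under the table above. The constants and the
bucket_for() helper are illustrative names only, assuming 64 buckets per
level, a 4ms base granularity, and an 8x granularity step per level:

	#include <stdio.h>

	#define LVL_DEPTH	8		/* levels 0-7 */
	#define LVL_SIZE	64		/* buckets per level */
	#define LVL_CLK_SHIFT	3		/* each level is 8x coarser */
	#define BASE_GRAN_MS	4		/* level 0 granularity */

	/* Map a delta in ms to a flat bucket index (level * 64 + slot). */
	static int bucket_for(unsigned long delta_ms)
	{
		unsigned long gran = BASE_GRAN_MS;
		int lvl;

		for (lvl = 0; lvl < LVL_DEPTH; lvl++) {
			if (delta_ms < (unsigned long)LVL_SIZE * gran)
				return lvl * LVL_SIZE + (int)(delta_ms / gran);
			gran <<= LVL_CLK_SHIFT;	/* 4, 32, 256, ... ms */
		}
		return -1;	/* beyond ~6d: the "simple cutoff" above */
	}

	int main(void)
	{
		/* 1ms -> bucket 0 (level 0), 10s -> 167 (level 2),
		 * 1h -> 347 (level 5) */
		printf("%d %d %d\n", bucket_for(1),
		       bucket_for(10 * 1000), bucket_for(60 * 60 * 1000));
		return 0;
	}
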
> That works really nicely and has the interesting side effect that we batch in
> the first level wheel, which helps networking. I'll repost the series with the
> other review points addressed later tonight.
> 
> Btw, I also thought a bit more about the milliseconds interfaces. I think we
> shouldn't invent new interfaces. The correct solution IMHO is to disentangle
> the scheduler tick frequency and jiffies. If we have them completely
> separated, then we can do the following:
> 
> 1) Force HZ=1000. That means jiffies and timer wheel units are 1ms. If the
>    tick frequency is != 1000 we simply increment jiffies in the tick by the
>    proper amount (4 @250 ticks/sec, 10 @100 ticks/sec).
> 
>    So all msecs_to_jiffies() invocations magically compile out into nothing and
>    we can remove them gradually over time.

Some of RCU's heuristics assume that if scheduling-clock ticks happen,
they happen once per jiffy.  These would need to be adjusted, which would
not be a big deal, just a bit more use of HZ.
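
A minimal sketch (in userspace terms, not patch code) of what (1) above could
look like: jiffies_64 below stands in for the kernel counter, and tick_hz is
an assumed name for the runtime tick frequency, not an existing symbol.

	#define JIFFIES_HZ	1000		/* 1 jiffy == 1 ms */

	static unsigned long long jiffies_64;	/* stand-in for the kernel's */
	static unsigned int tick_hz = 250;	/* boot-time tick frequency */

	/* Jiffies always count at 1000/sec; a slower scheduler tick just
	 * adds the frequency ratio on every tick. */
	static void tick_handler(void)
	{
		jiffies_64 += JIFFIES_HZ / tick_hz;	/* 1 @1000, 4 @250, 10 @100 */
	}

With 1 jiffy == 1 ms, msecs_to_jiffies(m) reduces to m, which is exactly why
the conversions can compile away.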

> 2) When we do that right, we can make the tick frequency a command line option
>    and just have a compiled in default.

As long as there is something that tells RCU what the tick frequency
actually is at runtime, this should not be a problem.  For example,
in rcu_implicit_dynticks_qs(), the following:

	rdp->rsp->jiffies_resched += 5;

Would instead need to be something like:

	rdp->rsp->jiffies_resched += 5 * jiffies_per_tick;

Changing tick frequency at runtime would be a bit more tricky, as it would
be tough to avoid some oddball false positives during the transition.
But setting it at boot time would be fine.  ;-)
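
For concreteness, the hypothetical jiffies_per_tick above could simply be the
frequency ratio under the scheme Thomas describes (1 ms jiffies, a runtime
tick frequency; tick_hz is again an assumed name):

	/* Jiffies advance 1000 times per second regardless of the tick,
	 * so each scheduling-clock tick spans 1000 / tick_hz jiffies. */
	unsigned int jiffies_per_tick = 1000 / tick_hz;	/* 1, 4, or 10 */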

							Thanx, Paul

> Thoughts?
> 
> Thanks,
> 
> 	tglx
> 
