linux-kernel - Re: [patch 13/20] timer: Switch to a non cascading wheel

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.DEB.2.11.1606141045191.5839@nanos>
Date:	Tue, 14 Jun 2016 10:50:03 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	George Spelvin <linux@...encehorizons.net>
cc:	edumazet@...gle.com, linux-kernel@...r.kernel.org,
	peterz@...radead.org, richardcochran@...il.com
Subject: Re: [patch 13/20] timer: Switch to a non cascading wheel

On Tue, 14 Jun 2016, George Spelvin wrote:
> I think I see a buglet in your level-5 cascading.
> 
> Suppose a timer is requested far in the future for a time
> that is an exact multiple of 32768 jiffies.
> 
> collect_expired_timers() scans level 5 after all the previous ones,
> and will cascade it to level 0, in a level-0 bucket which has already
> been scanned, and won't be scanned again for 64 jiffies.
> 
> I agree that 64 jiffies is well within your allowed rounding accuracy,
> and order of timer firing is not guaranteed when they're for the same
> time, but it is a bit odd when a timer fires 32 jiffies *before* another
> timer scheduled for 32 jiffies later.  That's the sort of peculiarity
> that could lead to a subtle bug.

I thought about that and when looking at those long timeout thingies I came to
the conclusion that it's simply not worth the trouble.
 
> Wouldn't this all be so much simpler as
> 
> #define LVL_BITS	6	/* Renamed previous LVL_SHIFT */
> #define LVL_SIZE	(1 << LVL_BITS)
> #define LVL_MASK	(LVL_BITS - 1)
> #define LVL_OFFS(n)	((n) * LVL_SIZE)
> #define LVL_SHIFT(n)	((n) * LVL_CLK_SHIFT)
> #define LVL_GRAN(n)	(1 << LVL_SHIFT(n))

Indeed.
 
> Ideally, you'd like all of that
> 
> +	if (delta < LVL1_TSTART) {
> +		idx = (expires + LVL0_GRAN) & LVL_MASK;
> +	} else if (delta < LVL2_TSTART) {
> +		idx = calc_index(expires, LVL1_GRAN, LVL1_SHIFT, LVL1_OFFS);
> +	} else if (delta < LVL3_TSTART) {
> +		idx = calc_index(expires, LVL2_GRAN, LVL2_SHIFT, LVL2_OFFS);
> +	} else if (delta < LVL4_TSTART) {
> +		idx = calc_index(expires, LVL3_GRAN, LVL3_SHIFT, LVL3_OFFS);
> +	} else if (delta < LVL5_TSTART) {
> +		idx = calc_index(expires, LVL4_GRAN, LVL4_SHIFT, LVL4_OFFS);
> 
> to be replaced with __builtin_clz or similar:

Except that __fls() is noticeably slower than the if chain.

> > +static inline void detach_expired_timer(struct timer_list *timer)
> >  {
> >  	detach_timer(timer, true);
> > -	if (!(timer->flags & TIMER_DEFERRABLE))
> > -		base->active_timers--;
> > -	base->all_timers--;
> >  }
> 
> Is there even a reason to have this wrapper any more?  Why not
> just replace all calls to it in the source?

That just happened to stay there for no particular reason.

> > +		timer = hlist_entry(head->first, struct timer_list, entry);
> > +		fn = timer->function;
> > +		data = timer->data;
> > +
> > +		timer_stats_account_timer(timer);
> > +
> > +		base->running_timer = timer;
> > +		detach_expired_timer(timer);
> 
> Is there some non-obvious reason that you have to fetch fn and data
> so early?  It seems like a register pressure pessimization, if the
> compiler can't figure out that timer_stats code can't change them.
> 
> The cache line containing this timer was already prefetched when you
> updated its entry.pprev as part of removing the previous entry from
> the list.
> 
> I see why you want to fetch them with the lock held in case there's some
> freaky race, but I'd do it all after detach_timer().

That's not new code. We kept the ordering, but yes, we definitely can turn
that around. The only restriction is that we get it before releasing the lock.

Thanks,

	tglx