Message-ID: <alpine.DEB.2.20.1610242048570.4815@nanos>
Date: Mon, 24 Oct 2016 21:09:05 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: Linus Torvalds <torvalds@...ux-foundation.org>
cc: Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...nel.org>,
"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [GIT pull] timer updates for 4.9
On Mon, 24 Oct 2016, Linus Torvalds wrote:
> On Mon, Oct 24, 2016 at 7:51 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> >
> > Can you please check in the disassembly whether gcc really reloads
> > timer->flags? Mine does not...
>
> No, me neither. The code generation for lock_timer_base() looks
> reasonable, although not pretty (it needs one spill for the
> complexities in get_timer_cpu_base(), and the "*flags" games results
> in some unnecessary indirection too).
>
> I will try your patch, but also stare at my code some more.
>
> I'm starting to think that the problem could be due to the timer code
> being triggered _way_ too early (printk() ends up being obviously used
> long before most things end up using timers), and that the problem I
> see is just later fallout from that.
>
> Sergey (added to participants) tried an earlier version of my patch,
> and had more debug options enabled, and got
>
> BUG: spinlock bad magic on CPU#0
>
> from mod_timer() doing _raw_spin_unlock_irqrestore(), when the
Weird, that should have triggered in raw_spin_lock() already.
Can you bounce me the patch you are currently testing?
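
To spell out why that is weird, here is a rough user-space sketch, not the
kernel's spinlock debugging code. It assumes only that a debug lock carries
a magic value which is verified on both the lock and the unlock path. All
names (toy_dbg_lock, TOY_LOCK_MAGIC, toy_lock_checked, ...) are made up for
illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the magic value a debug lock carries. */
#define TOY_LOCK_MAGIC 0xdead4eadu

struct toy_dbg_lock {
	unsigned int locked;	/* 0 == unlocked */
	unsigned int magic;	/* only valid after toy_lock_init() */
};

/* Proper initialization: unlocked, with the magic value in place. */
void toy_lock_init(struct toy_dbg_lock *l)
{
	l->locked = 0;
	l->magic = TOY_LOCK_MAGIC;
}

/* Lock path: the sanity check runs before the lock is taken. */
bool toy_lock_checked(struct toy_dbg_lock *l)
{
	if (l->magic != TOY_LOCK_MAGIC)
		return false;	/* would be reported as "bad magic" */
	l->locked = 1;
	return true;
}

/* Unlock path: the same sanity check runs again. */
bool toy_unlock_checked(struct toy_dbg_lock *l)
{
	if (l->magic != TOY_LOCK_MAGIC)
		return false;	/* would be reported as "bad magic" */
	l->locked = 0;
	return true;
}
```

In this model a zero-filled lock already fails in toy_lock_checked(), so a
complaint that shows up first on the unlock side would suggest the lock
contents changed while the lock was held.
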
> printk() callchain happens very early in setup_arch ->
> setup_memory_map -> e820_print_map().
>
> So I think the timer bugs I found were _potentially_ true bugs, but
> likely not the cause of this all.
>
> init_timers() happens early, but we do printk's even earlier.
These are the things which are not initialized:

 1) base->spinlock

    That's a non issue for !debug kernels as the lock initializer is 0
    (unlocked).

 2) base->clk

    That makes the timer queued at some random array bucket.

 3) base->cpu

    That's a non issue as base->cpu is 0 and at this point you are on CPU 0
    and the stupid NOHZ remote queueing is not yet possible.

The hlist_head is not touched by init_timers() as it's NULL initialized
already, so we do not scribble over an already queued timer.

So anything you queue _before_ init_timers() will just be queued to some
random bucket, but it does not explain the wreckage you are seeing.
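
A minimal user-space sketch of the above, under loud assumptions: this is a
toy three-level wheel with 8 buckets per level, not the geometry or the
index calculation of the real timer wheel, and every name in it (toy_base,
toy_calc_index, toy_enqueue, ...) is invented for illustration. It only
shows that a zero-filled base is usable (lock reads as unlocked, cpu is 0,
bucket heads are empty) and that a stale clk changes which bucket is picked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy geometry, NOT the kernel's: 3 levels of 8 buckets each. */
#define TOY_LVL_BITS	3
#define TOY_LVL_SIZE	(1u << TOY_LVL_BITS)
#define TOY_LVL_DEPTH	3
#define TOY_WHEEL_SIZE	(TOY_LVL_DEPTH * TOY_LVL_SIZE)

struct toy_timer {
	struct toy_timer *next;
	unsigned long expires;
};

struct toy_base {
	int lock;		/* 0 == unlocked, like a !debug lock */
	unsigned long clk;	/* the wheel's notion of "now" */
	unsigned int cpu;
	struct toy_timer *bucket[TOY_WHEEL_SIZE];	/* NULL == empty */
};

/* The level is chosen from the distance between expires and clk. */
unsigned int toy_calc_index(unsigned long expires, unsigned long clk)
{
	unsigned long delta = expires - clk;
	unsigned int lvl;

	for (lvl = 0; lvl < TOY_LVL_DEPTH - 1; lvl++) {
		if (delta < ((unsigned long)TOY_LVL_SIZE << (lvl * TOY_LVL_BITS)))
			break;
	}
	return lvl * TOY_LVL_SIZE +
	       ((expires >> (lvl * TOY_LVL_BITS)) & (TOY_LVL_SIZE - 1));
}

/* Queue a timer; works on a zero-filled base, but uses whatever clk holds. */
unsigned int toy_enqueue(struct toy_base *base, struct toy_timer *t)
{
	unsigned int idx;

	assert(base->lock == 0);	/* zero-filled lock looks unlocked */
	base->lock = 1;
	idx = toy_calc_index(t->expires, base->clk);
	t->next = base->bucket[idx];	/* empty head is NULL, nothing lost */
	base->bucket[idx] = t;
	base->lock = 0;
	return idx;
}
```

With "now" at 1000 and a timer expiring at 1003, a base whose clk was set
to 1000 puts the timer in a level 0 bucket, while a zero-filled base
(clk == 0) sees a huge delta and files it in the outermost level. The timer
is misplaced, but nothing is corrupted.
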
Thanks,
tglx