linux-kernel - Re: [GIT pull] timer updates for 4.9

Message-ID: <alpine.DEB.2.20.1610242048570.4815@nanos>
Date:   Mon, 24 Oct 2016 21:09:05 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
cc:     Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Ingo Molnar <mingo@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [GIT pull] timer updates for 4.9

On Mon, 24 Oct 2016, Linus Torvalds wrote:
> On Mon, Oct 24, 2016 at 7:51 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> >
> > Can you please check in the disassembly whether gcc really reloads
> > timer->flags? Mine does not...
> 
> No, me neither. The code generation for lock_timer_base() looks
> reasonable, although not pretty (it needs one spill for the
> complexities in get_timer_cpu_base(), and the "*flags" games result
> in some unnecessary indirection too).
> 
> I will try your patch, but also stare at my code some more.
> 
> I'm starting to think that the problem could be due to the timer code
> being triggered _way_ too early (printk() ends up being obviously used
> long before most things end up using timers), and that the problem I
> see is just later fallout from that.
> 
> Sergey (added to participants) tried an earlier version of my patch,
> and had more debug options enabled, and got
> 
>   BUG: spinlock bad magic on CPU#0
> 
> from mod_timer() doing _raw_spin_unlock_irqrestore(), when the

Weird, that should have triggered in raw_spin_lock() already.

Can you bounce me the patch you are currently testing?

> printk() callchain happens very early in setup_arch ->
> setup_memory_map -> e820_print_map().
> 
> So I think the timer bugs I found were _potentially_ true bugs, but
> likely not the cause of this all.
> 
> init_timers() happens early, but we do printk's even earlier.

These are the things which are not initialized:

1) base->spinlock

   That's a non issue for !debug kernels as the lock initializer is 0
   (unlocked).

2) base->clk
 
   That makes the timer queued at some random array bucket.

3) base->cpu

   That's a non issue as base->cpu is 0 and at this point you are on CPU 0
   and the stupid NOHZ remote queueing is not yet possible.

The hlist_head is not touched by init_timers() as it's NULL initialized
already, so we do not scribble over an already queued timer.

So anything you queue _before_ init_timers() will just be queued to some
random bucket, but it does not explain the wreckage you are seeing.
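
To make the "random bucket" point concrete, here is a toy model in C. It is
only a sketch and not the kernel code: struct toy_base, toy_bucket(), the
two-level layout and all constants are made up for illustration. The one
assumption it leans on is that the bucket choice depends on how far the
expiry lies ahead of base->clk, so a zero-initialized clk selects a
different bucket than an initialized one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wheel geometry, chosen for illustration only. */
#define TOY_LVL_SIZE  64u   /* buckets per level */
#define TOY_LVL_SHIFT 3u    /* level 1 is 8x coarser than level 0 */
#define TOY_LEVELS    2u

struct toy_timer {
	struct toy_timer *next;
	unsigned long expires;
};

struct toy_base {
	unsigned int lock;   /* 0 means "unlocked" in this model */
	unsigned long clk;   /* the base's idea of the current time */
	unsigned int cpu;
	struct toy_timer *buckets[TOY_LEVELS * TOY_LVL_SIZE];
};

/* Objects with static storage start out all-zero, like the state before init. */
static struct toy_base boot_base;
static struct toy_base ready_base;

/* Stand-in for the init step: it sets clk and leaves the bucket heads alone. */
static void toy_init(struct toy_base *base, unsigned long now)
{
	base->lock = 0;
	base->clk = now;
	base->cpu = 0;
}

/* The level is picked from the distance to base->clk, the slot from expires. */
static unsigned int toy_bucket(const struct toy_base *base, unsigned long expires)
{
	unsigned long delta = expires - base->clk;
	unsigned int level = (delta < TOY_LVL_SIZE) ? 0u : 1u;
	unsigned long idx = expires >> (TOY_LVL_SHIFT * level);

	return level * TOY_LVL_SIZE + (unsigned int)(idx & (TOY_LVL_SIZE - 1u));
}

/* Push the timer onto its bucket list and report which bucket was used. */
static unsigned int toy_enqueue(struct toy_base *base, struct toy_timer *t)
{
	unsigned int b;

	assert(base != NULL && t != NULL);
	b = toy_bucket(base, t->expires);
	t->next = base->buckets[b];
	base->buckets[b] = t;
	return b;
}
```

With a current time of 1000000 and an expiry of 1000010, the base set up via
toy_init() picks bucket 10, while the untouched boot_base (clk == 0) sees a
huge delta and picks bucket 73. The timer is still queued on a valid list,
just not the one the expiry code will look at when expected, and because
toy_init() does not touch the bucket heads, a timer queued earlier is not
lost.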

Thanks,

	tglx