linux-kernel - Re: [GIT pull] timer updates for 4.9

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CA+55aFyDAWBNfyYM_BG2D5-g1skj=iCkhfW9ZNA_h2zZDdfy_g@mail.gmail.com>
Date:   Sun, 23 Oct 2016 16:20:22 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Ingo Molnar <mingo@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [GIT pull] timer updates for 4.9

On Sun, Oct 23, 2016 at 3:39 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> Some locking problem in the timer handling?

So I found what looks like a bug in lock_timer_base() wrt migration.

This code:

        for (;;) {
                struct timer_base *base;
                u32 tf = timer->flags;

                if (!(tf & TIMER_MIGRATING)) {
                        base = get_timer_base(tf);
                        spin_lock_irqsave(&base->lock, *flags);
                        if (timer->flags == tf)
                                return base;
                        spin_unlock_irqrestore(&base->lock, *flags);
                }
                cpu_relax();
        }

looks subtly buggy. I think that load of "tf" needs a READ_ONCE() to
make sure that gcc doesn't simply reload the valid of "timer->flags"
at random points.

Yes, the spin_lock_irqsave() is a barrier, but that's the only one.
Afaik, gcc could decide that "I need to spill tf, so I'll just reload
it" after looking up get_timer_base().

And no, I don't think this is the cause of my problem, but I suspect
that something _like_ fragility in lock_timer_base() could cause this.
Looking at "timer->flags" *before* you hold the lock that actually
locks down the value sounds very very fragile.

Another example of the exact same problem is in __mod_timer() itself:

                base = get_timer_base(timer->flags);
    ...
                if (idx == timer_get_idx(timer)) {

and since "timer_get_idx()" depends on timer->flags, doing this all
without holding the base lock smells really really bad.

Again, _if_ that whole "let's check timer_pending without holding the
lock" is valid, I'd suggest doing

                u32 flags = READ_ONCE(timer->flags);
                base = get_timer_base(flags);
    ...
                if (idx == timer_flags_get_idx(flags)) {

or something similar, so that at least we know that even if the
"flags" value changes, we use the *same* flag values to look up the
base and to then checking the index of the wheel of that base.

I dunno. That whole thing looks very fragile to begin with: is it
really ok to change the expiry time of a timer without holding any
locks what-so-ever? The timer may just be firing on another CPU, and
you may be setting the expiry time on a timer that isn't ever going to
fire again.

Anyway, the timer locking (and lack of it) makes me nervous.

And there may well be valid reasons why I'm full of crap, and there's
some reason why this is all safe. Maybe the GP fault I saw was my
fault after all, in some way that I can't for the life of me figure
out right now..

                Linus