lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+55aFyDAWBNfyYM_BG2D5-g1skj=iCkhfW9ZNA_h2zZDdfy_g@mail.gmail.com>
Date:   Sun, 23 Oct 2016 16:20:22 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Ingo Molnar <mingo@...nel.org>,
        "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [GIT pull] timer updates for 4.9

On Sun, Oct 23, 2016 at 3:39 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> Some locking problem in the timer handling?

So I found what looks like a bug in lock_timer_base() wrt migration.

This code:

        for (;;) {
                struct timer_base *base;
                u32 tf = timer->flags;

                if (!(tf & TIMER_MIGRATING)) {
                        base = get_timer_base(tf);
                        spin_lock_irqsave(&base->lock, *flags);
                        if (timer->flags == tf)
                                return base;
                        spin_unlock_irqrestore(&base->lock, *flags);
                }
                cpu_relax();
        }

looks subtly buggy. I think that load of "tf" needs a READ_ONCE() to
make sure that gcc doesn't simply reload the valid of "timer->flags"
at random points.

Yes, the spin_lock_irqsave() is a barrier, but that's the only one.
Afaik, gcc could decide that "I need to spill tf, so I'll just reload
it" after looking up get_timer_base().

And no, I don't think this is the cause of my problem, but I suspect
that something _like_ fragility in lock_timer_base() could cause this.
Looking at "timer->flags" *before* you hold the lock that actually
locks down the value sounds very very fragile.

Another example of the exact same problem is in __mod_timer() itself:

                base = get_timer_base(timer->flags);
    ...
                if (idx == timer_get_idx(timer)) {

and since "timer_get_idx()" depends on timer->flags, doing this all
without holding the base lock smells really really bad.

Again, _if_ that whole "let's check timer_pending without holding the
lock" is valid, I'd suggest doing

                u32 flags = READ_ONCE(timer->flags);
                base = get_timer_base(flags);
    ...
                if (idx == timer_flags_get_idx(flags)) {


or something similar, so that at least we know that even if the
"flags" value changes, we use the *same* flag values to look up the
base and to then checking the index of the wheel of that base.

I dunno. That whole thing looks very fragile to begin with: is it
really ok to change the expiry time of a timer without holding any
locks what-so-ever? The timer may just be firing on another CPU, and
you may be setting the expiry time on a timer that isn't ever going to
fire again.

Anyway, the timer locking (and lack of it) makes me nervous.

And there may well be valid reasons why I'm full of crap, and there's
some reason why this is all safe. Maybe the GP fault I saw was my
fault after all, in some way that I can't for the life of me figure
out right now..

                Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ