Message-ID: <CA+55aFyGkOCVGD3Ds7Wt1z9Dw7cmk_yXw7YwruACQh5QAXOQvQ@mail.gmail.com>
Date: Wed, 11 Feb 2015 11:59:06 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Rafael David Tinoco <inaddy@...ntu.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Jens Axboe <axboe@...nel.dk>
Subject: Re: smp_call_function_single lockups

On Wed, Feb 11, 2015 at 10:18 AM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> I'll think about this all, but we couldn't figure anything out last
> time we looked at it, so without more clues, don't hold your breath.

So having looked at it once more, one thing struck me:

Look at smp_call_function_single_async(). The comment says

	 * Like smp_call_function_single(), but the call is asynchonous and
	 * can thus be done from contexts with disabled interrupts.

but that is *only* true if we don't have to wait for the csd lock. The
comments even clarify that:

	 * The caller passes his own pre-allocated data structure
	 * (ie: embedded in an object) and is responsible for synchronizing it
	 * such that the IPIs performed on the @csd are strictly serialized.

but it's not at all clear that the caller *can* do that. Since the
"csd_unlock()" is done *after* the call to the callback function, any
serialization done by the caller is fundamentally not trustworthy,
since it cannot serialize with the csd lock - if it releases things in
the callback, the csd lock will still be set after releasing things.

So the caller has a really hard time guaranteeing that CSD_LOCK isn't
set. And if the call is done in interrupt context, for all we know it
is interrupting the code that is going to clear CSD_LOCK, so CSD_LOCK
will never be cleared at all, and csd_lock() will wait forever.

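To make that ordering concrete, here is a minimal userspace sketch in C. It is a single-threaded model, not kernel code: `struct model_csd`, `MODEL_CSD_LOCK`, `model_flush_one()`, `model_csd_trylock_bounded()` and `struct caller_state` are hypothetical names, and the bounded loop stands in for the unbounded spin in csd_lock(). The callback plays a caller that "releases things" and then, like code landing in the window before csd_unlock(), tries to take the csd again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a csd; not the kernel's struct. */
#define MODEL_CSD_LOCK 0x01u

struct model_csd {
	unsigned int flags;
	void (*func)(void *info);
	void *info;
};

/* Stand-in for csd_lock(): the real one spins forever, this one gives up. */
static bool model_csd_trylock_bounded(struct model_csd *csd, int max_spins)
{
	for (int i = 0; i < max_spins; i++) {
		if (!(csd->flags & MODEL_CSD_LOCK)) {
			csd->flags |= MODEL_CSD_LOCK;
			return true;
		}
	}
	return false;
}

/* The ordering discussed above: run the callback, only then unlock. */
static void model_flush_one(struct model_csd *csd)
{
	assert(csd->flags & MODEL_CSD_LOCK);	/* a queued csd is locked */
	csd->func(csd->info);
	csd->flags &= ~MODEL_CSD_LOCK;		/* csd_unlock() comes last */
}

/* Hypothetical caller that uses its callback for serialization. */
struct caller_state {
	struct model_csd *csd;
	bool released;		/* caller's own "csd is free again" flag */
	bool relock_ok;		/* could the csd be locked again in the window? */
};

static void caller_callback(void *info)
{
	struct caller_state *st = info;

	st->released = true;
	/*
	 * Code running in this window still sees CSD_LOCK set, and in this
	 * single-threaded model nothing can clear it until model_flush_one()
	 * resumes, so the bounded lock attempt always fails.
	 */
	st->relock_ok = model_csd_trylock_bounded(st->csd, 1000);
}
```

In the kernel the bounded loop is an unbounded spin, so the `relock_ok == false` outcome of this model corresponds to a hang rather than a return value.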
So I actually think that for the async case, we really *should* unlock
before doing the callback (which is what Thomas' old patch did).

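A sketch of that reordering for the async case, in the same hypothetical userspace model (not kernel code, and not a reproduction of Thomas' patch): copy `func` and `info` out of the csd, clear the lock, then call the function. Once the flag is clear the owner may reuse the csd, so nothing is read through `csd` after the unlock. `model_flush_one_async()` and `struct resubmit_state` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a csd; not the kernel's struct. */
#define MODEL_CSD_LOCK 0x01u

struct model_csd {
	unsigned int flags;
	void (*func)(void *info);
	void *info;
};

/* Async flush: unlock first, then run the callback from local copies. */
static void model_flush_one_async(struct model_csd *csd)
{
	void (*func)(void *info) = csd->func;
	void *info = csd->info;

	assert(csd->flags & MODEL_CSD_LOCK);	/* a queued csd is locked */
	csd->flags &= ~MODEL_CSD_LOCK;		/* csd may be reused from here on */
	func(info);				/* must not touch *csd via the flush */
}

/* Hypothetical caller that re-queues its csd from the callback. */
struct resubmit_state {
	struct model_csd *csd;
	bool saw_unlocked;	/* was CSD_LOCK already clear in the callback? */
};

static void resubmit_callback(void *info)
{
	struct resubmit_state *st = info;

	st->saw_unlocked = !(st->csd->flags & MODEL_CSD_LOCK);
	if (st->saw_unlocked)
		st->csd->flags |= MODEL_CSD_LOCK;	/* re-queue succeeds */
}
```

The trade-off is that the sender can no longer treat "lock cleared" as "function has run". That is why this ordering only fits the async case: the synchronous smp_call_function_single() waits on the csd lock to learn that the function has completed.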
And we might well be better off doing something like

	WARN_ON_ONCE(csd->flags & CSD_LOCK);

in smp_call_function_single_async(), because that really is a hard requirement.

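The check can be modeled in userspace as below. This is a hypothetical sketch: `model_warn_on_once()` only imitates the kernel macro's behavior of reporting the first hit while still returning the condition every time, and `model_call_single_async()` is an illustrative stand-in for the submission path. Bailing out with `-EBUSY` instead of spinning is an extra assumption of this sketch; the email only proposes the warning.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of a csd; not the kernel's struct. */
#define MODEL_CSD_LOCK 0x01u

struct model_csd {
	unsigned int flags;
	void (*func)(void *info);
	void *info;
};

static int model_warn_count;	/* how many warnings were actually printed */

/* Imitates WARN_ON_ONCE(): report only the first hit, return cond always. */
static bool model_warn_on_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		model_warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/*
 * Hypothetical async submission: a csd that is still locked violates the
 * caller's contract, so warn and refuse it (assumption) rather than spin
 * in a csd_lock() that may never return with interrupts disabled.
 */
static int model_call_single_async(struct model_csd *csd)
{
	assert(csd != NULL);
	if (model_warn_on_once((csd->flags & MODEL_CSD_LOCK) != 0,
			       "csd->flags & CSD_LOCK"))
		return -EBUSY;
	csd->flags |= MODEL_CSD_LOCK;
	/* Queueing the csd and sending the IPI are not modeled. */
	return 0;
}
```

Submitting the same csd twice without an intervening unlock returns `-EBUSY` on the second and later calls, but prints the warning only once.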
And it strikes me that hrtick_csd is one of these cases that do this
with interrupts disabled, and use the callback for serialization. So I
really wonder if this is part of the problem.

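A caller that "uses the callback for serialization" can be modeled with a pending flag: the submit path only queues the csd when the flag is clear, and the callback clears it. The sketch below is hypothetical (`struct model_rq`, `pending`, `model_try_submit()`, `model_flush()` and the `unlock_first` switch are illustrative names; the pending-flag shape is an assumption, not copied from the scheduler). It simulates an interrupt that re-enters the submit path right after the callback returns, under both unlock orders. Whether hrtick_csd really hits this is the open question above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a csd; not the kernel's struct. */
#define MODEL_CSD_LOCK 0x01u

struct model_csd {
	unsigned int flags;
	void (*func)(void *info);
	void *info;
};

/* Hypothetical per-runqueue state guarding an hrtick-like csd. */
struct model_rq {
	struct model_csd csd;
	bool pending;		/* set on submit, cleared by the callback */
};

enum submit_result { SUBMIT_SKIPPED, SUBMIT_QUEUED, SUBMIT_WOULD_SPIN };

/* Submit path: trusts 'pending' alone to decide the csd is free. */
static enum submit_result model_try_submit(struct model_rq *rq)
{
	if (rq->pending)
		return SUBMIT_SKIPPED;
	if (rq->csd.flags & MODEL_CSD_LOCK)
		return SUBMIT_WOULD_SPIN;	/* real csd_lock() would spin */
	rq->csd.flags |= MODEL_CSD_LOCK;
	rq->pending = true;
	return SUBMIT_QUEUED;
}

static void model_callback(void *info)
{
	struct model_rq *rq = info;

	rq->pending = false;	/* the callback is the serialization point */
}

static void model_rq_init(struct model_rq *rq)
{
	rq->csd.flags = 0;
	rq->csd.func = model_callback;
	rq->csd.info = rq;
	rq->pending = false;
}

/*
 * Flush one csd. The model_try_submit() call stands for an interrupt that
 * runs right after the callback returns and re-enters the submit path.
 */
static enum submit_result model_flush(struct model_rq *rq, bool unlock_first)
{
	struct model_csd *csd = &rq->csd;
	void (*func)(void *info) = csd->func;
	void *info = csd->info;
	enum submit_result irq;

	assert(csd->flags & MODEL_CSD_LOCK);
	if (unlock_first)
		csd->flags &= ~MODEL_CSD_LOCK;
	func(info);
	irq = model_try_submit(rq);	/* the interrupt in the window */
	if (!unlock_first)
		csd->flags &= ~MODEL_CSD_LOCK;
	return irq;
}
```

With `unlock_first == false`, the simulated interrupt passes the `pending` check and finds the csd still locked (`SUBMIT_WOULD_SPIN`); with `unlock_first == true`, the same sequence re-queues cleanly (`SUBMIT_QUEUED`).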
Thomas? Am I missing something?

                  Linus