linux-kernel - Re: smp_call_function

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CA+55aFyGkOCVGD3Ds7Wt1z9Dw7cmk_yXw7YwruACQh5QAXOQvQ@mail.gmail.com>
Date:	Wed, 11 Feb 2015 11:59:06 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Rafael David Tinoco <inaddy@...ntu.com>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Jens Axboe <axboe@...nel.dk>
Subject: Re: smp_call_function_single lockups

On Wed, Feb 11, 2015 at 10:18 AM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> I'll think about this all, but we couldn't figure anything out last
> time we looked at it, so without more clues, don't hold your breath.

So having looked at it once more, one thing struck me:

Look at smp_call_function_single_async(). The comment says

 * Like smp_call_function_single(), but the call is asynchonous and
 * can thus be done from contexts with disabled interrupts.

but that is *only* true if we don't have to wait for the csd lock. The
comments even clarify that:

 * The caller passes his own pre-allocated data structure
 * (ie: embedded in an object) and is responsible for synchronizing it
 * such that the IPIs performed on the @csd are strictly serialized.

but it's not at all clear that the caller *can* do that. Since the
"csd_unlock()" is done *after* the call to the callback function, any
serialization done by the caller is fundamentally not trustworthy,
since it cannot serialize with the csd lock - if it releases things in
the callback, the csd lock will still be set after releasing things.

So the caller has a really hard time guaranteeing that CSD_LOCK isn't
set. And if the call is done in interrupt context, for all we know it
is interrupting the code that is going to clear CSD_LOCK, so CSD_LOCK
will never be cleared at all, and csd_lock() will wait forever.

So I actually think that for the async case, we really *should* unlock
before doing the callback (which is what Thomas' old patch did).

And we migth well be better off doing something like

        WARN_ON_ONCE(csd->flags & CSD_LOCK);

in smp_call_function_single_async(), because that really is a hard requirement.

And it strikes me that hrtick_csd is one of these cases that do this
with interrupts disabled, and use the callback for serialization. So I
really wonder if this is part of the problem..

Thomas? Am I missing something?

                               Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/