Date:	Mon, 23 Feb 2015 11:32:50 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Rafael David Tinoco <inaddy@...ntu.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Jens Axboe <axboe@...nel.dk>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Gema Gomez <gema.gomez-solano@...onical.com>,
	Christopher Arges <chris.j.arges@...onical.com>
Subject: Re: smp_call_function_single lockups

On Mon, Feb 23, 2015 at 6:01 AM, Rafael David Tinoco <inaddy@...ntu.com> wrote:
>
> This is v3.19 + your patch (smp acquire/release)
> - (nested kvm with 2 vcpus on top of proliant with x2apic cluster mode
> and acpi_idle)

Hmm. There is absolutely nothing else going on on that machine, except
for the single call to smp_call_function_single() that is waiting for
the CSD to be released.

> * It looks like we got locked because of reentrant flush_tlb_* through
> smp_call_*
> but I'll leave it to you.

No, that is all a perfectly regular callchain:

 .. native_flush_tlb_others -> smp_call_function_many -> smp_call_function_single

but the stack contains some stale addresses (one is probably just left
over from smp_call_function_single() calling into generic_exec_single(),
so the stack shows the return address inside smp_call_function_single()
in _addition_ to the actual place where the watchdog timer then
interrupted it).

It all really looks very regular and sane: smp_call_function_single()
is happily just waiting for the IPI to finish in the (inlined)
csd_lock_wait().
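
For reference, that wait is - roughly, going from memory of
kernel/smp.c around v3.19, so not a verbatim copy - just a spin on the
CSD lock flag until the IPI handler on the target cpu clears it:

	static void csd_lock_wait(struct call_single_data *csd)
	{
		/* wait for the target cpu's IPI handler to csd_unlock() this csd */
		while (csd->flags & CSD_FLAG_LOCK)
			cpu_relax();
	}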

I see nothing wrong at all.

However, here's a patch to the actual x86
smp_call_function_single_interrupt() handler, which probably doesn't
make any difference at all, but does:

 - gets rid of the incredibly crazy and ugly smp_entering_irq() that
inexplicably pairs with irq_exit()

 - makes the SMP IPI functions do the APIC ACK cycle at the *end*
rather than at the beginning.

The exact placement of the ACK should make absolutely no difference,
since interrupts should be disabled for the whole period anyway. But
I'm desperate and am flailing around worrying about x2apic bugs and
whatnot in addition to whatever bugs we might have in this area in
software.

So maybe there is some timing thing. This makes us consistently ack
the IPI _after_ we have cleared the list of smp call actions and
executed them, rather than acking it before and then going on to muck
with the list.
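
The shape of it is roughly this - just a hand-written sketch of the
idea with approximate names, not the attached patch.diff itself:

	__visible void smp_call_function_single_interrupt(struct pt_regs *regs)
	{
		irq_enter();
		/* dequeue and run the pending csd callbacks first */
		generic_smp_call_function_single_interrupt();
		inc_irq_stat(irq_call_count);
		/* ack the IPI only after we're done touching the list */
		ack_APIC_irq();
		irq_exit();
	}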

Plus that old code really annoyed me anyway.

So I don't see why this should matter, but since I don't see how the
bug could happen in the first place, might as well try it..

                          Linus

Attachment: "patch.diff" (text/plain, 2511 bytes)
