Message-ID: <CA+55aFxbtyciJwhgaLMK7XH8MyQ9Nm4=5Ke-QCo3WOaacegLXw@mail.gmail.com>
Date:	Wed, 11 Feb 2015 10:18:43 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Rafael David Tinoco <inaddy@...ntu.com>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Jens Axboe <axboe@...nel.dk>
Subject: Re: smp_call_function_single lockups

On Wed, Feb 11, 2015 at 5:19 AM, Rafael David Tinoco <inaddy@...ntu.com> wrote:
>
> - After applying patch provided by Thomas we were able to cause the
> lockup only after 6 days (also locked inside
> smp_call_function_single). Test performance (even for a nested kvm)
> was reduced substantially with 3.19 + this patch.

I think that just means that the patch from Thomas doesn't change
anything - the reason it takes longer to lock up is just the
performance reduction, so whatever race causes the problem was
harder to hit, but not fundamentally affected.

I think a more interesting thing to get is the traces from the other
CPUs when this happens. In a virtualized environment, that might be
easier to get than on real hardware, and if you are able to reproduce
this at will - especially with something recent like 3.19 - and could
get those traces, that would be really good.
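
(For what it's worth, one way to grab those traces automatically would
be to kick the NMI backtrace code when the stall is detected. A rough
sketch only - it assumes the arch wires up
arch_trigger_all_cpu_backtrace(), as x86 does, and dump_other_cpus()
is a made-up name:

#include <linux/nmi.h>
#include <linux/printk.h>

/*
 * When the CPU stuck in csd_lock_wait() decides it has waited too
 * long, dump the stacks of every CPU via NMI, so we can see what the
 * target CPU was doing instead of running the queued function.
 */
static void dump_other_cpus(void)
{
        pr_warn("csd lockup suspected, dumping all CPU backtraces\n");
        if (!trigger_all_cpu_backtrace())
                pr_warn("NMI backtrace not available on this arch\n");
}

From userspace, "echo l > /proc/sysrq-trigger" gets you much the same
thing, if you can still get at a shell when it happens.)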

I'll think about this all, but we couldn't figure anything out last
time we looked at it, so without more clues, don't hold your breath.

That said, it *would* be good if we could get rid of the synchronous
behavior entirely, and make it a rule that if somebody wants to wait
for it, they'll have to do their own waiting. Because I still think
that CSD_FLAG_WAIT is pure and utter garbage. And I think that Jens
said that it is probably bogus to begin with.
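
The rough shape I have in mind - and this is only a sketch, not the
attached patch, and smp_call_sync() is a made-up name - is to put the
csd on the caller's stack and have the caller itself spin on
CSD_FLAG_LOCK, which the remote CPU already clears in csd_unlock():

/* kernel/smp.c style: the synchronous caller does its own waiting */
static int smp_call_sync(int cpu, smp_call_func_t func, void *info)
{
        struct call_single_data csd = {
                .flags  = CSD_FLAG_LOCK, /* remote csd_unlock() drops this */
                .func   = func,
                .info   = info,
        };

        /* queue on the target CPU's list; kick the IPI if it was empty */
        if (llist_add(&csd.llist, &per_cpu(call_single_queue, cpu)))
                arch_send_call_function_single_ipi(cpu);

        /* wait here until the target CPU has actually run func() */
        while (ACCESS_ONCE(csd.flags) & CSD_FLAG_LOCK)
                cpu_relax();

        return 0;
}

Then nothing in the generic code ever needs to know about a "wait"
flag at all.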

I also don't even see where the CSD_FLAG_WAIT bit would ever be
cleared, so it all looks completely buggy anyway.
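
(For reference, the unlock side in 3.19's kernel/smp.c only ever drops
CSD_FLAG_LOCK - quoting roughly from memory here, so the details may
be slightly off:

static void csd_unlock(struct call_single_data *csd)
{
        WARN_ON(!(csd->flags & CSD_FLAG_LOCK));

        /* make sure func() is done before the csd can be reused */
        smp_mb();

        csd->flags &= ~CSD_FLAG_LOCK;   /* CSD_FLAG_WAIT: never cleared */
}

so once something sets CSD_FLAG_WAIT, it stays set forever.)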

Does this (COMPLETELY UNTESTED!) attached patch change anything?

                      Linus

View attachment "patch.diff" of type "text/plain" (1453 bytes)
