Message-ID: <CAMiJ5CXO1r+rYSnUBSAgj3aZCdevWHUUQjwWNqSFYcxo-R58GQ@mail.gmail.com>
Date:	Thu, 12 Feb 2015 14:38:58 -0200
From:	Rafael David Tinoco <inaddy@...ntu.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Jens Axboe <axboe@...nel.dk>,
	Frederic Weisbecker <fweisbec@...il.com>,
	chris.j.arges@...onical.com, gema.gomez-solano@...onical.com
Subject: Re: smp_call_function_single lockups

Meanwhile we'll take the opportunity to run the same tests with the
"smp_load_acquire/smp_store_release + outside sync/async" approach
from your latest patch on top of 3.19. If anything comes up I'll
provide full backtraces (2 vcpus).
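
A rough userspace sketch of the acquire/release idea, using C11 atomics
in place of the kernel's smp_load_acquire()/smp_store_release(). The
struct and function names are illustrative assumptions, and this is not
the patch under test:

```c
#include <assert.h>
#include <stdatomic.h>

#define CSD_LOCK 0x01u  /* illustrative flag bit, named after the one in this thread */

/* Hypothetical stand-in for the kernel's call_single_data. */
struct csd_model {
    atomic_uint flags;
    void (*func)(void *info);
    void *info;
};

/* Spin until the previous user released the slot.
 * The acquire load pairs with the release store in csd_model_unlock(). */
static void csd_model_lock_wait(struct csd_model *csd)
{
    while (atomic_load_explicit(&csd->flags, memory_order_acquire) & CSD_LOCK)
        ; /* busy-wait; kernel code would call cpu_relax() here */
}

/* Take the slot. In this single-owner model a relaxed store is enough. */
static void csd_model_lock(struct csd_model *csd)
{
    csd_model_lock_wait(csd);
    atomic_store_explicit(&csd->flags, CSD_LOCK, memory_order_relaxed);
}

/* Release the slot. Writes made before this store are visible to
 * whoever later observes the flag as clear via the acquire load. */
static void csd_model_unlock(struct csd_model *csd)
{
    assert(atomic_load_explicit(&csd->flags, memory_order_relaxed) & CSD_LOCK);
    atomic_store_explicit(&csd->flags, 0, memory_order_release);
}
```

The point of the pairing is that a waiter which sees the flag clear also
sees everything the previous owner wrote before unlocking.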

Here I can only reproduce this inside nested KVM on top of ProLiant
DL360 Gen8 machines with:

- no opt-out from x2apic (Gen8 firmware asks for opting out, but HP
says x2apic should be used for >= Gen8)
- no intel_idle, since the ProLiant firmware causes NMIs during MWAIT
instructions

As observed before, reducing performance meant the problem was
triggered only after some days, so if nothing goes wrong with
performance this time, I expect to have results in 10 to 30
hours.

Thank you

Tinoco

On Wed, Feb 11, 2015 at 6:42 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> [ Added Frederic to the cc, since he's touched this file/area most ]
>
> On Wed, Feb 11, 2015 at 11:59 AM, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > So the caller has a really hard time guaranteeing that CSD_LOCK isn't
> > set. And if the call is done in interrupt context, for all we know it
> > is interrupting the code that is going to clear CSD_LOCK, so CSD_LOCK
> > will never be cleared at all, and csd_lock() will wait forever.
> >
> > So I actually think that for the async case, we really *should* unlock
> > before doing the callback (which is what Thomas' old patch did).
> >
> > And we might well be better off doing something like
> >
> >         WARN_ON_ONCE(csd->flags & CSD_LOCK);
> >
> > in smp_call_function_single_async(), because that really is a hard requirement.
> >
> > And it strikes me that hrtick_csd is one of these cases that do this
> > with interrupts disabled, and use the callback for serialization. So I
> > really wonder if this is part of the problem..
> >
> > Thomas? Am I missing something?
>
> Ok, this is a more involved patch than I'd like, but making the
> *caller* do all the CSD maintenance actually cleans things up.
>
> And this is still completely untested, and may be entirely buggy. What
> do you guys think?
>
>                              Linus
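
The "unlock before doing the callback" point quoted above can be
sketched in userspace C. This is a minimal model under stated
assumptions: the names csd_sketch, queue_async, run_async and
rearm_once are hypothetical, C11 atomics stand in for the kernel
primitives, and returning false stands in for the suggested
WARN_ON_ONCE(). It is not the patch discussed in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CSD_LOCK 0x01u  /* illustrative flag bit */

/* Hypothetical stand-in for a call_single_data used asynchronously. */
struct csd_sketch {
    atomic_uint flags;
    void (*func)(struct csd_sketch *csd);
    int runs;  /* counts callback invocations, for the demo only */
};

/* Models the hard requirement: an async csd must be unlocked when queued.
 * Returns false where the kernel suggestion would warn. */
static bool queue_async(struct csd_sketch *csd)
{
    if (atomic_load_explicit(&csd->flags, memory_order_acquire) & CSD_LOCK)
        return false;
    atomic_store_explicit(&csd->flags, CSD_LOCK, memory_order_relaxed);
    return true;
}

/* Models the target CPU's handler for the async case: unlock, then call. */
static void run_async(struct csd_sketch *csd)
{
    void (*func)(struct csd_sketch *) = csd->func;  /* read before unlocking */

    assert(func != NULL);
    atomic_store_explicit(&csd->flags, 0, memory_order_release);
    func(csd);  /* the callback may now re-queue the same csd */
}

/* Example callback that re-arms itself once, as a timer-style user might. */
static void rearm_once(struct csd_sketch *csd)
{
    if (++csd->runs == 1)
        (void)queue_async(csd);
}
```

Because the flag is cleared before the callback runs, a callback that
re-queues its own csd finds it unlocked instead of waiting on a lock
that only its own return would have released.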