Message-ID: <20141120164228.GG2542@lerouge>
Date:	Thu, 20 Nov 2014 17:42:30 +0100
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Dave Jones <davej@...hat.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	the arch/x86 maintainers <x86@...nel.org>
Subject: Re: frequent lockups in 3.18rc4

On Thu, Nov 20, 2014 at 11:19:25AM -0500, Dave Jones wrote:
> On Thu, Nov 20, 2014 at 04:08:00PM +0100, Frederic Weisbecker wrote:
>  
>  > > Great start to the week: I decided to confirm my recollection that .17
>  > > was ok, only to hit this within 10 minutes.
>  > > 
>  > > Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
>  > > CPU: 3 PID: 17176 Comm: trinity-c95 Not tainted 3.17.0+ #87
>  > >  0000000000000000 00000000f3a61725 ffff880244606bf0 ffffffff9583e9fa
>  > >  ffffffff95c67918 ffff880244606c78 ffffffff9583bcc0 0000000000000010
>  > >  ffff880244606c88 ffff880244606c20 00000000f3a61725 0000000000000000
>  > > Call Trace:
>  > >  <NMI>  [<ffffffff9583e9fa>] dump_stack+0x4e/0x7a
>  > >  [<ffffffff9583bcc0>] panic+0xd4/0x207
>  > >  [<ffffffff95150908>] watchdog_overflow_callback+0x118/0x120
>  > >  [<ffffffff95193dbe>] __perf_event_overflow+0xae/0x340
>  > >  [<ffffffff95192230>] ? perf_event_task_disable+0xa0/0xa0
>  > >  [<ffffffff9501a7bf>] ? x86_perf_event_set_period+0xbf/0x150
>  > >  [<ffffffff95194be4>] perf_event_overflow+0x14/0x20
>  > >  [<ffffffff95020676>] intel_pmu_handle_irq+0x206/0x410
>  > >  [<ffffffff9501966b>] perf_event_nmi_handler+0x2b/0x50
>  > >  [<ffffffff95007bb2>] nmi_handle+0xd2/0x390
>  > >  [<ffffffff95007ae5>] ? nmi_handle+0x5/0x390
>  > >  [<ffffffff958489b0>] ? _raw_spin_lock_irqsave+0x80/0x90
>  > >  [<ffffffff950080a2>] default_do_nmi+0x72/0x1c0
>  > >  [<ffffffff950082a8>] do_nmi+0xb8/0x100
>  > >  [<ffffffff9584b9aa>] end_repeat_nmi+0x1e/0x2e
>  > >  [<ffffffff958489b0>] ? _raw_spin_lock_irqsave+0x80/0x90
>  > >  [<ffffffff958489b0>] ? _raw_spin_lock_irqsave+0x80/0x90
>  > >  [<ffffffff958489b0>] ? _raw_spin_lock_irqsave+0x80/0x90
>  > >  <<EOE>>  <IRQ>  [<ffffffff95101685>] lock_hrtimer_base.isra.18+0x25/0x50
>  > >  [<ffffffff951019d3>] hrtimer_try_to_cancel+0x33/0x1f0
>  > 
>  > Ah that one got fixed in the merge window and in -stable, right?
>  
> If that's true, that changes everything, and this might be more
> bisectable.  I did the test above on 3.17, but perhaps I should
> try a run on 3.17.3

It might not be easier to bisect, because stable is a separate branch from the
one leading to the next -rc1. The issue above got fixed in -rc1, perhaps in the
same merge window where the new, different issues were introduced. So you'll
probably need to neutralize the issue above in order to bisect the others.

What you can do is bisect and, before every build, apply the patches that
fix the above issue in -stable, the ones I just enumerated to gregkh in our
discussion with him. There are only 4. Just try to apply all of them before each
build, unless they are already applied.
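
As a rough illustration of that "apply before each build, unless already
applied" step, here is a minimal Python sketch. Everything named in it is an
assumption for illustration only: the REPO path and the patch file names are
hypothetical placeholders, not the actual 4 fixes. It relies on
`git apply --check` to see whether a patch applies cleanly, and on
`git apply --check --reverse` to detect a patch that is already in the tree.

```python
import subprocess
from typing import Callable, List, Tuple

REPO = "/path/to/linux"  # hypothetical location of the kernel tree


def git_check(patch: str, reverse: bool) -> bool:
    """True if `git apply --check` accepts the patch (reversed if asked)."""
    cmd = ["git", "-C", REPO, "apply", "--check"]
    if reverse:
        cmd.append("--reverse")
    cmd.append(patch)  # use absolute patch paths, since -C changes directory
    return subprocess.run(cmd, capture_output=True).returncode == 0


def git_apply(patch: str) -> None:
    """Apply the patch to the working tree; raises if git fails."""
    subprocess.run(["git", "-C", REPO, "apply", patch], check=True)


def classify(patch: str, check: Callable[[str, bool], bool]) -> str:
    """Decide what to do with one patch at the current bisect point."""
    if check(patch, False):
        return "apply"            # fix is missing here and applies cleanly
    if check(patch, True):
        return "already-applied"  # reverses cleanly, so it is in the tree
    return "conflict"             # neither: needs a human to look at it


def prepare_tree(
    patches: List[str],
    check: Callable[[str, bool], bool] = git_check,
    apply: Callable[[str], None] = git_apply,
) -> List[Tuple[str, str]]:
    """Apply every missing fix, in order, before building this bisect step."""
    actions = []
    for patch in patches:
        action = classify(patch, check)
        if action == "conflict":
            raise RuntimeError(f"{patch} neither applies nor reverses cleanly")
        if action == "apply":
            apply(patch)
        actions.append((patch, action))
    return actions
```

One would call prepare_tree() with the 4 patch files after each bisect step,
then build and test. Before marking the step with `git bisect good` or
`git bisect bad`, the working tree would probably need to be reset (for
example with `git reset --hard`) so the temporary fixes do not block the next
checkout.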

I could give you a much simpler hack, but I fear it may apply chaotically
depending on whether the real fixes are fully applied, applied halfway, or not
applied at all, with unpredictable results. So let's rather stick to what we
know works.