linux-kernel - Re: [BUG 3.12.rc4] Oops: unable to handle kernel paging request during shutdown

Message-ID: <20131014212830.GD5790@linux.vnet.ibm.com>
Date:	Mon, 14 Oct 2013 14:28:30 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Knut Petersen <Knut_Petersen@...nline.de>,
	Ingo Molnar <mingo@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Greg KH <greg@...ah.com>,
	linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [BUG 3.12.rc4] Oops: unable to handle kernel paging request
 during shutdown

On Mon, Oct 14, 2013 at 10:53:03AM -0700, Linus Torvalds wrote:
> Hmm. No obvious ideas come to mind, but I'm adding more people to the cc.
> 
> Clearly the wait_event_interruptible_timeout() in the RCU grace-period
> thread causes this, but I'm not seeing why shutdown would trigger it.
> 
> The code disassembles to
> 
>    0: 85 db                 test   %ebx,%ebx
>    2: 79 0c                 jns    0x10
>    4: 81 e6 ff 00 00 00     and    $0xff,%esi
>    a: 8d 44 f0 30           lea    0x30(%eax,%esi,8),%eax
>    e: eb 0a                 jmp    0x1a
>   10: c1 e9 1a             shr    $0x1a,%ecx
>   13: 8d 84 c8 30 0e 00 00 lea    0xe30(%eax,%ecx,8),%eax
>   1a: 8b 48 04             mov    0x4(%eax),%ecx
>   1d: 89 50 04             mov    %edx,0x4(%eax)
>   20: 89 02                 mov    %eax,(%edx)
>   22: 89 4a 04             mov    %ecx,0x4(%edx)
>   25:* 89 11                 mov    %edx,(%ecx) <-- trapping instruction
>   27: 5b                   pop    %ebx
>   28: 5e                   pop    %esi
>   29: 5d                   pop    %ebp
>   2a: c3                   ret
> 
> so the oops is in the final
> 
>         list_add_tail(&timer->entry, vec);
> 
> where "%ecx" is "vec->prev" (f8c551f4). That looks like it might be a
> perfectly valid pointer, but clearly it isn't (it's about 115M off the
> top of virtual memory, I think that might be in the vmalloc area).
> 
> So I'm *guessing* that something did a vfree() on some data structure
> that contained active timers - and then later on the RCU thread ended
> up being the next thing that tried to add a timer after the
> now-non-existing one.
> 
> And your other oopses do seem to have a similar pattern, even if their
> actual oops is elsewhere. They oops in run_timer_softirq, also taking
> a page fault in the 0xf9...... range, so it might well be a vmalloc
> address there too.
> 
> But I sure as hell can't start to guess what that would be.
> 
> I'm wondering if CONFIG_DEBUG_OBJECTS (and then
> CONFIG_DEBUG_OBJECTS_FREE=y and CONFIG_DEBUG_OBJECTS_TIMERS=y) might
> help catch this...
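
For anyone following the disassembly quoted above, here is a minimal
userspace sketch of how the four stores map onto list_add_tail(). It is
not the kernel source: struct list_head is redefined locally, and
init_list_head(), list_add_tail_sketch() and mib_below_4g() are
hypothetical names made up for illustration. The offsets in the comments
assume 32-bit x86, as in the oops.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's struct list_head (illustrative only). */
struct list_head {
	struct list_head *next;	/* offset 0 on 32-bit x86 */
	struct list_head *prev;	/* offset 4 on 32-bit x86 */
};

/* An empty list head points at itself in both directions. */
static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Tail insertion, with the stores in the same order as the disassembly. */
static void list_add_tail_sketch(struct list_head *new_entry,
				 struct list_head *head)
{
	struct list_head *prev = head->prev;	/* mov 0x4(%eax),%ecx */

	head->prev = new_entry;			/* mov %edx,0x4(%eax) */
	new_entry->next = head;			/* mov %eax,(%edx)    */
	new_entry->prev = prev;			/* mov %ecx,0x4(%edx) */
	prev->next = new_entry;			/* mov %edx,(%ecx): writes through
						 * the old tail, so it faults if
						 * that entry's memory is gone */
}

/* Whole MiB between a 32-bit address and the 4 GiB top of the address space. */
static unsigned int mib_below_4g(uint32_t addr)
{
	return (unsigned int)((UINT64_C(0x100000000) - addr) >> 20);
}
```

The last store is the only one that dereferences the old tail entry,
which fits the trap landing on "mov %edx,(%ecx)". Calling
mib_below_4g(0xf8c551f4) gives 115, matching the "about 115M" estimate.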

I would also like to nominate CONFIG_DEBUG_OBJECTS_RCU_HEAD=y, which
checks for invoking call_rcu() twice in a row on the same rcu_head.
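
Put together, the debug options named so far in this thread would look
roughly like the .config fragment below. This is a sketch only: the
exact dependencies (for example on CONFIG_DEBUG_KERNEL) are an
assumption and vary between kernel versions, so check them in
menuconfig.

```
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_FREE=y
CONFIG_DEBUG_OBJECTS_TIMERS=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
```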

Any chance of a look at the .config file?

							Thanx, Paul

>                 Linus
> 
> On Mon, Oct 14, 2013 at 4:07 AM, Knut Petersen
> <Knut_Petersen@...nline.de> wrote:
> >
> > It's the third time in four months that I have to report a kernel Oops during
> > shutdown.
> > All of these Oopses seem somehow related to the timer subsystem, but they are
> > not easily reproducible. As all this happens on two different machines, it's
> > unlikely that this mess is related to bad hardware.
> >
> > I clearly would appreciate any idea how to track this down.
> >
> > For the last two reports see:
> >
> > http://www.gossamer-threads.com/lists/linux/kernel/1782575?#1782575
> >
> > http://www.gossamer-threads.com/lists/linux/kernel/1744892?#1744892
> >
> > This time the kernel oopsed after systemd reported that target shutdown
> > had been reached - see attached pdf for the full trace. To make it easier
> > to find this problem a shortened call trace:
> >
> >
> > Call Trace:
> >     internal_add_timer
> >     schedule_timeout
> >     ? call_timer_fn
> >     rcu_gp_kthread
> >     __init_waitqueue_head
> >     ? rcu_gp_fqs
> >     kthread
> >     ret_from_kernel_thread
> >     ? __init_kthread_worker
> >
> > EIP: __internal_add_timer
> >
> > Hardware: AOpen i915GMm-hfs mobo with a Pentium-M Dothan and 2GB of RAM.
> > Distribution: openSuSE 12.3
> > Kernel: local 3.12.0-rc4-00127-g45877c4 is kernel 9d05746 with my
> > "Enforce 1 as lower limit for perf_event_max_sample_rate"
> > patch applied.
> >
> > cu,
> >  knut
> 
