Message-ID: <20120821015841.GA12492@localhost>
Date:	Tue, 21 Aug 2012 09:58:41 +0800
From:	Fengguang Wu <fengguang.wu@...el.com>
To:	John Stultz <john.stultz@...aro.org>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Richard Cochran <richardcochran@...il.com>,
	Prarit Bhargava <prarit@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	linux-fsdevel@...r.kernel.org
Subject: Re: BUG: NULL pointer dereference in shmem_evict_inode()
On Mon, Aug 20, 2012 at 06:46:05PM -0700, John Stultz wrote:
> On 08/20/2012 06:31 PM, Fengguang Wu wrote:
> >On Mon, Aug 20, 2012 at 06:10:57PM -0700, John Stultz wrote:
> >>On 08/20/2012 06:04 PM, Fengguang Wu wrote:
> >>>Hi John,
> >>>
> >>>The below oops happens in v3.5..v3.6-rc2 and it's bisected down to commit
> >>>2a8c0883c ("time: Move xtime_nsec adjustment underflow handling timekeeping_adjust").
> >>>
> >>>However linux-next is working fine. Do you have any fixes not yet sent to Linus?
> >>Yea, there's a fix pending in tip/timers/urgent
> >>(4e8b14526ca7fb046a81c94002c1c43b6fdf0e9b) to catch crazy values
> >>from settimeofday or the cmos clock that might overflow a ktime_t.
> >That's great!
> >
> >>Out of curiosity, how are you triggering/reproducing this?
> >I boot test lots of randconfig kernels in kvm, and this oops shows up
> >several times in one randconfig and some of the test boxes. I find it
> >pretty hard to reproduce, but managed to bisect it down by counting
> >1000 good boots as bisect success and running dozens of KVM instances
> >in parallel in several test boxes to speed up the progress. Here is one step:
> 
> Oof.  That's a really impressive setup!
Thank you :)
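
For readers who want to picture the bisect step quoted above, here is a
minimal sketch of the decision rule: a commit counts as good only after
1000 clean boots, and as bad as soon as any boot oopses. It is not the
actual test harness. The names classify_commit and boot_once and the
worker count are hypothetical, and boot_once stands in for whatever
launches one KVM instance and reports whether it booted cleanly.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def classify_commit(boot_once, good_boots_needed=1000, parallel=32):
    """Classify one bisect step.

    boot_once(i) is a hypothetical callable that boots the kernel once
    (for example in a KVM guest) and returns True for a clean boot and
    False if the oops was seen.
    """
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = [pool.submit(boot_once, i) for i in range(good_boots_needed)]
        for fut in as_completed(futures):
            if not fut.result():
                # One failure is enough: drop the boots not yet started.
                for pending in futures:
                    pending.cancel()
                return "bad"
    # Every boot came up cleanly, so treat the commit as good.
    return "good"
```

A rare failure needs many trials before "good" can be trusted, which is
why the threshold is high and the boots run in parallel.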
 
> That said, if this happens only at boot up, and you don't have
> systems with crazy cmos values, I'm not sure I see how commit
> 4e8b14526ca7fb046a81c94002c1c43b6fdf0e9b might fix this.  So that's
> not very reassuring.
Sorry if my words misled you, but the bug happens after user space has
booted. Look at the following dmesg output mixed with userspace logs.
I noticed this when doing the bisects: the kernel timestamp suddenly
jumped from [    5.310905] to [ 2204.090146] in very short wall time.
        [    5.303661] device: 'input2': device_add
        [    5.304677] PM: Adding info for No Bus:input2
        [    5.305666] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input2
        [    5.307546] device: 'mouse0': device_add
        [    5.308452] PM: Adding info for No Bus:mouse0
        [    5.309505] driver: 'serio1': driver_bound: bound to device 'psmouse'
        [    5.310905] bus: 'serio': really_probe: bound device serio1 to driver psmouse
        modprobe: FATAL: Could not load /lib/modules/3.6.0-rc1/modules.dep: No such file or directory
        modprobe: FATAL: Could not load /lib/modules/3.6.0-rc1/modules.dep: No such file or directory
        modprobe: FATAL: Could not load /lib/modules/3.6.0-rc1/modules.dep: No such file or directory
        modprobe: FATAL: Could not load /lib/modules/3.6.0-rc1/modules.dep: No such file or directory
        [ 2204.090146] plymouthd (52) used greatest stack depth: 6324 bytes left
        modprobe: FATAL: Could not load /lib/modules/3.6.0-rc1/modules.dep: No such file or directory
        modprobe: FATAL: Could not load /lib/modules/3.6.0-rc1/modules.dep: No such file or directory
         * Asking all remaining processes to terminate...
        modprobe: FATAL: Could not load /lib/modules/3.6.0-rc1/modules.dep: No such file or directory
        modprobe: FATAL: Could not load /lib/modules/3.6.0-rc1/modules.dep: No such file or directory
        modprobe: FATAL: Could not load /lib/modules/3.6.0-rc1/modules.dep: No such file or directory
        modprobe: FATAL: Could not load /lib/modules/3.6.0-rc1/modules.dep: No such file or directory
        modprobe: FATAL: Could not load /lib/modules/3.6.0-rc1/modules.dep: No such file or directory
         * Killing all remaining processes...
        mount: unknown filesystem type 'devpts'
        mountall: mount /dev/pts [1267] terminated with status 32
        mountall: Filesystem could not be mounted: /dev/pts
        mountall: Skipping mounting /dev/pts since Plymouth is not available
        udevd[1346]: error creating signalfd
        udevd[1360]: error creating signalfd
         * Deactivating swap...
        [ 2220.929173] ip (1388) used greatest stack depth: 6132 bytes left
        udevd[1381]: error creating signalfd
        udevd[1397]: error creating signalfd
        [ 2221.089504] VFS: Busy inodes after unmount of tmpfs. Self-destruct in 5 seconds.  Have a nice day...
        [ 2221.091656] BUG: unable to handle kernel NULL pointer dereference at 0000000c
        [ 2221.093256] IP: [<810d2a2c>] shmem_free_inode+0x10/0x45
        [ 2221.093927] *pde = 00000000
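
The timestamp jump in the log above can also be spotted mechanically. The
following is a minimal sketch, not part of the original test setup, that
scans mixed dmesg/userspace output and reports consecutive kernel
timestamps that differ by more than a threshold. The function name
find_timestamp_jumps and the 60-second threshold are assumptions chosen
for illustration.

```python
import re

# Matches the "[ seconds.microseconds]" prefix that the kernel puts on
# its log lines; userspace lines have no such prefix.
TS_RE = re.compile(r"^\s*\[\s*(\d+\.\d+)\]")


def find_timestamp_jumps(lines, threshold=60.0):
    """Return (previous, current) timestamp pairs that jump by more
    than `threshold` seconds between consecutive kernel log lines."""
    jumps = []
    prev = None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # userspace output, no kernel timestamp
        ts = float(m.group(1))
        if prev is not None and ts - prev > threshold:
            jumps.append((prev, ts))
        prev = ts
    return jumps


log = [
    "[    5.309505] driver: 'serio1': driver_bound: bound to device 'psmouse'",
    "[    5.310905] bus: 'serio': really_probe: bound device serio1 to driver psmouse",
    "modprobe: FATAL: Could not load /lib/modules/3.6.0-rc1/modules.dep: No such file or directory",
    "[ 2204.090146] plymouthd (52) used greatest stack depth: 6324 bytes left",
]
print(find_timestamp_jumps(log))
```

Run on the excerpt above, this flags the step from 5.310905 to
2204.090146. Comparing such jumps against wall-clock time on the host
would still have to be done separately.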
> As a tangent, I think this sort of big-data style testing is a
> really great contribution, so thank you for setting up and doing all
> this work.
I'm glad you love it. Thanks!
Fengguang