Message-ID: <4523614.I5MBhorHFt@vostro.rjw.lan>
Date: Sun, 24 Nov 2013 22:06:45 +0100
From: "Rafael J. Wysocki" <rjw@...ysocki.net>
To: Francis Moreau <francis.moro@...il.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
Jingoo Han <jg1.han@...sung.com>,
'Borislav Petkov' <bp@...en8.de>,
'Wei WANG' <wei_wang@...lsil.com.cn>,
'LKML' <linux-kernel@...r.kernel.org>,
'Samuel Ortiz' <sameo@...ux.intel.com>,
'Chris Ball' <cjb@...top.org>
Subject: Re: 3.12: kernel panic when resuming from suspend to RAM (x86_64)
On Sunday, November 24, 2013 10:39:20 AM Francis Moreau wrote:
> Hello Thomas
>
> On 11/22/2013 11:27 PM, Thomas Gleixner wrote:
> > On Fri, 22 Nov 2013, Rafael J. Wysocki wrote:
> >> On Friday, November 22, 2013 10:36:23 PM Francis Moreau wrote:
> >>> Ok, I've finally managed to find out the bad commit:
> >>> ad07277e82dedabacc52c82746633680a3187d25: ACPI / PM: Hold acpi_scan_lock
> >>> over system PM transitions
> >>>
> >>> I verified that the parent commit doesn't have the problem.
> >>
> >> Interesting.
> >>
> >>> Rafael, you're the man now ;)
> >>
> >> I kind of don't see how that commit may result in behavior that you
> >> described earlier in the thread.
> >>
> >> You get a memory corruption that seems to have started to happen because
> >> we're holding an additional lock over suspend resume now. Something's fishy
> >> on that machine and we need to figure out what it is.
> >
> > The hiccup happens in the timer softirq.
> >
> > @Francis: Did you try to enable DEBUG_OBJECTS.*? If not, please give it
> > a try.
>
> This looks like it was a good idea.
>
> The kernel now outputs the following traces after resuming.
>
> [ 26.973928] WARNING: CPU: 0 PID: 4 at lib/debugobjects.c:260
> debug_print_object+0x83/0xa0()
> [ 26.973932] ODEBUG: free active (active state 0) object type:
> timer_list hint: delayed_work_timer_fn+0x0/0x20
> [ 26.973972] Modules linked in: x86_pkg_temp_thermal intel_powerclamp
> rtsx_pci_ms coretemp memstick kvm_intel i2c_i801 iTCO_wdt
> iTCO_vendor_support i915 i2c_algo_bit intel_agp intel_gtt drm_kms_helper
> r8169 drm kvm mii agpgart i2c_core lpc_ich ac shpchp crc32c_intel
> battery thermal wmi evdev mei_me video mei button mperf processor
> serio_raw microcode ext4 crc16 mbcache jbd2 sr_mod cdrom sd_mod
> usb_storage rtsx_pci_sdmmc mmc_core ahci libahci libata ehci_pci
> ehci_hcd xhci_hcd scsi_mod rtsx_pci usbcore usb_common
> [ 26.974013] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted
> 3.11.0-rc2-ARCH #64
> [ 26.974014] Hardware name: CLEVO CO. W55xEU
> /W55xEU , BIOS 4.6.5
> 03/05/2013
> [ 26.974019] Workqueue: kacpi_hotplug hotplug_event_work
> [ 26.974020] 0000000000000009 ffff880407d0da18 ffffffff81459fe9
> ffff880407d0da60
> [ 26.974023] ffff880407d0da50 ffffffff8104dc7d ffff880407fad488
> ffffffff81836fc0
> [ 26.974025] ffffffff81701358 ffffffff81afef70 0000000000000003
> ffff880407d0dab0
> [ 26.974027] Call Trace:
> [ 26.974031] [<ffffffff81459fe9>] dump_stack+0x54/0x8d
> [ 26.974043] [<ffffffff8104dc7d>] warn_slowpath_common+0x7d/0xa0
> [ 26.974044] [<ffffffff8104dcec>] warn_slowpath_fmt+0x4c/0x50
> [ 26.974047] [<ffffffff81261433>] debug_print_object+0x83/0xa0
> [ 26.974050] [<ffffffff8106b820>] ? queue_work_on+0x50/0x50
> [ 26.974053] [<ffffffff81261c2b>] __debug_check_no_obj_freed+0x1fb/0x240
> [ 26.974059] [<ffffffffa008e959>] ? rtsx_pci_remove+0x119/0x1d0
> [rtsx_pci]

So a device driven by rtsx_pcr.c is removed after resume. Without the commit
you've bisected it is removed as well, but that happens during resume, so
rtsx_pci_resume() is likely not called in that case.

I bet that there's a bug in either rtsx_pci_remove() or rtsx_pci_resume().
The latter definitely should check whether the device is still present
before scheduling the delayed work, but then Boris' patch should take care
of that anyway.
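
To illustrate the ordering described above, here is a minimal userspace
sketch. It is not the rtsx_pci code: struct mock_dev and every mock_*
function are hypothetical stand-ins (the pending flag plays the role of the
timer inside a delayed work item), and the real driver may well need more
than this. It only shows the two checks in question: resume does not
schedule work for a device that is gone, and remove cancels pending work
before freeing the structure that holds it.

```c
/* Userspace mock only; all names here are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct mock_dev {
	bool present;       /* false once the hardware has gone away */
	bool work_pending;  /* stands in for an armed delayed-work timer */
};

/* Stand-in for scheduling delayed work: marks the timer as armed. */
void mock_schedule_work(struct mock_dev *dev)
{
	dev->work_pending = true;
}

/* Stand-in for cancelling delayed work: marks the timer as disarmed. */
void mock_cancel_work(struct mock_dev *dev)
{
	dev->work_pending = false;
}

/* Resume path: refuse to arm the timer if the device is no longer there. */
int mock_resume(struct mock_dev *dev)
{
	if (!dev->present)
		return -1;      /* device gone, nothing to schedule */

	mock_schedule_work(dev);
	return 0;
}

/*
 * Remove path: cancel pending work first, then free.
 * Returns true if the object would have been freed with work still
 * pending, which is the situation the ODEBUG warning reports.
 */
bool mock_remove(struct mock_dev *dev)
{
	bool freed_active;

	mock_cancel_work(dev);
	freed_active = dev->work_pending;
	free(dev);
	return freed_active;
}
```

With this ordering, calling mock_resume() on a device whose present flag is
false leaves work_pending clear, and mock_remove() never frees an object
with work still pending.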
Thanks!
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.