Date:	Sun, 24 Nov 2013 10:39:20 +0100
From:	Francis Moreau <francis.moro@...il.com>
To:	Thomas Gleixner <tglx@...utronix.de>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>
CC:	Jingoo Han <jg1.han@...sung.com>, 'Borislav Petkov' <bp@...en8.de>,
	'Wei WANG' <wei_wang@...lsil.com.cn>,
	'LKML' <linux-kernel@...r.kernel.org>,
	'Samuel Ortiz' <sameo@...ux.intel.com>,
	'Chris Ball' <cjb@...top.org>
Subject: Re: 3.12: kernel panic when resuming from suspend to RAM (x86_64)

Hello Thomas

On 11/22/2013 11:27 PM, Thomas Gleixner wrote:
> On Fri, 22 Nov 2013, Rafael J. Wysocki wrote:
>> On Friday, November 22, 2013 10:36:23 PM Francis Moreau wrote:
>>> Ok, I've finally managed to find out the bad commit:
>>> ad07277e82dedabacc52c82746633680a3187d25: ACPI / PM: Hold acpi_scan_lock
>>> over system PM transitions
>>>
>>> I verified that the parent commit doesn't have the problem.
>>
>> Interesting.
>>
>>> Rafael, you're the man now ;)
>>
>> I kind of don't see how that commit may result in behavior that you
>> described earlier in the thread.
>>
>> You get a memory corruption that seems to have started to happen because
>> we're holding an additional lock over suspend resume now.  Something's fishy
>> on that machine and we need to figure out what it is.
> 
> The hiccup happens in the timer softirq.
> 
> @Francis: Did you try to enable DEBUG_OBJECTS.*? If not, please give it
> 	  a try.

This looks like it was a good idea.

The kernel now outputs the following traces after resuming.

[   26.973928] WARNING: CPU: 0 PID: 4 at lib/debugobjects.c:260 debug_print_object+0x83/0xa0()
[   26.973932] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
[   26.973972] Modules linked in: x86_pkg_temp_thermal intel_powerclamp rtsx_pci_ms coretemp memstick kvm_intel i2c_i801 iTCO_wdt iTCO_vendor_support i915 i2c_algo_bit intel_agp intel_gtt drm_kms_helper r8169 drm kvm mii agpgart i2c_core lpc_ich ac shpchp crc32c_intel battery thermal wmi evdev mei_me video mei button mperf processor serio_raw microcode ext4 crc16 mbcache jbd2 sr_mod cdrom sd_mod usb_storage rtsx_pci_sdmmc mmc_core ahci libahci libata ehci_pci ehci_hcd xhci_hcd scsi_mod rtsx_pci usbcore usb_common
[   26.974013] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.11.0-rc2-ARCH #64
[   26.974014] Hardware name: CLEVO CO.                        W55xEU                       /W55xEU                          , BIOS 4.6.5 03/05/2013
[   26.974019] Workqueue: kacpi_hotplug hotplug_event_work
[   26.974020]  0000000000000009 ffff880407d0da18 ffffffff81459fe9 ffff880407d0da60
[   26.974023]  ffff880407d0da50 ffffffff8104dc7d ffff880407fad488 ffffffff81836fc0
[   26.974025]  ffffffff81701358 ffffffff81afef70 0000000000000003 ffff880407d0dab0
[   26.974027] Call Trace:
[   26.974031]  [<ffffffff81459fe9>] dump_stack+0x54/0x8d
[   26.974043]  [<ffffffff8104dc7d>] warn_slowpath_common+0x7d/0xa0
[   26.974044]  [<ffffffff8104dcec>] warn_slowpath_fmt+0x4c/0x50
[   26.974047]  [<ffffffff81261433>] debug_print_object+0x83/0xa0
[   26.974050]  [<ffffffff8106b820>] ? queue_work_on+0x50/0x50
[   26.974053]  [<ffffffff81261c2b>] __debug_check_no_obj_freed+0x1fb/0x240
[   26.974059]  [<ffffffffa008e959>] ? rtsx_pci_remove+0x119/0x1d0 [rtsx_pci]
[   26.974062]  [<ffffffff81262619>] debug_check_no_obj_freed+0x19/0x20
[   26.974065]  [<ffffffff8116f861>] kfree+0x191/0x210
[   26.974069]  [<ffffffff813819e0>] ? pcibios_disable_device+0x20/0x30
[   26.974072]  [<ffffffffa008e959>] ? rtsx_pci_remove+0x119/0x1d0 [rtsx_pci]
[   26.974075]  [<ffffffffa008e959>] rtsx_pci_remove+0x119/0x1d0 [rtsx_pci]
[   26.974079]  [<ffffffff8128004b>] pci_device_remove+0x3b/0xb0
[   26.974092]  [<ffffffff8132c92f>] __device_release_driver+0x7f/0xf0
[   26.974094]  [<ffffffff8132c9c3>] device_release_driver+0x23/0x30
[   26.974096]  [<ffffffff8132c194>] bus_remove_device+0xf4/0x170
[   26.974098]  [<ffffffff81328c55>] device_del+0x135/0x1d0
[   26.974108]  [<ffffffff8127ae24>] pci_stop_bus_device+0x94/0xa0
[   26.974110]  [<ffffffff8127af32>] pci_stop_and_remove_bus_device+0x12/0x20
[   26.974113]  [<ffffffff81297466>] disable_slot+0x76/0xd0
[   26.974115]  [<ffffffff81297568>] acpiphp_check_bridge+0xa8/0xd0
[   26.974118]  [<ffffffff81297c8a>] hotplug_event+0xfa/0x210
[   26.974120]  [<ffffffff81297dc7>] hotplug_event_work+0x27/0x60
[   26.974123]  [<ffffffff8106c178>] process_one_work+0x178/0x470
[   26.974125]  [<ffffffff8106cb91>] worker_thread+0x121/0x3a0
[   26.974127]  [<ffffffff8106ca70>] ? manage_workers.isra.21+0x2b0/0x2b0
[   26.974130]  [<ffffffff81073a50>] kthread+0xc0/0xd0
[   26.974132]  [<ffffffff81073990>] ? kthread_create_on_node+0x120/0x120
[   26.974135]  [<ffffffff814688ec>] ret_from_fork+0x7c/0xb0
[   26.974137]  [<ffffffff81073990>] ? kthread_create_on_node+0x120/0x120
[   26.974139] ---[ end trace 0895c2e7925b5485 ]---
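
To make the "free active ... timer_list" report easier to read: it appears to say that memory freed from rtsx_pci_remove() still contained a timer belonging to a pending delayed work item. Below is a rough user-space model of that kind of check, not kernel code. Every name in it (track_activate, checked_free, fake_dev, and so on) is made up for illustration, and the real debugobjects implementation is considerably more involved.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define MAX_TRACKED 16

struct tracked { const void *addr; int active; };
static struct tracked table[MAX_TRACKED];

/* Record an object (think: an armed timer) as active. */
static void track_activate(const void *addr)
{
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (!table[i].active) {
            table[i].addr = addr;
            table[i].active = 1;
            return;
        }
    }
    assert(0 && "tracking table full");
}

/* Record an object as inactive (think: the timer was cancelled). */
static void track_deactivate(const void *addr)
{
    for (int i = 0; i < MAX_TRACKED; i++)
        if (table[i].active && table[i].addr == addr)
            table[i].active = 0;
}

/* Report and count active objects located inside [mem, mem + size). */
static int check_no_obj_freed(const void *mem, size_t size)
{
    uintptr_t start = (uintptr_t)mem, end = start + size;
    int hits = 0;

    for (int i = 0; i < MAX_TRACKED; i++) {
        uintptr_t a = (uintptr_t)table[i].addr;
        if (table[i].active && a >= start && a < end) {
            fprintf(stderr, "free active object at offset %zu\n",
                    (size_t)(a - start));
            table[i].active = 0; /* stop tracking, the memory goes away */
            hits++;
        }
    }
    return hits;
}

/* free() wrapper that runs the check first; returns the violation count. */
static int checked_free(void *mem, size_t size)
{
    int hits = check_no_obj_freed(mem, size);
    free(mem);
    return hits;
}

struct fake_timer { int expires; };
struct fake_dev { int id; struct fake_timer idle_timer; };

/* Frees the device while its embedded timer is still marked active. */
int buggy_remove(void)
{
    struct fake_dev *dev = malloc(sizeof(*dev));
    if (!dev)
        return -1;
    track_activate(&dev->idle_timer);
    return checked_free(dev, sizeof(*dev));
}

/* Deactivates the embedded timer before freeing the device. */
int fixed_remove(void)
{
    struct fake_dev *dev = malloc(sizeof(*dev));
    if (!dev)
        return -1;
    track_activate(&dev->idle_timer);
    track_deactivate(&dev->idle_timer);
    return checked_free(dev, sizeof(*dev));
}
```

In this model buggy_remove() reports one violation and fixed_remove() reports none. In a driver, the analogous step would presumably be cancelling the delayed work (for instance with cancel_delayed_work_sync()) before the structure is freed, but whether that is the right fix for rtsx_pci is not established here.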

Also, with these debug options enabled, the kernel no longer panics.

I'm also attaching the dmesg captured with CONFIG_DEBUG_KOBJECT and
CONFIG_DEBUG_OBJECTS* enabled.
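
For anyone wanting to reproduce this, a config fragment along the following lines should turn on the relevant checks. The exact set of options used for the attached dmesg is not listed in this message, so treat the selection below as an assumption rather than the configuration actually used.

```
# Assumed selection; the message only says CONFIG_DEBUG_KOBJECT and
# the CONFIG_DEBUG_OBJECTS family were enabled.
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_KOBJECT=y
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_FREE=y
CONFIG_DEBUG_OBJECTS_TIMERS=y
CONFIG_DEBUG_OBJECTS_WORK=y
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
```

CONFIG_DEBUG_OBJECTS_FREE is the option that checks freed memory for still-active objects, which matches the debug_check_no_obj_freed() frame in the trace.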

Thanks.

Download attachment "dmesg-with-debug-objects.txt.gz" of type "application/gzip" (62316 bytes)
