Date:	Sun, 24 Nov 2013 10:42:16 +0100
From:	Francis Moreau <francis.moro@...il.com>
To:	"Rafael J. Wysocki" <rjw@...ysocki.net>
CC:	Jingoo Han <jg1.han@...sung.com>, 'Borislav Petkov' <bp@...en8.de>,
	'Wei WANG' <wei_wang@...lsil.com.cn>,
	'LKML' <linux-kernel@...r.kernel.org>,
	'Thomas Gleixner' <tglx@...utronix.de>,
	'Samuel Ortiz' <sameo@...ux.intel.com>,
	'Chris Ball' <cjb@...top.org>
Subject: Re: 3.12: kernel panic when resuming from suspend to RAM (x86_64)

Hello Rafael,

On 11/22/2013 11:08 PM, Rafael J. Wysocki wrote:
> On Friday, November 22, 2013 10:36:23 PM Francis Moreau wrote:
>> On 11/22/2013 01:54 PM, Rafael J. Wysocki wrote:
>>> On Friday, November 22, 2013 10:57:25 AM Francis Moreau wrote:
>>>> On 22/11/2013 08:43, Francis Moreau wrote:
>>>>> On 21/11/2013 12:17, Jingoo Han wrote:
>>>>> [...]
>>>>>>>
>>>>>>>> Also I took a look at the changes between v3.11 and v3.12 in this area
>>>>>>>> and those changes match the issue I'm facing:
>>>>>>>>
>>>>>>>> $ git log --oneline v3.11..v3.12 -- drivers/mfd/rtsx_pcr.c
>>>>>>>> 09fd867 mfd: rtsx: Copyright modifications
>>>>>>>> eb891c6 mfd: rtsx: Configure to enter a deeper power-saving mode in S3
>>>>>>>> 7140812 mfd: rtsx: Move some actions from rtsx_pci_init_hw to individual extra_init_hw
>>>>>>>> 5947c16 mfd: rtsx: Add shutdown callback in rtsx_pci_driver
>>>>>>>> 773ccdf mfd: rtsx: Read vendor setting from config space
>>>>>>
>>>>>> In my opinion, rtsx_pci_resume()/rtsx_pci_suspend() in the Realtek PCIe
>>>>>> card reader driver may be causing the kernel panic.
>>>>>>
>>>>>> I think that the commit "mfd: rtsx: Configure to enter a deeper
>>>>>> power-saving mode in S3" may be the culprit.
>>>>>
>>>>> Unfortunately no, reverting this commit on top of v3.12 doesn't help. I
>>>>> also reverted 7140812 and 5947c16, but it didn't improve anything.
>>>>>
>>>>> The good news is that I managed to get a "light" kernel configuration
>>>>> which is faster to build and, more importantly, it seems that the bug is
>>>>> almost 100% reproducible now.
>>>>>
>>>>> So I'll try to do another git-bisect session later.
>>>>
>>>> So after bisecting between v3.11..v3.12 range, git bisect told me:
>>>>
>>>> the first bad commit is 551f5c74e17ba9257cdc35bf657ee448cad2d5b0
>>>>
>>>> Merge branch 'acpi-processor'
>>>>
>>>>     * acpi-processor:
>>>>       ACPI / processor: Acquire writer lock to update CPU maps
>>>>       ACPI / processor: Remove acpi_processor_get_limit_info()
>>>>
>>>> The two commits brought by the merge are not the culprits, because
>>>> resetting HEAD to "ACPI / processor: Acquire writer lock to update CPU
>>>> maps" makes the issue go away.
>>>>
>>>> At that point I'm not sure how to bisect further.
>>>
>>> Does the second parent of this merge (that is, 8462d9df9d50) have the problem?
>>>
>>
>> Yes it does.
>>
>> Ok, I've finally managed to find out the bad commit:
>> ad07277e82dedabacc52c82746633680a3187d25: ACPI / PM: Hold acpi_scan_lock
>> over system PM transitions
>>
>> I verified that the parent commit doesn't have the problem.
> 
> Interesting.
> 
>> Rafael, you're the man now ;)
> 
> I kind of don't see how that commit may result in behavior that you
> described earlier in the thread.
> 
> You get a memory corruption that seems to have started to happen because
> we're holding an additional lock over suspend/resume now.  Something's fishy
> on that machine and we need to figure out what it is.
> 
> Please file a bug at bugzilla.kernel.org against ACPI and assign it to me.
> Please put all of the relevant info in there and attach the output of dmesg
> after a fresh boot and the output of acpidump from the affected machine to
> the bug entry.
> 

I just sent a new trace with DEBUG_OBJECTS enabled which seems to give
some interesting traces.

If nothing can be found from them, I'll do the bug report.

Thanks.