Message-ID: <52A61B0C.6030305@gmail.com>
Date:	Mon, 09 Dec 2013 20:33:32 +0100
From:	Francis Moreau <francis.moro@...il.com>
To:	Thomas Gleixner <tglx@...utronix.de>
CC:	Jingoo Han <jg1.han@...sung.com>,
	'Wei WANG' <wei_wang@...lsil.com.cn>,
	'Samuel Ortiz' <sameo@...ux.intel.com>,
	'Chris Ball' <cjb@...top.org>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>,
	'Borislav Petkov' <bp@...en8.de>,
	'LKML' <linux-kernel@...r.kernel.org>
Subject: Re: 3.12: kernel panic when resuming from suspend to RAM (x86_64)

On 12/03/2013 09:14 AM, Francis Moreau wrote:
> Hello Thomas,
> 
> On 12/02/2013 12:20 PM, Thomas Gleixner wrote:
>> On Mon, 2 Dec 2013, Thomas Gleixner wrote:
>>> On Sat, 30 Nov 2013, Francis Moreau wrote:
>>>> Hello Thomas,
>>>>
>>>> Sorry for the delay.
>>>>
>>>> On 11/29/2013 10:02 AM, Thomas Gleixner wrote:
>>>>> On Fri, 29 Nov 2013, Francis Moreau wrote:
>>>>>> Since it seems to be related to the rtsx driver or its upper layer,
>>>>>> could the folks involved in this area have a look at this issue, please?
>>>>>
>>>>> I'm not involved, but looking at the debug objects backtrace it's
>>>>> related to the delayed work in rtsx.
>>>>>
>>>>> Does the untested patch below cure the issue?
>>>>>
>>>>
>>>> It seems it does, since I can't see the debug object trace anymore;
>>>> however, I can see this now:
>>>
>>> <SNIP>
>>>  
>>>> So I don't think it completely solves the problem, but it's a good start.
>>>
>>> I kinda expected that, but I wanted to confirm my suspicion, that the
>>> interrupt hits after the delayed work is canceled and just requeues it
>>> again, which then leads to an armed timer being freed further down.
>>>
>>> I'm not familiar with that driver and I leave the final fixup to the
>>> driver maintainers. It's enough data for them to figure out the real
>>> solution.
>>
>> Just had a quick look and the obvious solution is to disable the
>> interrupts at the device level _BEFORE_ doing anything else in the
>> teardown path. Updated patch below. That should avoid the nobody cared
>> splat on the other irq line.
>>
> 
> Yes it does.
> 
> Now that you did the hard work, I hope the driver's maintainer/developer
> will care about this issue.
> 

Unfortunately, he/she doesn't seem to care.

Moreover, I've been hit by this now:

[  241.003324] INFO: task kworker/u16:4:108 blocked for more than 120 seconds.
[  241.003331]       Not tainted 3.12.2-1-ARCH #1
[  241.003332] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  241.003335] kworker/u16:4   D ffff880405bc8000     0   108      2 0x00000000
[  241.003355] Workqueue: kmemstick memstick_check [memstick]
[  241.003358]  ffff880405bc3c90 0000000000000046 00000000000144c0 ffff880405bc3fd8
[  241.003362]  ffff880405bc3fd8 00000000000144c0 ffff880405bc8000 ffff880405bc3c68
[  241.003366]  ffffffff814ef57c ffff880405bc3fd8 0000000000000286 0000000000000000
[  241.003370] Call Trace:
[  241.003380]  [<ffffffff814ef57c>] ? schedule_timeout+0x13c/0x290
[  241.003385]  [<ffffffff8106f590>] ? detach_if_pending+0x120/0x120
[  241.003388]  [<ffffffff8106f590>] ? detach_if_pending+0x120/0x120
[  241.003392]  [<ffffffff814f2e79>] schedule+0x29/0x70
[  241.003396]  [<ffffffff814ef659>] schedule_timeout+0x219/0x290
[  241.003401]  [<ffffffff8129a4d1>] ? vsnprintf+0x1e1/0x680
[  241.003405]  [<ffffffff814f2213>] wait_for_common+0xd3/0x180
[  241.003411]  [<ffffffff81095100>] ? wake_up_process+0x40/0x40
[  241.003414]  [<ffffffff814f22dd>] wait_for_completion+0x1d/0x20
[  241.003419]  [<ffffffffa061334a>] memstick_set_rw_addr+0x4a/0x50 [memstick]
[  241.003424]  [<ffffffffa061388e>] memstick_check+0x10e/0x370 [memstick]
[  241.003429]  [<ffffffff8107daf7>] process_one_work+0x167/0x450
[  241.003432]  [<ffffffff8107e501>] worker_thread+0x121/0x3a0
[  241.003436]  [<ffffffff8107e3e0>] ? manage_workers.isra.23+0x2b0/0x2b0
[  241.003441]  [<ffffffff81084e90>] kthread+0xc0/0xd0
[  241.003446]  [<ffffffff81084dd0>] ? kthread_create_on_node+0x120/0x120
[  241.003450]  [<ffffffff814fc33c>] ret_from_fork+0x7c/0xb0
[  241.003454]  [<ffffffff81084dd0>] ? kthread_create_on_node+0x120/0x120

This looks like a different issue.

I have already blacklisted this driver; maybe it's time to mark it as broken?

Thanks.
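
For readers following the quoted discussion, the ordering problem Thomas
describes (an interrupt arriving after the delayed work is canceled and
requeueing it, so that an armed timer is later freed) can be modeled in a
few lines of userspace C. This is only an illustrative sketch, not the rtsx
driver code or the patch from this thread: struct fake_dev and every fake_*
and teardown_* name below are hypothetical, and the "interrupt" is a plain
function call rather than a real IRQ.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct fake_dev {
    bool irq_enabled;   /* device-level interrupt enable bit */
    bool work_pending;  /* stands in for an armed delayed work / timer */
};

/* Model of the interrupt handler: it (re)queues the delayed work. */
static void fake_irq(struct fake_dev *d)
{
    if (d->irq_enabled)
        d->work_pending = true;
}

static void fake_cancel_work(struct fake_dev *d)
{
    d->work_pending = false;
}

static void fake_disable_irq(struct fake_dev *d)
{
    d->irq_enabled = false;
}

/*
 * Teardown that cancels the work first. A late interrupt requeues the
 * work, so it is still pending when the device memory would be freed.
 * Returns true if work is still pending at the end of teardown.
 */
bool teardown_cancel_first(void)
{
    struct fake_dev d = { .irq_enabled = true, .work_pending = true };

    fake_cancel_work(&d);
    fake_irq(&d);           /* interrupt hits after the cancel */
    fake_disable_irq(&d);
    return d.work_pending;
}

/*
 * Teardown that disables the device interrupt before anything else.
 * Late interrupts can no longer requeue the work.
 */
bool teardown_disable_irq_first(void)
{
    struct fake_dev d = { .irq_enabled = true, .work_pending = true };

    fake_disable_irq(&d);
    fake_irq(&d);           /* ignored: interrupt source is off */
    fake_cancel_work(&d);
    fake_irq(&d);           /* still ignored */
    return d.work_pending;
}
```

In this model, teardown_cancel_first() ends with work still pending, while
teardown_disable_irq_first() does not. A real driver would also have to
deal with handlers already running on another CPU, which this sketch does
not attempt to model.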
