Date:	Mon, 18 Jan 2016 10:00:49 -0500
From:	Joe Lawrence <joe.lawrence@...atus.com>
To:	Thomas Gleixner <tglx@...utronix.de>
CC:	Borislav Petkov <bp@...en8.de>,
	LKML <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>, Peter Anvin <hpa@...or.com>,
	Jiang Liu <jiang.liu@...ux.intel.com>,
	Jeremiah Mahler <jmmahler@...il.com>,
	<andy.shevchenko@...il.com>, Guenter Roeck <linux@...ck-us.net>
Subject: Re: [patch 00/14] x86/irq: Plug various vector cleanup races

On 01/16/2016 04:37 PM, Joe Lawrence wrote:
> On 01/14/2016 05:33 AM, Borislav Petkov wrote:
>> On Thu, Jan 14, 2016 at 09:24:35AM +0100, Thomas Gleixner wrote:
>>> On Mon, 4 Jan 2016, Joe Lawrence wrote:
>>>> No issues running the same PCI device removal and stress tests against
>>>> the patchset.
>>>
>>> Thanks for testing!
>>>
>>> Though there is yet another long standing bug in that area. Fix below.
>>>
>>> Thanks,
>>>
>>>     tglx
>>>
>>> 8<--------------------
>>>
> [ ... snip ... ]
>>
>> s/d//
>>
>> With those micro-changes:
>>
>> Tested-by: Borislav Petkov <bp@...e.de>
>>
>> :-)
> 
> Tests still running ok here (with same micro-change as Borislav).

Hi Thomas,

Logging in this morning and checking the box running the 14 patches +
the additional patch, I see that it hit a hung task timeout in the xhci
USB code about 39 hours in.  Stack trace below (it looks to be waiting
on a completion that never comes).
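
For reference, here is a minimal sketch of the completion pattern
involved (illustrative only, not the actual xhci code; the names are
made up).  The waiter blocks in wait_for_completion() until an
interrupt handler calls complete(); if that interrupt never arrives,
say because it got lost somewhere in the irq plumbing, the waiter
sleeps in D state forever and the hung task watchdog fires:

#include <linux/completion.h>
#include <linux/interrupt.h>

/* Illustrative stand-in for a host controller command completion. */
static DECLARE_COMPLETION(cmd_done);

static irqreturn_t cmd_irq_handler(int irq, void *dev_id)
{
	complete(&cmd_done);		/* wakes the waiter below */
	return IRQ_HANDLED;
}

static void issue_cmd_and_wait(void)
{
	reinit_completion(&cmd_done);
	/* ... kick the hardware to start the command ... */
	wait_for_completion(&cmd_done);	/* sleeps in D state; if the
					 * IRQ never arrives, this is
					 * where the hung task sits */
}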

I didn't see this when running only the *initial* 14 patches.  Of
course, before these irq cleanup fixes my tests never ran this long :)
So it may or may not be related to the patchset; I'm still poking
around the generated vmcore.  Let me know if there is anything you
might be interested in looking at from the wreckage.

-- Joe



INFO: task kworker/0:1:1506 blocked for more than 120 seconds.
      Tainted: P           OE   4.3.0sra12+ #50
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/0:1     D 0000000000000000     0  1506      2 0x00000080
Workqueue: usb_hub_wq hub_event
 ffff8801e46dba58 0000000000000046 ffff8810375dac00 ffff881038430000
 ffff8801e46dc000 ffff88025ac20440 ffff88025ac20438 ffff881038430000
 0000000000000000 ffff8801e46dba70 ffffffff81659893 7fffffffffffffff
Call Trace:
 [<ffffffff81659893>] schedule+0x33/0x80
 [<ffffffff8165c530>] schedule_timeout+0x200/0x2a0
 [<ffffffff810e2761>] ? internal_add_timer+0x71/0xb0
 [<ffffffff810e4994>] ? mod_timer+0x114/0x210
 [<ffffffff8165a371>] wait_for_completion+0xf1/0x130
 [<ffffffff810a70d0>] ? wake_up_q+0x70/0x70
 [<ffffffff814b14a1>] xhci_discover_or_reset_device+0x1e1/0x540
 [<ffffffff814723b8>] hub_port_reset+0x3c8/0x590
 [<ffffffff81472aa5>] hub_port_init+0x525/0xb00
 [<ffffffff81476068>] hub_port_connect+0x328/0x940
 [<ffffffff81476cbc>] hub_event+0x63c/0xb00
 [<ffffffff810947dc>] process_one_work+0x14c/0x3c0
 [<ffffffff81095044>] worker_thread+0x114/0x470
 [<ffffffff8165925f>] ? __schedule+0x2af/0x8b0
 [<ffffffff81094f30>] ? rescuer_thread+0x310/0x310
 [<ffffffff8109ab88>] kthread+0xd8/0xf0
 [<ffffffff8109aab0>] ? kthread_park+0x60/0x60
 [<ffffffff8165d75f>] ret_from_fork+0x3f/0x70
 [<ffffffff8109aab0>] ? kthread_park+0x60/0x60
