linux-kernel - Re: [patch 00/14] x86/irq: Plug various vector cleanup races

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <569F05AF.5070006@stratus.com>
Date:	Tue, 19 Jan 2016 22:57:35 -0500
From:	Joe Lawrence <joe.lawrence@...atus.com>
To:	Thomas Gleixner <tglx@...utronix.de>
CC:	Borislav Petkov <bp@...en8.de>,
	LKML <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...nel.org>, Peter Anvin <hpa@...or.com>,
	Jiang Liu <jiang.liu@...ux.intel.com>,
	Jeremiah Mahler <jmmahler@...il.com>,
	<andy.shevchenko@...il.com>, Guenter Roeck <linux@...ck-us.net>
Subject: Re: [patch 00/14] x86/irq: Plug various vector cleanup races

On 01/18/2016 10:00 AM, Joe Lawrence wrote:
[... snip ... ] 
> Hi Thomas,
> 
> When logging in this morning and looking at the box running the 14
> patches + additional patch, I see it hit a hung task timeout in xhci USB
> code about 39 hours in.  Stack trace below (looks to be waiting on a
> completion that never comes).
> 
> I didn't see this when running only the *initial* 14 patches.  Of
> course, before these irq cleanup fixes my tests never ran this long :)
> So it may or may not be related to the patchset, I'm still poking around
> the generated vmcore.  Let me know if there is anything you might be
> interested in looking at from the wreckage.
> 
> -- Joe
> 
> 
> 
> INFO: task kworker/0:1:1506 blocked for more than 120 seconds.
>        Tainted: P           OE   4.3.0sra12+ #50
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/0:1     D 0000000000000000     0  1506      2 0x00000080
> Workqueue: usb_hub_wq hub_event
>   ffff8801e46dba58 0000000000000046 ffff8810375dac00 ffff881038430000
>   ffff8801e46dc000 ffff88025ac20440 ffff88025ac20438 ffff881038430000
>   0000000000000000 ffff8801e46dba70 ffffffff81659893 7fffffffffffffff
> Call Trace:
>   [<ffffffff81659893>] schedule+0x33/0x80
>   [<ffffffff8165c530>] schedule_timeout+0x200/0x2a0
>   [<ffffffff810e2761>] ? internal_add_timer+0x71/0xb0
>   [<ffffffff810e4994>] ? mod_timer+0x114/0x210
>   [<ffffffff8165a371>] wait_for_completion+0xf1/0x130
>   [<ffffffff810a70d0>] ? wake_up_q+0x70/0x70
>   [<ffffffff814b14a1>] xhci_discover_or_reset_device+0x1e1/0x540
>   [<ffffffff814723b8>] hub_port_reset+0x3c8/0x590
>   [<ffffffff81472aa5>] hub_port_init+0x525/0xb00
>   [<ffffffff81476068>] hub_port_connect+0x328/0x940
>   [<ffffffff81476cbc>] hub_event+0x63c/0xb00
>   [<ffffffff810947dc>] process_one_work+0x14c/0x3c0
>   [<ffffffff81095044>] worker_thread+0x114/0x470
>   [<ffffffff8165925f>] ? __schedule+0x2af/0x8b0
>   [<ffffffff81094f30>] ? rescuer_thread+0x310/0x310
>   [<ffffffff8109ab88>] kthread+0xd8/0xf0
>   [<ffffffff8109aab0>] ? kthread_park+0x60/0x60
>   [<ffffffff8165d75f>] ret_from_fork+0x3f/0x70
>   [<ffffffff8109aab0>] ? kthread_park+0x60/0x60

Hi Thomas / Boris,

In an effort to exonerate the patchset, I instrumented xHCI to monitor
complementary wait_for_completion / complete calls in that driver,
hoping that an early exit in its probe might be simply skipping the
complete call ... but I ended up collecting two new crashes in
get_next_timer_interrupt:

(Again with proprietary and out-of-tree drivers loaded.)

general protection fault: 0000 [#1] SMP
Modules linked in: xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun
matroxfb(OE) ccmod(POE) ftmod(OE) videosw(OE) [ ... snip ... ]
CPU: 10 PID: 0 Comm: swapper/10 Tainted: P           OE   4.3.0sra13+ #53
Hardware name: Stratus ftServer 6800/G7LYY, BIOS BIOS Version 8.1:61 09/10/2015
task: ffff881038e35800 ti: ffff881038e3c000 task.ti: ffff881038e3c000
RIP: 0010:[<ffffffff810e4c55>]  [<ffffffff810e4c55>] get_next_timer_interrupt+0x1a5/0x240
RSP: 0018:ffff881038e3fde0  EFLAGS: 00010002
RAX: ffff88103fa8e8b8 RBX: 000013629b0c5740 RCX: 000000014140a6d6
RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000001 RDI: 00000000010140a7
RBP: ffff881038e3fe30 R08: 6b6b6b6b6b6b6b6b R09: 0000000000000027
R10: 0000000000000027 R11: ffff881038e3fde8 R12: 000000010140a6d6
R13: ffff88103fa8e080 R14: ffff881038e3fe00 R15: 0000000000000040
FS:  0000000000000000(0000) GS:ffff88103fa80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcbf695a000 CR3: 0000002030582000 CR4: 00000000001406e0
Stack:
 ffff88103fa8e8b8 ffff88103fa8eab8 ffff88103fa8ecb8 ffff88103fa8eeb8
 c5067d2c236293b0 ffff88103fa8f500 ffff88103fa8d600 000013629b1632f3
 000000010140a6d6 000013629b0c5740 ffff881038e3fe88 ffffffff810f421f
Call Trace:
 [<ffffffff810f421f>] tick_nohz_stop_sched_tick+0x1bf/0x2b0
 [<ffffffff810f43af>] __tick_nohz_idle_enter+0x9f/0x150
 [<ffffffff810f47dc>] tick_nohz_idle_enter+0x3c/0x70
 [<ffffffff810bf00c>] cpu_startup_entry+0x9c/0x330
 [<ffffffff8104e700>] start_secondary+0x160/0x1a0
Code: 95 38 0e 00 00 48 89 55 c8 48 8d 55 b0 4c 8d 5a 08 4c 8d 72 20 41 89 fa 41 83 e2 3f 45 89 d1 49 63 d1 48 8b 14 d0 48 85 d2 74 1e <f6> 42 2a 10 75 10 4c 8b 42 10 be 01 00 00 00 49 39 c8 49 0f 48
RIP  [<ffffffff810e4c55>] get_next_timer_interrupt+0x1a5/0x240
 RSP <ffff881038e3fde0>

crash> dis -l get_next_timer_interrupt+0x1a5
/root/linus/kernel/time/timer.c: 1289
  <get_next_timer_interrupt+0x1a5>:    testb  $0x10,0x2a(%rdx)

RDX: 6b6b6b6b6b6b6b6b

1246 static unsigned long __next_timer_interrupt(struct tvec_base *base)
...
1251         struct timer_list *nte;
...
1277         /* Check tv2-tv5. */
1278         varray[0] = &base->tv2;
1279         varray[1] = &base->tv3;
1280         varray[2] = &base->tv4;
1281         varray[3] = &base->tv5;
...
1283         for (array = 0; array < 4; array++) {
1284                 struct tvec *varp = varray[array];
1285
1286                 index = slot = timer_jiffies & TVN_MASK;
1287                 do {
1288                         hlist_for_each_entry(nte, varp->vec + slot, entry) {
1289                                 if (nte->flags & TIMER_DEFERRABLE)

So the nte pointer contains the slub_debug poison pattern.

>From the disassembly of __next_timer_interrupt, it looks like r13 is
used to store "base".

R13: ffff88103fa8e080

crash> struct tvec_base ffff88103fa8e080
struct tvec_base {
  lock = {
    {
      rlock = {
        raw_lock = {
          val = {
            counter = 0x1
          }
        }
      }
    }
  },
  running_timer = 0x0,
  timer_jiffies = 0x10140a6d7,
  next_timer = 0x10140a6d6,
  active_timers = 0x7,
  all_timers = 0x8,
  cpu = 0xa,
  migration_enabled = 0x1,
  nohz_active = 0x1,
  ...
  tv1 = {
    vec = {{
      ...
        first = 0xffff88203800dff8
      ...
  tv2 = {
    vec = {{
      ...
        first = 0xffff88100aa9a550     << index 39 points to slub poison
      ...
        first = 0xffff88103fa90ea0
      ...
  tv3 = {
    vec = {{
      ...
        first = 0xffff8807452b3928
      ...
        first = 0xffff88103fa8d380
      ...
  tv4 = {
    vec = {{
      ...
  tv5 = {
    vec = {{
      ...
        first = 0xffff88100c8955e8

crash> struct hlist_node 0xffff88203800dff8
struct hlist_node {
  next = 0x0,
  pprev = 0xffff88103fa8e790
}
crash> struct hlist_node 0xffff88100aa9a550
struct hlist_node {
  next = 0x6b6b6b6b6b6b6b6b,           << uhoh!
  pprev = 0x6b6b6b6b6b6b6b6b           <<
}
crash> struct hlist_node 0xffff88103fa90ea0
struct hlist_node {
  next = 0x0,
  pprev = 0xffff88103fa8e9f8
}
crash> struct hlist_node 0xffff8807452b3928
struct hlist_node {
  next = 0xffff88100aa9da68,
  pprev = 0xffff88103fa8eae0
}
crash> struct hlist_node 0xffff88103fa8d380
struct hlist_node {
  next = 0x0,
  pprev = 0xffff88103fa8eb58
}
crash> struct hlist_node 0xffff88100c8955e8
struct hlist_node {
  next = 0xffff881021e47598,
  pprev = 0xffff88103fa8eec0
}

crash utility confirms it in its "timer" display:

crash> timer
TVEC_BASES[9]: ffff88103fa4e080
  JIFFIES
4315982678
  EXPIRES      TIMER_LIST         FUNCTION
4315982681  ffff88203800ddb0  ffffffff8150e7a0 <intel_pstate_timer_func>
4315982973  ffff88103fa50ea0  ffffffff81092d90 <delayed_work_timer_fn>
4316267973  ffff88103fa4d380  ffffffff81041930 <mce_timer_fn>

timer: invalid list entry: 6b6b6b6b6b6b6b6b
timer: ignoring faulty timer list at index 39 of timer array

timer: invalid list entry: 6b6b6b6b6b6b6b6b
timer: ignoring faulty timer list at index 39 of timer array
TVEC_BASES[10]: ffff88103fa8e080
      JIFFIES
         4315982678
      EXPIRES           TIMER_LIST         FUNCTION
         4315981531  ffff88203800dff8  ffffffff8150e7a0 <intel_pstate_timer_func>
         4316034039  ffff88100aa9da68  ffffffff81092d90 <delayed_work_timer_fn>
         4316034111  ffff8807452b3928  ffffffff81092d90 <delayed_work_timer_fn>
         4316267970  ffff88103fa8d380  ffffffff81041930 <mce_timer_fn>
         4401397760  ffff88100c8955e8  ffffffff8160bbe0 <ipv6_regen_rndid>
         4401397760  ffff881021e47598  ffffffff8160bbe0 <ipv6_regen_rndid>
7740398493674204011  ffff88100aa9a550  6b6b6b6b6b6b6b6b

crash> struct timer_list ffff88100aa9a550
struct timer_list {
  entry = {
    next = 0x6b6b6b6b6b6b6b6b,
    pprev = 0x6b6b6b6b6b6b6b6b
  },
  expires = 0x6b6b6b6b6b6b6b6b,
  function = 0x6b6b6b6b6b6b6b6b,
  data = 0x6b6b6b6b6b6b6b6b,
  flags = 0x6b6b6b6b,
  slack = 0x6b6b6b6b,
  start_pid = 0x6b6b6b6b,
  start_site = 0x6b6b6b6b6b6b6b6b,
  start_comm = "kkkkkkkkkkkkkkkk"
}

A second crash of the same signature occurred a few hours later.


Unfortunately I only have a single box to run these tests in what
amounts to an after-hours effort.  I started testing back in 4.3 but
avoided moving forward to avoid the 4.4 development cycle (and
incidental issues that it might have muddled the waters).  That said,
what would be the best way to proceed?

  Change device removal tests to avoid proprietary drivers.

  What about the other out-of-tree device drivers (mpt3sas, ixgbe,
  etc.)?  These are open source, but contain much Stratus
  device removal paranoia that upstream hasn't adopted.

  Rebase evil(TM) proprietary/out-of-tree drivers against 4.4 or
  4.5rcX, apply this patchset and any other required device removal
  fixups.

If proprietary/out-of-tree drivers are a debugging deal breaker, I
understand.  The platform offers a unique hotplug testbed, so I try to
contribute testing and bug reports where I feel they apply equally
to untainted upstream.

Regards,

-- Joe