linux-kernel - irq_desc use-after-free in smp_irq_move_cleanup

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <5653B688.4050809@stratus.com>
Date:	Mon, 23 Nov 2015 19:59:52 -0500
From:	Joe Lawrence <joe.lawrence@...atus.com>
To:	<linux-kernel@...r.kernel.org>
CC:	Thomas Gleixner <tglx@...utronix.de>,
	Joe Lawrence <joe.lawrence@...atus.com>
Subject: irq_desc use-after-free in smp_irq_move_cleanup_interrupt

Hi Thomas,

I've been chasing down a use-after-free on an irq_desc structure during
repeated device removal testing that crashes 4.3 in
smp_irq_move_cleanup_interrupt.  So far I have a bunch of crashes and some
data gleaned from instrumenting the kernel with trace logging.  Details to
follow, but in short, I think I'm hitting the use-after-free because of
a782a7e46bb5 "x86/irq: Store irq descriptor in vector array".

During the tests, I usually see RCU stalls (sometimes a crash later) --
when looking at a vmcore, the CPU stall is here:

  crash> bt ffff881038ee4200
  PID: 0      TASK: ffff881038ee4200  CPU: 1   COMMAND: "swapper/1"
      [exception RIP: _raw_spin_lock+0x10]
      RIP: ffffffff81672a10  RSP: ffff88103f843f70  RFLAGS: 00000046
      RAX: 0000000000000000  RBX: 0000000000000091  RCX: 0000000000000001
      RDX: 0000000000000001  RSI: 0000000000000001  RDI: ffffffff81d36a88
      RBP: ffff88103f843fa8   R8: 0000000000000101   R9: 0000000000000f78
      R10: 0000000000100000  R11: ffff8810384ccf8c  R12: 0000000000000091
      R13: ffff88102a819794  R14: 000000000000b5e8  R15: ffff88102a8196f8
      CS: 0010  SS: 0018
   #0 [ffff88103f843f70] smp_irq_move_cleanup_interrupt at ffffffff8105319b
   #1 [ffff88103f843fb0] irq_move_cleanup_interrupt at ffffffff81673a04
  --- <IRQ stack> ---
   #2 [ffff881038f0fd90] irq_move_cleanup_interrupt at ffffffff81673a04
      [exception RIP: unknown or invalid address]
      RIP: 000000000000001d  RSP: 0000000000000000  RFLAGS: ffff881038f10000
      RAX: 0000000000000001  RBX: ffff881038f0c000  RCX: 0000000000000005
      RDX: 00000000000a087a  RSI: 0000000000051ca2  RDI: 0000000000000018
      RBP: ffff88103f84f100   R8: ffff881038f0fea0   R9: ffffe8f000042378
      R10: 0000000000000002  R11: 000000ee954efd02  R12: ffffffff810e5060
      R13: ffff881038f0fdc8  R14: ffff88103f84f500  R15: ffff88103f84f100
      ORIG_RAX: ffff88103f856c40  CS: 8000a044  SS: ffffffffffffffdf
  bt: WARNING: possibly bogus exception frame
   #3 [ffff881038f0fe38] cpuidle_enter_state at ffffffff8151e278
   #4 [ffff881038f0fea8] cpuidle_enter at ffffffff8151e417
   #5 [ffff881038f0feb8] call_cpuidle at ffffffff810bef42
   #6 [ffff881038f0fed0] cpu_startup_entry at ffffffff810bf1dc
   #7 [ffff881038f0ff28] start_secondary at ffffffff8104e9e0

The irq_desc is in R15: ffff88102a8196f8

  crash> kmem ffff88102a8196f8
  CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
  ffff88103f408c80 kmalloc-512              512       5292      5460    140    32k
    SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
    ffffea0040aa0600  ffff88102a818000     0     39         17    22
    FREE / [ALLOCATED]
     ffff88102a8196f8  

        PAGE         PHYSICAL      MAPPING       INDEX CNT FLAGS
  ffffea0040aa0640 102a819000                0        0  0 2fffff80008000 tail

This irq_desc is no longer allocated, it's been filled with the slub debug
poison pattern (hence the spinlock is stuck):

  crash> rd ffff88102a8196f8 0x28
  ffff88102a8196f8:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819708:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819718:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819728:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819738:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819748:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819758:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819768:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819778:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819788:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819798:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a8197a8:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a8197b8:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a8197c8:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a8197d8:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a8197e8:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a8197f8:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819808:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819818:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819828:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk

The irq vector is in RBX: 0000000000000091

  crash> p/d 0x91
  $1 = 145

The irq_desc_tree entry for vector 0x91 appears to have been updated to
something new (the device re-add):

  crash> tree -t radix irq_desc_tree -s irq_desc | grep 'irq = 0x91' -B11 | grep -e '^ffff' -e 'irq ='
  ffff88102fb144e8
      irq = 0x91,

But from a custom crash gdb script, the freed irq_desc can still be found
in CPU 1's vector_irq[145]:

  cpu[1] vector_irq[145] irq_desc @ 0xffff88102a8196f8

Sifting through a heavily grepped and abbreviated trace log:

  - The irq moved from CPU 1, vector 145 to CPU 44, vector 134 
  - Both CPUs skip cleaning up their vector_irq[] entries for this irq
    because data->move_in_progress is set (is this normal?)
  - The irq moves again to CPU 34, vector 123
  - The underlying device is removed @ 593.106514 jiffies
  - The irq_desc is freed @ 593.239879 jiffies
  - CPU 1 is last heard from @ 1022.830083 jiffies

  [001]    22.936764: __assign_irq_vector : cpu 44 : vector=134 -> 0xffff88102a8196f8
  [044]    35.209945: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8
  [001]    35.212370: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
  [044]    35.674114: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8
  [001]    35.675395: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
  [044]    36.265716: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8
  [044]    36.517785: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8
  [001]    36.520115: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
  [001]    54.636651: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
  [001]    54.636722: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
  [044]    61.670267: __assign_irq_vector : cpu 34 : vector=123 -> 0xffff88102a8196f8
  [001]    61.670979: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
  [044]    61.696120: smp_irq_move_cleanup_interrupt : cpu (this) : vector=134 0xffff88102a8196f8 -> (nil)
  [034]    69.086107: smp_irq_move_cleanup_interrupt : vector == data->cfg.vector && ... : vector=123 0xffff88102a8196f8
  [000]   593.239811: clear_irq_vector : 1 : cpu 34 : vector=123 0xffff88102a8196f8 -> (nil)
  [000]   593.239879: free_desc : desc @ 0xffff88102a8196f8
  [001]  1022.830083: smp_irq_move_cleanup_interrupt : IS_ERR_OR_NULL : vector=144 (nil)

  crash> log | grep 'mpt3sas[0-9]*: _scsih_remove$'
  [  291.658410] mpt3sas0: _scsih_remove
  [  593.106514] mpt3sas1: _scsih_remove           << free_desc
  [  874.345239] mpt3sas3: _scsih_remove

Prior to a782a7e46bb5 "x86/irq: Store irq descriptor in vector array", the
vector_irq array held irq values, those interested in the irq_desc would
have to perform their own irq_to_desc() (ie, a radix_tree_lookup of the
irq_desc_tree).  The radix tree is updated in free_desc(), so any
subsequent lookups would fail.

After the change though, I'm not so sure that direct access to the
irq_desc is safe -- I couldn't figure out anything preventing
use-after-free (the sparse_irq_lock and the vector_lock seem to protect
their respective data-structures, but not the dual-use of the irq_desc's
by both the irq_desc_tree and the per-cpu vector_irq[].)

Hopefully this is on the right track -- I can modify the trace logging or
run other tests if there is additional information that would be helpful.

Regards,

-- Joe
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/