linux-kernel - Possible security issue in perf_event

Message-ID: <CACXSKC-tj=v4vNHcT7Zug-Tfe5qx_0DFufNMrm05cdYmPdPZdQ@mail.gmail.com>
Date:	Wed, 6 Apr 2016 00:07:27 +1000
From:	Vitaly Nikolenko <vnik5287@...il.com>
To:	linux-kernel@...r.kernel.org
Subject: Possible security issue in perf_event_open

Hi,

I wasn't sure whom to email, but I believe this is related to the
perf counters implementation. I kept getting lockups (deadlocks)
on a 64-bit uniprocessor system:

Linux ubuntu 3.19.0-43-generic #49~14.04.1-Ubuntu SMP Thu Dec 31
15:44:49 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

This issue seems to be reproducible on all kernels from 3.2 up to
(but not including) 4.0, and possibly on earlier ones. On my test
system, it can take from a few seconds to an hour for a deadlock to
occur.

There appears to be a race condition (?) in x86_pmu_stop(), which
calls x86_perf_event_update(): the latter ends up running with a
user-space pointer. The backtrace is attached. In frame #19, the event
pointer is valid and points to 0xffff88003b4ae000. However, in frame
#18, x86_perf_event_update() is called with the user-space pointer
0x40000002 in %rdi. The same applies to the vfs_write call. If the
address 0x40000002 is mmap()ed in user space, it might be possible to
perform arbitrary code execution via a crafted file struct, for
example.
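
To make the difference between the two pointer values concrete, here
is a minimal Python sketch that classifies an x86_64 virtual address.
It assumes the usual 48-bit canonical layout (lower half for user
space, upper half for the kernel); the helper name classify_address
and the constant names are illustrative and do not come from the
kernel sources.

```python
# Sketch only: assumes 48-bit canonical addressing on x86_64.
USER_MAX = 0x0000_7FFF_FFFF_FFFF      # top of the lower (user) half
KERNEL_MIN = 0xFFFF_8000_0000_0000    # bottom of the upper (kernel) half
ADDR_MAX = 0xFFFF_FFFF_FFFF_FFFF      # largest 64-bit value


def classify_address(addr: int) -> str:
    """Return 'user', 'kernel' or 'non-canonical' for a 64-bit address."""
    if 0 <= addr <= USER_MAX:
        return "user"
    if KERNEL_MIN <= addr <= ADDR_MAX:
        return "kernel"
    return "non-canonical"


if __name__ == "__main__":
    # The two values of the event pointer seen in frames #19 and #18.
    for value in (0xFFFF88003B4AE000, 0x40000002):
        print(hex(value), classify_address(value))
```

Under that assumption, the pointer seen in frame #19 falls in the
kernel half, while 0x40000002 falls in the range that a process can
map itself.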

The following disassembly (last 2 lines) shows that the call at
0xffffffff8102ba8b is executed before the value in %r12
(0xffff88003b4ae000) is copied into %rdi. So x86_perf_event_update()
runs with the old %rdi value (0x40000002). I'm not sure where this
value 0x40000002 is coming from, though. The only explanation I could
find is that it could be related to cpuid/rdmsr and the VMware
hypervisor, since this was only reproducible on VMware hypervisors.
I've also tried QEMU and VirtualBox but wasn't able to reproduce it
there.

(gdb) disassemble x86_pmu_stop
Dump of assembler code for function x86_pmu_stop:
   0xffffffff8102ba30 <+0>:     push   %rbp
   0xffffffff8102ba31 <+1>:     mov    %rsp,%rbp
   0xffffffff8102ba34 <+4>:     push   %r13
   0xffffffff8102ba36 <+6>:     mov    %esi,%r13d
   0xffffffff8102ba39 <+9>:     push   %r12
   0xffffffff8102ba3b <+11>:    mov    %rdi,%r12
   0xffffffff8102ba3e <+14>:    push   %rbx
   0xffffffff8102ba3f <+15>:    mov    $0xbb20,%rbx
   0xffffffff8102ba46 <+22>:    sub    $0x8,%rsp
   0xffffffff8102ba4a <+26>:    add    %gs:0x7efde6f6(%rip),%rbx        # 0xa148 <this_cpu_off>
   0xffffffff8102ba52 <+34>:    movslq 0x154(%rdi),%rax
   0xffffffff8102ba59 <+41>:    btr    %rax,0x200(%rbx)
   0xffffffff8102ba61 <+49>:    sbb    %eax,%eax
   0xffffffff8102ba63 <+51>:    test   %eax,%eax
   0xffffffff8102ba65 <+53>:    jne    0xffffffff8102baa8 <x86_pmu_stop+120>
   0xffffffff8102ba67 <+55>:    and    $0x4,%r13d
   0xffffffff8102ba6b <+59>:    je     0xffffffff8102ba78 <x86_pmu_stop+72>
   0xffffffff8102ba6d <+61>:    testb  $0x2,0x198(%r12)
   0xffffffff8102ba76 <+70>:    je     0xffffffff8102ba88 <x86_pmu_stop+88>
   0xffffffff8102ba78 <+72>:    add    $0x8,%rsp
   0xffffffff8102ba7c <+76>:    pop    %rbx
   0xffffffff8102ba7d <+77>:    pop    %r12
   0xffffffff8102ba7f <+79>:    pop    %r13
   0xffffffff8102ba81 <+81>:    pop    %rbp
   0xffffffff8102ba82 <+82>:    retq
   0xffffffff8102ba83 <+83>:    nopl   0x0(%rax,%rax,1)
   0xffffffff8102ba88 <+88>:    mov    %r12,%rdi
   0xffffffff8102ba8b <+91>:    callq  0xffffffff8102b990 <x86_perf_event_update>
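
For readers following the listing, the branch that decides whether
x86_perf_event_update() is reached can be modelled roughly as below.
This is a Python sketch of the visible path only (the one taken when
the btr at <+41> finds the bit clear; the code at <+120> is not part
of the listing). The masks 0x4 and 0x2 come from the and/testb
instructions at <+55> and <+61>; the names PERF_EF_UPDATE and
PERF_HES_UPTODATE are my assumption about which kernel constants those
masks correspond to, and reaches_update_call is a hypothetical helper.

```python
# Sketch only: models the two tests at <+55> and <+61> of the listing.
PERF_EF_UPDATE = 0x4      # assumed name for the mask applied to %r13d (flags)
PERF_HES_UPTODATE = 0x2   # assumed name for the mask tested at 0x198(%r12)


def reaches_update_call(flags: int, hw_state: int) -> bool:
    """True if control reaches the callq at <+91> on the visible path."""
    if not (flags & PERF_EF_UPDATE):
        return False          # je <+72>: no update requested, just return
    if hw_state & PERF_HES_UPTODATE:
        return False          # falls through to <+72>: already up to date
    return True               # je <+88>: mov %r12,%rdi ; callq <+91>


if __name__ == "__main__":
    for flags, state in ((0x4, 0x0), (0x4, 0x2), (0x0, 0x0)):
        print(hex(flags), hex(state), reaches_update_call(flags, state))
```

In this model the call is only reached when an update is requested
and the event state is not yet marked up to date.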

Please let me know if you need any other information.

-- 
Regards,
Vitaly

View attachment "bt.txt" of type "text/plain" (8679 bytes)