linux-kernel - Re: kvm: WARNING in mmu_spte_clear_track_bits

Message-ID: <CACT4Y+aFoCp96Z2CmwO9Jb7oGCV+kj3_RRU2EyS3=oKerbYRyg@mail.gmail.com>
Date:   Thu, 23 Mar 2017 17:39:19 +0100
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Radim Krčmář <rkrcmar@...hat.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        "x86@...nel.org" <x86@...nel.org>, KVM list <kvm@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Alan Stern <stern@...land.harvard.edu>,
        Steve Rutherford <srutherford@...gle.com>,
        Xiao Guangrong <guangrong.xiao@...ux.intel.com>,
        Haozhong Zhang <haozhong.zhang@...el.com>,
        syzkaller <syzkaller@...glegroups.com>
Subject: Re: kvm: WARNING in mmu_spte_clear_track_bits

On Tue, Mar 14, 2017 at 4:17 PM, Radim Krčmář <rkrcmar@...hat.com> wrote:
> 2017-03-12 12:20+0100, Dmitry Vyukov:
>> On Tue, Jan 17, 2017 at 5:00 PM, Dmitry Vyukov <dvyukov@...gle.com> wrote:
>>> On Tue, Jan 17, 2017 at 4:20 PM, Paolo Bonzini <pbonzini@...hat.com> wrote:
>>>>
>>>>
>>>> On 13/01/2017 12:15, Dmitry Vyukov wrote:
>>>>>
>>>>> I've commented out the WARNING for now, but I am seeing lots of
>>>>> use-after-free's and rcu stalls involving mmu_spte_clear_track_bits:
>>>>>
>>>>>
>>>>> BUG: KASAN: use-after-free in mmu_spte_clear_track_bits+0x186/0x190 arch/x86/kvm/mmu.c:597 at addr ffff880068ae2008
>>>>> Read of size 8 by task syz-executor2/16715
>>>>> page:ffffea00016e6170 count:0 mapcount:0 mapping:          (null) index:0x0
>>>>> flags: 0x500000000000000()
>>>>> raw: 0500000000000000 0000000000000000 0000000000000000 00000000ffffffff
>>>>> raw: ffffea00017ec5a0 ffffea0001783d48 ffff88006aec5d98
>>>>> page dumped because: kasan: bad access detected
>>>>> CPU: 2 PID: 16715 Comm: syz-executor2 Not tainted 4.10.0-rc3+ #163
>>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>>>> Call Trace:
>>>>>  __dump_stack lib/dump_stack.c:15 [inline]
>>>>>  dump_stack+0x292/0x3a2 lib/dump_stack.c:51
>>>>>  kasan_report_error mm/kasan/report.c:213 [inline]
>>>>>  kasan_report+0x42d/0x460 mm/kasan/report.c:307
>>>>>  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:333
>>>>>  mmu_spte_clear_track_bits+0x186/0x190 arch/x86/kvm/mmu.c:597
>>>>>  drop_spte+0x24/0x280 arch/x86/kvm/mmu.c:1182
>>>>>  kvm_zap_rmapp+0x119/0x260 arch/x86/kvm/mmu.c:1401
>>>>>  kvm_unmap_rmapp+0x1d/0x30 arch/x86/kvm/mmu.c:1412
>>>>>  kvm_handle_hva_range+0x54a/0x7d0 arch/x86/kvm/mmu.c:1565
>>>>>  kvm_unmap_hva_range+0x2e/0x40 arch/x86/kvm/mmu.c:1591
>>>>>  kvm_mmu_notifier_invalidate_range_start+0xae/0x140 arch/x86/kvm/../../../virt/kvm/kvm_main.c:360
>>>>>  __mmu_notifier_invalidate_range_start+0x1f8/0x300 mm/mmu_notifier.c:199
>>>>>  mmu_notifier_invalidate_range_start include/linux/mmu_notifier.h:282 [inline]
>>>>>  unmap_vmas+0x14b/0x1b0 mm/memory.c:1368
>>>>>  unmap_region+0x2f8/0x560 mm/mmap.c:2460
>>>>>  do_munmap+0x7b8/0xfa0 mm/mmap.c:2657
>>>>>  mmap_region+0x68f/0x18e0 mm/mmap.c:1612
>>>>>  do_mmap+0x6a2/0xd40 mm/mmap.c:1450
>>>>>  do_mmap_pgoff include/linux/mm.h:2031 [inline]
>>>>>  vm_mmap_pgoff+0x1a9/0x200 mm/util.c:305
>>>>>  SYSC_mmap_pgoff mm/mmap.c:1500 [inline]
>>>>>  SyS_mmap_pgoff+0x22c/0x5d0 mm/mmap.c:1458
>>>>>  SYSC_mmap arch/x86/kernel/sys_x86_64.c:95 [inline]
>>>>>  SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:86
>>>>>  entry_SYSCALL_64_fastpath+0x1f/0xc2
>>>>> RIP: 0033:0x445329
>>>>> RSP: 002b:00007fb33933cb58 EFLAGS: 00000282 ORIG_RAX: 0000000000000009
>>>>> RAX: ffffffffffffffda RBX: 0000000020000000 RCX: 0000000000445329
>>>>> RDX: 0000000000000003 RSI: 0000000000af1000 RDI: 0000000020000000
>>>>> RBP: 00000000006dfe90 R08: ffffffffffffffff R09: 0000000000000000
>>>>> R10: 0000000000000032 R11: 0000000000000282 R12: 0000000000700000
>>>>> R13: 0000000000000006 R14: ffffffffffffffff R15: 0000000020001000
>>>>> Memory state around the buggy address:
>>>>>  ffff880068ae1f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>>  ffff880068ae1f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>> >ffff880068ae2000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>>                       ^
>>>>>  ffff880068ae2080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>>  ffff880068ae2100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>> ==================================================================
>>>>
>>>> This could be related to the gfn_to_rmap issues.
>>>
>>>
>>> Humm... That's possible. It seems I am no longer seeing any
>>> spte-related crashes after applying the following patch:
>>>
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -968,8 +968,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
>>>                 /* Check for overlaps */
>>>                 r = -EEXIST;
>>>                 kvm_for_each_memslot(slot, __kvm_memslots(kvm, as_id)) {
>>> -                       if ((slot->id >= KVM_USER_MEM_SLOTS) ||
>>> -                           (slot->id == id))
>>> +                       if (slot->id == id)
>>>                                 continue;
>>>                         if (!((base_gfn + npages <= slot->base_gfn) ||
>>>                               (base_gfn >= slot->base_gfn + slot->npages)))
>
> I don't understand how this fixes the test: the only memslot that the
> test creates is at memory range 0x0-0x1000, which should not overlap
> with any private memslots.
> There should be just the IDENTITY_PAGETABLE_PRIVATE_MEMSLOT @
> 0xfffbc000ul.
>
> Do you get any output with this hunk?
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a17d78759727..7e1929432232 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -888,6 +888,14 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
>         return old_memslots;
>  }
>
> +void kvm_dump_slot(struct kvm_memory_slot *slot)
> +{
> +       printk("kvm_memory_slot %p { .id = %u, .base_gfn = %#llx, .npages = %lu, "
> +              ".userspace_addr = %#lx, .flags = %u, .dirty_bitmap = %p, .arch = ? }\n",
> +                       slot, slot->id, slot->base_gfn, slot->npages,
> +                       slot->userspace_addr, slot->flags, slot->dirty_bitmap);
> +}
> +
>  /*
>   * Allocate some memory and give it an address in the guest physical address
>   * space.
> @@ -978,12 +986,14 @@ int __kvm_set_memory_region(struct kvm *kvm,
>                 /* Check for overlaps */
>                 r = -EEXIST;
>                 kvm_for_each_memslot(slot, __kvm_memslots(kvm, as_id)) {
> -                       if ((slot->id >= KVM_USER_MEM_SLOTS) ||
> -                           (slot->id == id))
> +                       if (slot->id == id)
>                                 continue;
>                         if (!((base_gfn + npages <= slot->base_gfn) ||
> -                             (base_gfn >= slot->base_gfn + slot->npages)))
> +                             (base_gfn >= slot->base_gfn + slot->npages))) {
> +                               kvm_dump_slot(&new);
> +                               kvm_dump_slot(slot);
>                                 goto out;
> +                       }
>                 }
>         }
>
>
>> Friendly ping. Just hit it on
>
> And the warning happens at mmap ... I can't reproduce, but does the bug
> happen on the second mmap()?  (Test line 210 when i = 0.)
>
> The change above makes sense as memslots currently cannot overlap
> anywhere.  There are three private memslots that can cause this problem:
> TSS, IDENTITY_MAP and APIC.
>
> TSS and IDENTITY_MAP can be configured by userspace and must not
> conflict by design, so we can safely enforce that.
> APIC memslot doesn't provide such guarantees and should be overlaid over
> any memory, but assuming that userspace doesn't configure memslots there
> seems bearable.
>
> Still, I'd like to understand why that patch would fix this bug.
>
> Thanks.
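
For readers following the overlap discussion above, here is a minimal userspace sketch of the half-open range test from the quoted hunk. It is not kernel code: struct example_slot, EXAMPLE_PAGE_SHIFT, EXAMPLE_USER_MEM_SLOTS (and its value), the single-page size of the identity page table slot, and its slot id are all assumptions made for illustration. The skip_private flag switches between the original check (private slots skipped) and the patched check (only the slot with the same id skipped).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define EXAMPLE_PAGE_SHIFT     12   /* assumes 4 KiB pages */
#define EXAMPLE_USER_MEM_SLOTS 509  /* assumed: ids at or above this are private */

struct example_slot {
	int id;
	uint64_t base_gfn;
	uint64_t npages;
};

/* Same test as in the hunk: do [base, base + npages) and the slot intersect? */
static bool gfn_ranges_overlap(uint64_t base_gfn, uint64_t npages,
			       const struct example_slot *slot)
{
	return !((base_gfn + npages <= slot->base_gfn) ||
		 (base_gfn >= slot->base_gfn + slot->npages));
}

/*
 * Returns true if a new slot (id, base_gfn, npages) conflicts with an
 * existing one. skip_private = true mimics the original check,
 * skip_private = false mimics the patched check.
 */
static bool new_slot_conflicts(const struct example_slot *slots, int nslots,
			       int id, uint64_t base_gfn, uint64_t npages,
			       bool skip_private)
{
	for (int i = 0; i < nslots; i++) {
		const struct example_slot *slot = &slots[i];

		if (slot->id == id)
			continue;
		if (skip_private && slot->id >= EXAMPLE_USER_MEM_SLOTS)
			continue;
		if (gfn_ranges_overlap(base_gfn, npages, slot))
			return true;
	}
	return false;
}

void example_checks(void)
{
	/* Identity page table at 0xfffbc000; one page and this id are assumed. */
	struct example_slot priv = {
		.id = EXAMPLE_USER_MEM_SLOTS + 1,
		.base_gfn = 0xfffbc000ULL >> EXAMPLE_PAGE_SHIFT,
		.npages = 1,
	};

	/* The test's slot at 0x0-0x1000 (gfn 0, one page) is disjoint either way. */
	assert(!new_slot_conflicts(&priv, 1, 0, 0, 1, true));
	assert(!new_slot_conflicts(&priv, 1, 0, 0, 1, false));

	/* A user slot covering gfn 0xfffb0..0xfffcf is only caught when private slots are checked. */
	assert(!new_slot_conflicts(&priv, 1, 0, 0xfffb0, 0x20, true));
	assert(new_slot_conflicts(&priv, 1, 0, 0xfffb0, 0x20, false));
}
```

Under these assumptions the slot at 0x0-0x1000 never conflicts with the private slot, which matches the observation above that the patch should not change the outcome for this test.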


Humm... I cannot reproduce it anymore. Maybe it was fixed by something else...
However this looks very close and is still not fixed:
https://groups.google.com/d/msg/syzkaller/IqkesiRS-t0/aLcJuMXqBgAJ
Maybe it's another reincarnation of the same problem...




>> mmotm/86292b33d4b79ee03e2f43ea0381ef85f077c760 (without the above
>> change):
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 1 PID: 31060 at arch/x86/kvm/mmu.c:682 mmu_spte_clear_track_bits+0x3a1/0x420 arch/x86/kvm/mmu.c:682
>> CPU: 1 PID: 31060 Comm: syz-executor0 Not tainted 4.11.0-rc1+ #328
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:16 [inline]
>>  dump_stack+0x1a7/0x26a lib/dump_stack.c:52
>>  panic+0x1f8/0x40f kernel/panic.c:180
>>  __warn+0x1c4/0x1e0 kernel/panic.c:541
>>  warn_slowpath_null+0x2c/0x40 kernel/panic.c:584
>>  mmu_spte_clear_track_bits+0x3a1/0x420 arch/x86/kvm/mmu.c:682
>>  drop_spte+0x24/0x280 arch/x86/kvm/mmu.c:1323
>>  mmu_page_zap_pte+0x223/0x350 arch/x86/kvm/mmu.c:2438
>>  kvm_mmu_page_unlink_children arch/x86/kvm/mmu.c:2460 [inline]
>>  kvm_mmu_prepare_zap_page+0x1ce/0x13d0 arch/x86/kvm/mmu.c:2504
>>  kvm_zap_obsolete_pages arch/x86/kvm/mmu.c:5134 [inline]
>>  kvm_mmu_invalidate_zap_all_pages+0x4d4/0x6b0 arch/x86/kvm/mmu.c:5175
>>  kvm_arch_flush_shadow_all+0x15/0x20 arch/x86/kvm/x86.c:8364
>>  kvm_mmu_notifier_release+0x71/0xb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:472
>>  __mmu_notifier_release+0x1e5/0x6b0 mm/mmu_notifier.c:75
>>  mmu_notifier_release include/linux/mmu_notifier.h:235 [inline]
>>  exit_mmap+0x3a3/0x470 mm/mmap.c:2941
>>  __mmput kernel/fork.c:890 [inline]
>>  mmput+0x228/0x700 kernel/fork.c:912
>>  exit_mm kernel/exit.c:558 [inline]
>>  do_exit+0x9e8/0x1c20 kernel/exit.c:866
>>  do_group_exit+0x149/0x400 kernel/exit.c:983
>>  get_signal+0x6d9/0x1840 kernel/signal.c:2318
>>  do_signal+0x94/0x1f30 arch/x86/kernel/signal.c:808
>>  exit_to_usermode_loop+0x1e5/0x2d0 arch/x86/entry/common.c:157
>>  prepare_exit_to_usermode arch/x86/entry/common.c:191 [inline]
>>  syscall_return_slowpath+0x3bd/0x460 arch/x86/entry/common.c:260
>>  entry_SYSCALL_64_fastpath+0xc0/0xc2
>> RIP: 0033:0x4458d9
>> RSP: 002b:00007ffa472c3b58 EFLAGS: 00000286 ORIG_RAX: 00000000000000ce
>> RAX: fffffffffffffff4 RBX: 0000000000708000 RCX: 00000000004458d9
>> RDX: 0000000000000000 RSI: 000000002006bff8 RDI: 000000000000a05b
>> RBP: 0000000000000fe0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000286 R12: 00000000006df0a0
>> R13: 000000000000a05b R14: 000000002006bff8 R15: 0000000000000000