Message-ID: <YhfvRRa56qQR9w5K@google.com>
Date:   Thu, 24 Feb 2022 20:49:09 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, Like Xu <like.xu.linux@...il.com>
Subject: Re: [PATCH] KVM: x86: Reacquire kvm->srcu in vcpu_run() if exiting
 on pending signal

On Thu, Feb 24, 2022, Sean Christopherson wrote:
> Reacquire kvm->srcu in vcpu_run() before returning to the caller if srcu
> was dropped to handle pending work.  If the task receives a signal, KVM
> will exit without reacquiring kvm->srcu, resulting in an unbalanced
> unlock in kvm_arch_vcpu_ioctl_run(), and eventually hung tasks.
> 
>  =====================================
>  WARNING: bad unlock balance detected!
>  5.17.0-rc3+ #749 Not tainted
>  -------------------------------------
>  CPU 0/KVM/1803 is trying to release lock (&kvm->srcu) at:
>  [<ffffffff81042a19>] kvm_arch_vcpu_ioctl_run+0x669/0x1f60
>  but there are no more locks to release!
> 
>  other info that might help us debug this:
>  1 lock held by CPU 0/KVM/1803:
>   #0: ffff88810489c0b0 (&vcpu->mutex){....}-{3:3}, at: kvm_vcpu_ioctl+0x77/0x690
> 
>  stack backtrace:
>  CPU: 7 PID: 1803 Comm: CPU 0/KVM Not tainted 5.17.0-rc3+ #749
>  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x34/0x44
>   lock_release+0x1b4/0x240
>   kvm_arch_vcpu_ioctl_run+0x680/0x1f60
>   kvm_vcpu_ioctl+0x279/0x690
>   __x64_sys_ioctl+0x83/0xb0
>   do_syscall_64+0x3b/0xc0
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>   </TASK>
>  INFO: task stable:2347 blocked for more than 120 seconds.
>        Not tainted 5.17.0-rc3+ #749
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>  task:stable          state:D stack:    0 pid: 2347 ppid:  2340 flags:0x00000000
>  Call Trace:
>   <TASK>
>   __schedule+0x328/0xa00
>   schedule+0x44/0xb0
>   schedule_timeout+0x26f/0x300
>   wait_for_completion+0x84/0xe0
>   __synchronize_srcu.part.0+0x7a/0xa0
>   kvm_swap_active_memslots+0x141/0x180
>   kvm_set_memslot+0x2f9/0x470
>   kvm_set_memory_region+0x29/0x40
>   kvm_vm_ioctl+0x2c3/0xd70
>   __x64_sys_ioctl+0x83/0xb0
>   do_syscall_64+0x3b/0xc0
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>   </TASK>
>  INFO: lockdep is turned off.
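
For illustration only, the imbalance described in the quoted changelog can be
modeled in userspace with a plain nesting counter.  This is a toy sketch, not
KVM code: every name below (toy_srcu, toy_vcpu_run, toy_ioctl_run, the
reacquire_on_exit flag) is hypothetical, and the counter merely stands in for
the kvm->srcu read-side section.  It assumes the caller takes the lock, the
inner loop drops it to handle pending work, and the caller unlocks on return.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an SRCU read-side section: a nesting counter. */
struct toy_srcu {
	int depth;        /* current read-side nesting depth */
	bool bad_unlock;  /* set when unlock is called with nothing held */
};

static void toy_read_lock(struct toy_srcu *s)
{
	s->depth++;
}

static void toy_read_unlock(struct toy_srcu *s)
{
	if (s->depth == 0) {
		/* Analogous to "bad unlock balance detected". */
		s->bad_unlock = true;
		return;
	}
	s->depth--;
}

/*
 * Model of the inner run loop: entered with the lock held, and the caller
 * expects it to still be held on return.
 */
static int toy_vcpu_run(struct toy_srcu *s, bool signal_pending,
			bool reacquire_on_exit)
{
	/* Drop the lock to handle pending work. */
	toy_read_unlock(s);

	if (signal_pending) {
		/* Early exit: the lock is retaken only if the flag is set. */
		if (reacquire_on_exit)
			toy_read_lock(s);
		return -1;
	}

	/* Normal path: retake the lock before continuing. */
	toy_read_lock(s);
	return 0;
}

/* Model of the outer caller; returns true if lock and unlock balanced. */
static bool toy_ioctl_run(bool signal_pending, bool reacquire_on_exit)
{
	struct toy_srcu s = { 0, false };

	toy_read_lock(&s);
	toy_vcpu_run(&s, signal_pending, reacquire_on_exit);
	toy_read_unlock(&s);

	return !s.bad_unlock && s.depth == 0;
}
```

In this model, toy_ioctl_run(true, false) reports an imbalance because the
early exit returns with the counter already at zero, while
toy_ioctl_run(true, true) stays balanced.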

Ugh, the hung task is actually a different mess introduced by the same patch.
I suspect I'm hitting the one Like reported.

I'll get a fix posted shortly...