Date: Thu, 8 Feb 2024 16:34:26 -0800
From: Dongli Zhang <dongli.zhang@...cle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Prasad Pandit <ppandit@...hat.com>, Prasad Pandit
 <pjp@...oraproject.org>,
        kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM: x86: make KVM_REQ_NMI request iff NMI pending for
 vcpu



On 2/6/24 12:58, Sean Christopherson wrote:
> On Tue, Feb 06, 2024, Dongli Zhang wrote:
>> Hi Prasad,
>>
>> On 1/2/24 23:53, Prasad Pandit wrote:
>>> From: Prasad Pandit <pjp@...oraproject.org>
>>>
>>> kvm_vcpu_ioctl_x86_set_vcpu_events() routine makes 'KVM_REQ_NMI'
>>> request for a vcpu even when its 'events->nmi.pending' is zero.
>>> Ex:
>>>     qemu_thread_start
>>>      kvm_vcpu_thread_fn
>>>       qemu_wait_io_event
>>>        qemu_wait_io_event_common
>>>         process_queued_cpu_work
>>>          do_kvm_cpu_synchronize_post_init/_reset
>>>           kvm_arch_put_registers
>>>            kvm_put_vcpu_events (cpu, level=[2|3])
>>>
>>> This leads vCPU threads in QEMU to constantly acquire & release the
>>> global mutex lock, delaying the guest boot due to lock contention.
>>
>> Would you mind sharing where and how the lock contention happens on the QEMU
>> side? That is, how is the QEMU mutex lock impacted by KVM's KVM_REQ_NMI?
>>
>> Or did you mean line 3031 on the QEMU side?
> 
> Yeah, something like that.  Details in this thread.
> 
> https://lore.kernel.org/all/CAE8KmOyffXD4k69vRAFwesaqrBGzFY3i+kefbkHcQf4=jNYzOA@mail.gmail.com

Thank you very much for pointing to the discussion. I should have found it myself :)

Here is my understanding.

1. During VM creation, the mp_state of an AP (non-BSP vCPU) is always
KVM_MP_STATE_UNINITIALIZED until INIT/SIPI.
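
If I remember correctly, that comes from kvm_arch_vcpu_create(); roughly (a
sketch from memory, not an exact copy of the upstream code):

        if (!irqchip_in_kernel(vcpu->kvm) || kvm_vcpu_is_reset_bsp(vcpu))
                vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
        else
                vcpu->arch.mp_state = KVM_MP_STATE_UNINITIALIZED;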

2. Ideally, the AP should block in the loop below. That is, the condition at line
3775 is always false.

3760 bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
3761 {
3762         struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
3763         bool waited = false;
3764
3765         vcpu->stat.generic.blocking = 1;
3766
3767         preempt_disable();
3768         kvm_arch_vcpu_blocking(vcpu);
3769         prepare_to_rcuwait(wait);
3770         preempt_enable();
3771
3772         for (;;) {
3773                 set_current_state(TASK_INTERRUPTIBLE);
3774
3775                 if (kvm_vcpu_check_block(vcpu) < 0)
3776                         break;
3777
3778                 waited = true;
3779                 schedule();
3780         }

3. Unfortunately, because of this issue KVM_REQ_NMI may be set for the AP.
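
The request comes from kvm_vcpu_ioctl_x86_set_vcpu_events() (reached via QEMU's
kvm_put_vcpu_events() -> KVM_SET_VCPU_EVENTS). From memory, the relevant part
looks roughly like this (a sketch, not an exact copy of the upstream code):

        if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
                vcpu->arch.nmi_pending = 0;
                atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
                /* made unconditionally, even when events->nmi.pending == 0 */
                kvm_make_request(KVM_REQ_NMI, vcpu);
        }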

4. This causes kvm_vcpu_check_block() to return a negative value and break out of
the blocking loop:

kvm_arch_vcpu_ioctl_run()
-> kvm_vcpu_block()
   -> kvm_vcpu_check_block()
      -> kvm_arch_vcpu_runnable()
         -> kvm_vcpu_has_events()
            -> kvm_test_request(KVM_REQ_NMI, vcpu)


5. kvm_arch_vcpu_ioctl_run() then returns to QEMU with -EAGAIN.
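
Since the AP's mp_state is still KVM_MP_STATE_UNINITIALIZED, this is the
"uninitialized" path early in kvm_arch_vcpu_ioctl_run(), which from memory is
roughly (sketch, not an exact copy of the upstream code):

        if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
                kvm_vcpu_block(vcpu);   /* returns early because of KVM_REQ_NMI */
                ... ...
                r = -EAGAIN;
                if (signal_pending(current))
                        r = -EINTR;
                goto out;
        }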

6. The QEMU side cannot do anything useful with -EAGAIN; it just breaks out of the
loop and reaches line 2984 to return.

On the way out it acquires the global mutex (BQL) at line 2976, and releases it
before entering the guest again. KVM_REQ_NMI is never cleared until INIT/SIPI.


2808 int kvm_cpu_exec(CPUState *cpu)
2809 {
.. ...
2868         if (run_ret < 0) {
2869             if (run_ret == -EINTR || run_ret == -EAGAIN) {
2870                 trace_kvm_io_window_exit();
2871                 kvm_eat_signals(cpu);
2872                 ret = EXCP_INTERRUPT;
2873                 break;
2874             }
.. ...
2973     } while (ret == 0);
2974
2975     cpu_exec_end(cpu);
2976     bql_lock();
2977
2978     if (ret < 0) {
2979         cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
2980         vm_stop(RUN_STATE_INTERNAL_ERROR);
2981     }
2982
2983     qatomic_set(&cpu->exit_request, 0);
2984     return ret;
2985 }


7. The QEMU AP vCPU thread enters KVM_RUN again and hits the same flow as step 4,
over and over.

As a result, the lock is acquired and released very frequently. vCPU 0 is unhappy
with that, especially when the number of APs is large!
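
If I understand the patch in $SUBJECT correctly, it would make the request only
when there really is a pending NMI, roughly (sketch of my reading of the change,
not the exact diff):

        if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
                vcpu->arch.nmi_pending = 0;
                atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
                if (events->nmi.pending)
                        kvm_make_request(KVM_REQ_NMI, vcpu);
        }

so the AP would keep blocking and the lock ping-pong should go away.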

I guess it is not an issue after a VM reboot (without re-creating the QEMU
instance), because the mp_state is no longer KVM_MP_STATE_UNINITIALIZED.


Thank you very much!

Dongli Zhang

> 
>> 2858 int kvm_cpu_exec(CPUState *cpu)
>> 2859 {
>> 2860     struct kvm_run *run = cpu->kvm_run;
>> 2861     int ret, run_ret;
>> ... ...
>> 3023         default:
>> 3024             DPRINTF("kvm_arch_handle_exit\n");
>> 3025             ret = kvm_arch_handle_exit(cpu, run);
>> 3026             break;
>> 3027         }
>> 3028     } while (ret == 0);
>> 3029
>> 3030     cpu_exec_end(cpu);
>> 3031     qemu_mutex_lock_iothread();
>> 3032
>> 3033     if (ret < 0) {
>> 3034         cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
>> 3035         vm_stop(RUN_STATE_INTERNAL_ERROR);
>> 3036     }
>> 3037
>> 3038     qatomic_set(&cpu->exit_request, 0);
>> 3039     return ret;
>> 3040 }
