linux-kernel - Re: QEMU's Hyper-V HV_X64_MSR_EOM is broken with split IRQCHIP


Message-ID: <87ikoposs6.fsf@redhat.com>
Date: Tue, 04 Mar 2025 13:59:53 +0100
From: Vitaly Kuznetsov <vkuznets@...hat.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org, Paolo Bonzini
 <pbonzini@...hat.com>, Peter Xu <peterx@...hat.com>, Maciej S. Szmigiero
 <maciej.szmigiero@...cle.com>
Subject: Re: QEMU's Hyper-V HV_X64_MSR_EOM is broken with split IRQCHIP

Sean Christopherson <seanjc@...gle.com> writes:

> FYI, QEMU's Hyper-V emulation of HV_X64_MSR_EOM has been broken since QEMU commit
> c82d9d43ed ("KVM: Kick resamplefd for split kernel irqchip"), as nothing in KVM
> will forward the EOM notification to userspace.  I have no idea if anything in
> QEMU besides hyperv_testdev.c cares.

The only VMBus device in QEMU besides the testdev seems to be the Hyper-V
ballooning driver; Cc: Maciej to check whether this is a real problem
for it or not.

>
> The bug is reproducible by running the hyperv_connections KVM-Unit-Test with a
> split IRQCHIP.

Thanks, I can reproduce the problem too.

>
> Hacking QEMU and KVM (see KVM commit 654f1f13ea56 ("kvm: Check irqchip mode before
> assign irqfd")) as below gets the test to pass.  Assuming that's not a palatable
> solution, the other options I can think of would be for QEMU to intercept
> HV_X64_MSR_EOM when using a split IRQCHIP, or to modify KVM to do KVM_EXIT_HYPERV_SYNIC
> on writes to HV_X64_MSR_EOM with a split IRQCHIP.
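To make the failure mode concrete, here is a toy model in C of how an EOM write reaches userspace. It assumes, as described above, that userspace learns about EOM through a resamplefd on the SINT route and that, before the hack, resample irqfds were only accepted with the full in-kernel irqchip. The `vm_model` type, the helper functions and the `exit_on_eom` switch (a stand-in for the KVM_EXIT_HYPERV_SYNIC option) are all hypothetical; none of this is KVM or QEMU code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: names are illustrative, not KVM/QEMU internals. */
enum irqchip_mode { IRQCHIP_NONE, IRQCHIP_KERNEL, IRQCHIP_SPLIT };

#define NR_SINTS 16  /* SynIC provides 16 synthetic interrupt sources */

struct vm_model {
    enum irqchip_mode mode;
    bool resampler[NR_SINTS];  /* a resamplefd is registered for this SINT */
    bool exit_on_eom;          /* hypothetical: exit to userspace on EOM */
};

/* Pre-hack rule: resample irqfds require the full in-kernel irqchip. */
static bool resample_allowed(const struct vm_model *vm)
{
    return vm->mode == IRQCHIP_KERNEL;
}

/* Userspace asks for EOM notifications on a SINT via a resamplefd. */
static bool register_resampler(struct vm_model *vm, int sint)
{
    assert(sint >= 0 && sint < NR_SINTS);
    if (!resample_allowed(vm))
        return false;          /* rejected: userspace never hears of EOM */
    vm->resampler[sint] = true;
    return true;
}

/* Guest writes HV_X64_MSR_EOM. Returns the number of notifications
 * userspace would receive in this model. */
static int guest_writes_eom(const struct vm_model *vm)
{
    int notified = 0;
    for (int i = 0; i < NR_SINTS; i++)
        if (vm->resampler[i])
            notified++;        /* this SINT's resamplefd would be signaled */
    if (vm->exit_on_eom)
        notified++;            /* one userspace exit for the MSR write */
    return notified;
}
```

With `IRQCHIP_SPLIT`, `register_resampler()` fails and `guest_writes_eom()` returns 0, which is the lost notification; setting `exit_on_eom` gives userspace one notification per write again.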

AFAIR, the Hyper-V message interface is a fairly generic communication
mechanism which, in theory, can be used without interrupts at all: the
corresponding SINT can be masked and the guest can poll for messages,
process them and then write to HV_X64_MSR_EOM to trigger delivery of
the next queued message. To support this scenario on the backend, we
need to receive HV_X64_MSR_EOM writes regardless of whether the irqchip
is split or not. (In theory, we could get away without this by just
checking whether pending messages can be delivered upon each vCPU entry,
but that can take an arbitrarily long time in some scenarios, so I
guess we're better off with notifications.)
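A rough sketch of the polling scenario, assuming a simplified single-slot model that loosely follows the TLFS scheme: the guest clears the slot after consuming a message and, if the MessagePending flag was set, writes HV_X64_MSR_EOM so the backend can place the next queued message. All types and functions here are hypothetical and do not match the real `struct hv_message` layout; `eom_visible = false` models a backend that never sees the EOM write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model of one SynIC message slot. */
#define MSG_NONE 0u
#define QUEUE_MAX 8

struct msg_slot {
    uint32_t type;       /* MSG_NONE means the slot is free */
    uint32_t payload;
    bool msg_pending;    /* more messages are queued behind this one */
};

struct host_queue {
    uint32_t payload[QUEUE_MAX];
    int head, tail;
    bool eom_visible;    /* does the backend see EOM writes at all? */
};

/* Backend: move the next queued message into the slot if it is free. */
static void host_try_deliver(struct host_queue *q, struct msg_slot *slot)
{
    if (q->head == q->tail)
        return;                          /* nothing queued */
    if (slot->type != MSG_NONE) {
        slot->msg_pending = true;        /* guest must write EOM later */
        return;
    }
    slot->payload = q->payload[q->head++];
    slot->type = 1;                      /* any type other than MSG_NONE */
    slot->msg_pending = (q->head != q->tail);
}

/* Backend: queue a message and try to deliver it right away. */
static void host_post(struct host_queue *q, struct msg_slot *slot,
                      uint32_t payload)
{
    assert(q->tail < QUEUE_MAX);
    q->payload[q->tail++] = payload;
    host_try_deliver(q, slot);
}

/* Backend reaction to a guest write to HV_X64_MSR_EOM. */
static void host_on_eom(struct host_queue *q, struct msg_slot *slot)
{
    if (q->eom_visible)
        host_try_deliver(q, slot);       /* else: the notification is lost */
}

/* Guest: poll the slot with the SINT masked. Returns true if a message
 * was consumed. */
static bool guest_poll(struct msg_slot *slot, struct host_queue *q,
                       uint32_t *out)
{
    if (slot->type == MSG_NONE)
        return false;
    *out = slot->payload;
    bool pending = slot->msg_pending;
    slot->type = MSG_NONE;               /* release the slot */
    slot->msg_pending = false;
    if (pending)
        host_on_eom(q, slot);            /* models the EOM MSR write */
    return true;
}
```

With `eom_visible` false, a second queued message stays stuck even though the guest followed the protocol, which is why the backend needs the EOM writes regardless of the irqchip mode.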

>
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index c65b790433..820bc1692e 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -2261,10 +2261,9 @@ static int kvm_irqchip_assign_irqfd(KVMState *s, EventNotifier *event,
>               * the INTx slow path).
>               */
>              kvm_resample_fd_insert(virq, resample);
> -        } else {
> -            irqfd.flags |= KVM_IRQFD_FLAG_RESAMPLE;
> -            irqfd.resamplefd = rfd;
>          }
> +        irqfd.flags |= KVM_IRQFD_FLAG_RESAMPLE;
> +        irqfd.resamplefd = rfd;
>      } else if (!assign) {
>          if (kvm_irqchip_is_split()) {
>              kvm_resample_fd_remove(virq);
>
>
> diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
> index 63f66c51975a..0bf85f89eb27 100644
> --- a/arch/x86/kvm/irq.c
> +++ b/arch/x86/kvm/irq.c
> @@ -166,9 +166,7 @@ void __kvm_migrate_timers(struct kvm_vcpu *vcpu)
>  
>  bool kvm_arch_irqfd_allowed(struct kvm *kvm, struct kvm_irqfd *args)
>  {
> -       bool resample = args->flags & KVM_IRQFD_FLAG_RESAMPLE;
> -
> -       return resample ? irqchip_kernel(kvm) : irqchip_in_kernel(kvm);
> +       return irqchip_in_kernel(kvm);
>  }
>  
>  bool kvm_arch_irqchip_in_kernel(struct kvm *kvm)
>
>
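The KVM hunk changes `kvm_arch_irqfd_allowed()` so that a resample irqfd is no longer rejected with a split irqchip. A side-by-side model of the two predicates, assuming hypothetical `_model` stand-ins for `irqchip_in_kernel()` (true for both the full and the split irqchip) and `irqchip_kernel()` (true only for the full one); `FLAG_RESAMPLE` is likewise a local stand-in for `KVM_IRQFD_FLAG_RESAMPLE`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for KVM's irqchip modes and helpers. */
enum irqchip_mode { IRQCHIP_NONE, IRQCHIP_KERNEL, IRQCHIP_SPLIT };

#define FLAG_RESAMPLE (1u << 1)  /* local stand-in for the resample flag */

static bool irqchip_in_kernel_model(enum irqchip_mode m)
{
    return m != IRQCHIP_NONE;    /* full or split */
}

static bool irqchip_kernel_model(enum irqchip_mode m)
{
    return m == IRQCHIP_KERNEL;  /* full in-kernel irqchip only */
}

/* Predicate before the hack: resample needs the full irqchip. */
static bool irqfd_allowed_before(enum irqchip_mode m, uint32_t flags)
{
    bool resample = flags & FLAG_RESAMPLE;
    return resample ? irqchip_kernel_model(m) : irqchip_in_kernel_model(m);
}

/* Predicate after the hack: any in-kernel irqchip will do. */
static bool irqfd_allowed_after(enum irqchip_mode m, uint32_t flags)
{
    (void)flags;                 /* the resample flag no longer matters */
    return irqchip_in_kernel_model(m);
}
```

The only input whose answer changes is a resample request with `IRQCHIP_SPLIT`.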

-- 
Vitaly