Message-ID: <Z4kkuaY_mJ6z0sa2@google.com>
Date: Thu, 16 Jan 2025 15:24:41 +0000
From: Sean Christopherson <seanjc@...gle.com>
To: chichen241 <chichen241@...il.com>
Cc: Greg KH <gregkh@...uxfoundation.org>, 
	Linus Torvalds <torvalds@...uxfoundation.org>, Paolo Bonzini <pbonzini@...hat.com>, 
	"security@...nel.org" <security@...nel.org>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Potential Denial-of-Service Vulnerability in KVM When Emulating
 'hlt' Instruction in L2 Guests

+KVM and LKML for archival, as this is not a DoS

On Thu, Jan 16, 2025, chichen241 wrote:
> It seems that the attachment content is not convenient for you to see, so I
> will reuse the email content to describe it.

...

> syz_kvm_setup_cpu(/*fd=*/vmfd, /*cpufd=*/vcpufd, /*usermem=*/mem,
>                   /*text=*/&nop_text, /*ntext*/ 1, /*flags=*/-1,
>                   /*opts=*/opts, /*nopt=*/1);
> // The nested vm will run '\x90\xf4', the vm will try to emulate the hlt
> // instruction and fail, entry endless loop.
> ioctl(vcpufd, KVM_RUN, NULL);
> printf("The front kvm_run will caught in loop. This code will not be executed")
> }
> ```
> linux kernel version: 6.12-rc7
> Also I checked my mailbox and didn't see any question from Sean. Maybe there's some mistake?

For posterity:

  > > virtualization. When an L2 guest attempts to emulate an instruction
  
  How did you coerce KVM into emulating HLT from L2?
  
  > > using the x86_emulate_instruction() function, and the instruction to
  > > be emulated is hlt, the x86_decode_emulated_instruction() function
  > > used for instruction decoding does not support parsing the hlt
  > > instruction.
  
  KVM should parse HLT just fine, I suspect the issue is that KVM _intentionally_
  refuses to emulate HLT from L2, because encountering HLT in the emulator when L2
  is active either requires the guest to be playing TLB games (e.g. generate an
  emulated MMIO exit on a MOV, patch the MOV into a HLT), or it requires enabling
  an off-by-default, "for testing purposes only" KVM module param.
  
  > > As a result, x86_decode_emulated_instruction() returns
  > > ctxt->execute as null, causing the L2 guest to fail to execute the hlt
  > > instruction properly. Subsequently, KVM enters an infinite loop,
  
  Define "infinite loop", i.e. what are the bounds of the loop?  If the "loop" is
  KVM re-entering the guest on the same instruction over and over, then everything
  is working as intended.
  
  > > repeatedly invoking x86_emulate_instruction() to perform the same
  > > operation. This issue does not occur when the instruction to be
  > > emulated by L2 is another standard instruction.
  > >
  > > Therefore, I am wondering whether this constitutes a denial-of-service
  > > (DoS) vulnerability and whether a CVE number can be assigned.
  
  Unless your reproducer causes a hard hang in KVM, or prevents L1 from gaining
  control from L2, e.g. via a (virtual) interrupt, this is not a DoS.  I can imagine
  scenarios where L2 can put itself into an infinite loop, i.e. DoS itself, but
  that's not a vulnerability in any reasonable sense of things.
  
  > > Generally, for software emulation in L1 guests, KVM's
  > > x86_emulate_instruction() function will, after parsing the instruction
  > > with x86_decode_emulated_instruction(), attempt to use
  > > retry_instruction() to retry instruction execution.
  
  No, retry_instruction() is specifically for cases where KVM fails to emulate an
  instruction _and_ the emulation was triggered by a write to guest PTE that KVM
  is shadowing, i.e. a guest page that KVM has made read-only.  If certain criteria
  were met, KVM will unprotect the page, i.e. make it writable again, and resume
  the guest to let the CPU retry the instruction.
 
> ## DESCRIPTION
> in this file, the most code is from
> syzkaller(executor/common_kvm_amd64.h), I mainly call the `syz_kvm_setup_cpu`
> function and run the vm using ioctl `kvm_run`.  First I use
> `syz_kvm_setup_cpu` to setup the vm to run a nested vm.  The second time the
> `syz_kvm_setup_cpu` will turn on the TF bit in the eflag register of the
> nested vm and let the nested vm run `nop;hlt` code.
> When running kvm_run, the code will begin looping.
> ## ANALYSE
> The nested vm try to emulate the `hlt` code but failed, it will always try, caught in an endless loop.

The guest loops because the guest's IDT is located in emulated MMIO space,
and as suspected above, KVM refuses to emulate HLT for L2.

The single-step #DB induced by RFLAGS.TF=1 triggers an EPT Violation as a result
of the CPU trying to vector the #DB with the IDT residing in non-existent memory.
At this point KVM *should* kick out to host userspace, as userspace is responsible
for dealing with the emulated MMIO access during exception vectoring.

           repro-1289    [019] d....   140.314684: kvm_exit: vcpu 0 reason EXCEPTION_NMI rip 0x1 info1 0x0000000000004000 info2 0x0000000000000000 intr_info 0x80000301 error_code 0x00000000
           repro-1289    [019] .....   140.314685: kvm_nested_vmexit: vcpu 0 reason EXCEPTION_NMI rip 0x1 info1 0x0000000000004000 info2 0x0000000000000000 intr_info 0x80000301 error_code 0x00000000
           repro-1289    [019] .....   140.314688: kvm_inj_exception: #DB
           repro-1289    [019] d....   140.314688: kvm_entry: vcpu 0, rip 0x1
           repro-1289    [019] d....   140.314704: kvm_exit: vcpu 0 reason EPT_VIOLATION rip 0x1 info1 0x0000000000000181 info2 0x0000000080000301 intr_info 0x00000000 error_code 0x00000000
           repro-1289    [019] .....   140.314706: kvm_nested_vmexit: vcpu 0 reason EPT_VIOLATION rip 0x1 info1 0x0000000000000181 info2 0x0000000080000301 intr_info 0x00000000 error_code 0x00000000
           repro-1289    [019] .....   140.314706: kvm_page_fault: vcpu 0 rip 0x1 address 0x0000000000001050 error_code 0x181
           repro-1289    [019] .....   140.314708: kvm_inj_exception: #DB [reinjected]
           repro-1289    [019] d....   140.314709: kvm_entry: vcpu 0, rip 0x1
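
For anyone reading the trace by hand, here is a minimal sketch of how the event
fields can be decoded.  It assumes the bit layout the Intel SDM gives for the
VM-exit interruption-information and IDT-vectoring information fields (vector in
bits 7:0, type in bits 10:8, valid in bit 31), and it assumes that info2 in the
kvm_exit tracepoint carries the IDT-vectoring information on these exits.  The
function names are made up for illustration and are not KVM helpers.

```python
# Assumed layout (Intel SDM): bits 7:0 vector, bits 10:8 event type,
# bit 11 error code valid, bit 31 valid.
EVENT_TYPES = {
    0: "external interrupt",
    2: "NMI",
    3: "hardware exception",
    4: "software interrupt",
    5: "privileged software exception",
    6: "software exception",
}

# Only the vectors that show up in this thread, plus #PF for reference.
VECTOR_NAMES = {1: "#DB", 6: "#UD", 14: "#PF"}


def decode_event_info(value):
    """Split a 32-bit intr_info / IDT-vectoring value into its fields."""
    return {
        "valid": bool((value >> 31) & 1),
        "vector": value & 0xFF,
        "type": (value >> 8) & 0x7,
        "has_error_code": bool((value >> 11) & 1),
    }


def describe_event(value):
    """Return a short human-readable description of the event field."""
    info = decode_event_info(value)
    if not info["valid"]:
        return "no event"
    kind = EVENT_TYPES.get(info["type"], "reserved type")
    name = VECTOR_NAMES.get(info["vector"], "?")
    return f"{kind}, vector {info['vector']} ({name})"


if __name__ == "__main__":
    for raw in (0x80000301, 0x80000306, 0x00000000):
        print(f"{raw:#010x}: {describe_event(raw)}")
```

Under those assumptions, 0x80000301 is a valid hardware exception with vector 1
(#DB), which matches the "#DB [reinjected]" line above: the EPT Violation was
taken while the CPU was in the middle of delivering the #DB.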

KVM misses the weird edge case, and instead ends up trying to emulate the
instruction at the current RIP.  That instruction happens to be HLT, which KVM
doesn't support for L2 (nested guests), and so KVM injects #UD.

           repro-1289    [019] d....   140.314732: kvm_exit: vcpu 0 reason EPT_VIOLATION rip 0x1 info1 0x00000000000001aa info2 0x0000000080000301 intr_info 0x00000000 error_code 0x00000000
           repro-1289    [019] .....   140.314749: kvm_emulate_insn: 0:1:f4 (prot32)
           repro-1289    [019] .....   140.314751: kvm_emulate_insn: 0:1:f4 (prot32) failed
           repro-1289    [019] .....   140.314752: kvm_inj_exception: #UD

Vectoring the #UD suffers the same fate as the #DB, and so KVM unintentionally
puts the vCPU into an endless loop.

           repro-1289    [019] d....   140.314767: kvm_exit: vcpu 0 reason EPT_VIOLATION rip 0x1 info1 0x00000000000001aa info2 0x0000000080000306 intr_info 0x00000000 error_code 0x00000000
           repro-1289    [019] .....   140.314767: kvm_nested_vmexit: vcpu 0 reason EPT_VIOLATION rip 0x1 info1 0x00000000000001aa info2 0x0000000080000306 intr_info 0x00000000 error_code 0x00000000
           repro-1289    [019] .....   140.314768: kvm_page_fault: vcpu 0 rip 0x1 address 0x0000000000000f78 error_code 0x1aa
           repro-1289    [019] .....   140.314778: kvm_emulate_insn: 0:1:f4 (prot32)
           repro-1289    [019] .....   140.314779: kvm_emulate_insn: 0:1:f4 (prot32) failed
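
The info1 values on the EPT_VIOLATION exits (also reported as error_code by
kvm_page_fault) can be decoded the same way.  The sketch below assumes the
exit-qualification layout from the Intel SDM for EPT violations: bits 0-2 give
the access type, bits 3-5 the permissions the EPT entry had, bit 7 says whether
the guest linear address is valid, and bit 8 says whether the access was to the
final translation rather than a paging-structure walk.  The function name is
hypothetical.

```python
# Assumed layout (Intel SDM, exit qualification for EPT violations).
ACCESS_BITS = ((0, "read"), (1, "write"), (2, "fetch"))
PERM_BITS = ((3, "read"), (4, "write"), (5, "exec"))


def decode_ept_qualification(qual):
    """Decode the low bits of an EPT-violation exit qualification."""
    return {
        # What the guest was trying to do.
        "access": [name for bit, name in ACCESS_BITS if (qual >> bit) & 1],
        # What the EPT mapping allowed at the time.
        "perms": [name for bit, name in PERM_BITS if (qual >> bit) & 1],
        "gla_valid": bool((qual >> 7) & 1),
        "final_translation": bool((qual >> 8) & 1),
    }


if __name__ == "__main__":
    for raw in (0x181, 0x1AA):
        print(f"{raw:#x}: {decode_ept_qualification(raw)}")
```

With that layout, 0x181 is a read that found no EPT permissions at all (the IDT
access at 0x1050), and 0x1aa is a write to a page mapped readable and executable
but not writable (the access at 0xf78).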

> ## QUESTION
> The phenomenon is due to the kvm's emulate function can't emulate all the
> instructions.

No, the issue is that KVM doesn't detect a weird edge case where the *guest* has
messed up, and instead of effectively terminating the VM, KVM puts it into an
infinite loop of sorts.

Amusingly, this edge case was just "fixed" for both VMX and SVM[*] (expected to
land in v6.14).  In quotes because "fixing" the problem really means killing
the VM instead of letting it loop.
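
To make the difference concrete, here is a deliberately simplified toy model of
the loop, not KVM code.  It assumes only what is described above: every attempt
to vector an exception faults because the IDT sits in emulated MMIO, the old
behavior falls through to emulating the HLT at the current RIP and queues #UD,
and the new behavior hands the problem to userspace.  The function, its
parameter, and the outcome strings are all invented for illustration, and the
model skips the initial #DB reinjection visible in the trace.

```python
def run_vcpu(reject_mmio_vectoring, max_exits=8):
    """Toy model of the vCPU loop; returns (outcome, event log).

    reject_mmio_vectoring=False models the behavior seen in the trace,
    True models the behavior after the series below.
    """
    pending = "#DB"  # single-step trap raised after the NOP (RFLAGS.TF=1)
    log = []
    for _ in range(max_exits):
        # Delivering any exception reads the IDT, which lives in emulated
        # MMIO, so every delivery attempt exits to KVM.
        log.append(f"exit while vectoring {pending}")
        if reject_mmio_vectoring:
            # New behavior: MMIO during vectoring is not handled in-kernel.
            log.append("report unhandleable vectoring to userspace")
            return "exit_to_userspace", log
        # Old behavior: emulate the instruction at RIP, which is HLT (f4).
        # KVM refuses to emulate HLT for L2 and queues #UD, which then has
        # to be vectored through the same unreachable IDT.
        log.append("emulate f4 (HLT) failed")
        pending = "#UD"
    return "still_looping", log


if __name__ == "__main__":
    for fixed in (False, True):
        outcome, log = run_vcpu(fixed)
        print(fixed, outcome, len(log))
```

The point of the model is only that the old path has no exit condition other
than the host interrupting the vCPU, whereas the new path terminates on the
first exit.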

  [1/7] KVM: x86: Add function for vectoring error generation
        https://github.com/kvm-x86/linux/commit/11c98fa07a79
  [2/7] KVM: x86: Add emulation status for unhandleable vectoring
        https://github.com/kvm-x86/linux/commit/5c9cfc486636
  [3/7] KVM: x86: Unprotect & retry before unhandleable vectoring check
        https://github.com/kvm-x86/linux/commit/704fc6021b9e
  [4/7] KVM: VMX: Handle vectoring error in check_emulate_instruction
        https://github.com/kvm-x86/linux/commit/47ef3ef843c0
  [5/7] KVM: SVM: Handle vectoring error in check_emulate_instruction
        https://github.com/kvm-x86/linux/commit/7bd7ff99110a
  [6/7] selftests: KVM: extract lidt into helper function
        https://github.com/kvm-x86/linux/commit/4e9427aeb957
  [7/7] selftests: KVM: Add test case for MMIO during vectoring
        https://github.com/kvm-x86/linux/commit/62e41f6b4f36

[*] https://lore.kernel.org/all/173457555486.3295983.11848882309599168611.b4-ty@google.com
