linux-kernel - Re: [PATCH 0/4] Process some MMIO-related errors without KVM exit

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20240927121328.GA37012@dev-dsk-iorlov-1b-d2eae488.eu-west-1.amazon.com>
Date: Fri, 27 Sep 2024 12:13:28 +0000
From: Ivan Orlov <iorlov@...zon.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Jack Allister <jalliste@...zon.co.uk>, Ivan Orlov <iorlov@...zon.co.uk>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
	"bp@...en8.de" <bp@...en8.de>, "dave.hansen@...ux.intel.com"
	<dave.hansen@...ux.intel.com>, "hpa@...or.com" <hpa@...or.com>,
	"mingo@...hat.com" <mingo@...hat.com>, "tglx@...utronix.de"
	<tglx@...utronix.de>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
	"nh-open-source@...zon.com" <nh-open-source@...zon.com>, "shuah@...nel.org"
	<shuah@...nel.org>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH 0/4] Process some MMIO-related errors without KVM exit

On Wed, Sep 25, 2024 at 05:06:45PM -0700, Sean Christopherson wrote:
> 
> Yeah, but only because the alternative sucks worse.  If KVM unconditionally exited
> with an emulation error, then unsuspecting (read: old) VMMs would likely terminate
> the guest, which gives guest userspace a way to DoS the entire VM, especially on
> older CPUs where KVM needs to emulate much more often.
> 
>         if (kvm->arch.exit_on_emulation_error ||
>             (emulation_type & EMULTYPE_SKIP)) {
>                 prepare_emulation_ctxt_failure_exit(vcpu);
>                 return 0;
>         }
> 
>         kvm_queue_exception(vcpu, UD_VECTOR);
> 
>         if (!is_guest_mode(vcpu) && kvm_x86_call(get_cpl)(vcpu) == 0) {
>                 prepare_emulation_ctxt_failure_exit(vcpu);
>                 return 0;
>         }
> 
>         return 1;
> 
> And that's exactly why KVM_CAP_EXIT_ON_EMULATION_FAILURE exists.  VMMs that know
> they won't unintentionally give guest userspace what amounts to a privilege
> escalation can trap the emulation failure, do some logging or whatever, and then
> take whatever action it wants to take.
> 

Hi Sean,

Makes sense, thank you for the explanation.

> > and I believe how we do this
> > is debatable. I maintain we should either set a flag in emulation_failure.flags
> > to indicate that the error happened due to fetch from mmio (to give more
> > information to VMM),
> 
> Generally speaking, I'm not opposed to adding more information along those lines.
> Though realistically, I don't know that an extra flag is warranted in this case,
> as it shouldn't be _that_ hard for userspace to deduce what went wrong, especially
> if KVM_TRANSLATE2[*] lands (though I'm somewhat curious as to why QEMU doesn't do
> the page walks itself).
> 
> [*] https://lore.kernel.org/all/20240910152207.38974-1-nikwip@amazon.de
> 

Fair enough, but I still believe that it would be good to provide more
information about the failure to the VMM (considering the fact that KVM
tries to emulate an instruction anyway, adding a flag won't introduce any
performance overhead). I'll think about how we could do that without
being redundant :)

> > or we shouldn't return an error at all... Maybe it should be KVM_EXIT_MMIO with
> > some flag set? What do you think?
> 
> It'd be a breaking change and added complexity, for no benefit as far as I can
> tell.  KVM_EXIT_INTERNAL_ERROR is _not_ a death sentence, or at least it doesn't
> have to be.  Most VMMs do terminate the guest, but nothing is stopping userspace
> from grabbing RIP and emulating the instruction.  I.e. userspace doesn't need
> "permission" in the form of KVM_EXIT_MMIO to try and keep the guest alive.

Yeah, I just thought that "internal error" is not the best exit code for
the situations when guest fetches from MMIO (since it is a perfectly legal
operation from the architectural point of view). But I agree that it
would be a breaking change without functional benefit ( especially if we
provide a flag about what happened :) ).

P.S. I tested the latest kvm/next, and if we set the IDT descriptor base to
an MMIO address it still falls into the infinite loop on SVM. I'm going
to send the fix in the next couple of days.

Kind regards,
Ivan Orlov