Message-ID: <20240924095422.GA66922@dev-dsk-iorlov-1b-d2eae488.eu-west-1.amazon.com>
Date: Tue, 24 Sep 2024 09:54:22 +0000
From: Ivan Orlov <iorlov@...zon.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Jack Allister <jalliste@...zon.co.uk>, Ivan Orlov <iorlov@...zon.co.uk>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
	"bp@...en8.de" <bp@...en8.de>, "dave.hansen@...ux.intel.com"
	<dave.hansen@...ux.intel.com>, "hpa@...or.com" <hpa@...or.com>,
	"mingo@...hat.com" <mingo@...hat.com>, "tglx@...utronix.de"
	<tglx@...utronix.de>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
	"nh-open-source@...zon.com" <nh-open-source@...zon.com>, "shuah@...nel.org"
	<shuah@...nel.org>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"x86@...nel.org" <x86@...nel.org>
Subject: Re: [PATCH 0/4] Process some MMIO-related errors without KVM exit

On Mon, Sep 23, 2024 at 02:46:17PM -0700, Sean Christopherson wrote:
> >
> > > No.  This is not architectural behavior.  It's not even remotely
> > > close to
> > > architectural behavior.  KVM's behavior isn't great, but making up
> > > _guest visible_
> > > behavior is not going to happen.
> >
> > Is this a no to the whole series or from the cover letter?
> 
> The whole series.
> 
> > For patch 1 we have observed that if a guest has incorrectly set its
> > IDT base to point inside of an MMIO region, it will result in a triple
> > fault (bare metal Intel Cascade Lake).
> 
> That happens because the IDT is garbage and/or the CPU is getting master abort
> semantics back, not because anything in the x86 architecture says that accessing
> MMIO during exception vectoring goes straight to shutdown.
>

Hi Sean, thank you for the detailed reply.

Yes, I ran the reproducer on my AMD Ryzen 5 once again, and it seems like
pointing the IDT descriptor base to a framebuffer works perfectly fine,
without any triple faults, so injecting a shutdown into the guest is not
a correct solution.

However, I believe KVM should demonstrate consistent behaviour when
handling MMIO during event delivery on VMX/SVM: either return an event
delivery error in both cases or go into an infinite loop in both cases.
I'm going to test kvm/next with the commits you mentioned and send a fix
if it still hits an infinite loop on SVM.

Also, since detecting such an issue at the KVM level doesn't introduce
much complexity, I reckon returning a sane error flag would be nice. For
instance, we could set one of the 'internal.data' elements to identify
that the problem occurred due to MMIO during event delivery.
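Something along these lines, building on the existing
KVM_INTERNAL_ERROR_DELIVERY_EV exit (the second data element and the
_MMIO constant are hypothetical, not existing ABI, and vectoring_info
stands for whatever vectoring state the vendor code has at hand):

	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
	vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_DELIVERY_EV;
	vcpu->run->internal.ndata = 2;
	vcpu->run->internal.data[0] = vectoring_info;
	/* Hypothetical: tell the VMM that vectoring hit MMIO. */
	vcpu->run->internal.data[1] = KVM_INTERNAL_ERROR_DELIVERY_EV_MMIO;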

> > Yes, a sane operating system is not really going to be setting its IDT
> > or GDT base to point into an MMIO region, but we've seen occurrences,
> > normally when other external things have gone horribly wrong.
> >
> > Ivan can clarify as to what's been seen on AMD platforms regarding the
> > infinite loop for patch one. This was also tested on bare metal
> > hardware. Injection of the #UD within patch 2 may be debatable but I
> > believe Ivan has some more data from experiments backing this up.
> 
> I have no problems improving KVM's handling of scenarios that KVM can't emulate,
> but there needs to be reasonable justification for taking on complexity, and KVM
> must not make up guest visible behavior.

Regarding the #UD-related change: the way I formulated it in the cover
letter and the patch is confusing, sorry. We are _already_ enqueuing an
#UD when fetching from MMIO, so I believe that is already architecturally
incorrect (see handle_emulation_failure in arch/x86/kvm/x86.c). However,
we return an emulation failure after that, and I believe the way we do
this is debatable. I maintain we should either set a flag in
emulation_failure.flags to indicate that the error happened due to a
fetch from MMIO (to give more information to the VMM), or we shouldn't
return an error at all... Maybe it should be KVM_EXIT_MMIO with some
flag set? What do you think?
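To make the flag variant concrete, I'm thinking of something like the
sketch below. The new flag name and bit are made up for illustration
(only ..._FLAG_INSTRUCTION_BYTES exists today):

	/* include/uapi/linux/kvm.h -- hypothetical new flag */
	#define KVM_INTERNAL_ERROR_EMULATION_FLAG_FETCH_MMIO	(1ULL << 1)

	/* arch/x86/kvm/x86.c, in handle_emulation_failure(), when the
	 * failed fetch targeted MMIO: */
	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
	vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
	vcpu->run->emulation_failure.flags |=
			KVM_INTERNAL_ERROR_EMULATION_FLAG_FETCH_MMIO;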

Thank you!

Kind regards,
Ivan Orlov
