Message-ID: <20190905082503.GB4320@e113682-lin.lund.arm.com>
Date: Thu, 5 Sep 2019 10:25:03 +0200
From: Christoffer Dall <christoffer.dall@....com>
To: Peter Maydell <peter.maydell@...aro.org>
Cc: Marc Zyngier <maz@...nel.org>,
Daniel P . Berrangé <berrange@...hat.com>,
Heinrich Schuchardt <xypron.glpk@....de>,
lkml - Kernel Mailing List <linux-kernel@...r.kernel.org>,
Stefan Hajnoczi <stefanha@...hat.com>,
kvmarm@...ts.cs.columbia.edu,
arm-mail-list <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded
On Thu, Sep 05, 2019 at 09:16:54AM +0100, Peter Maydell wrote:
> On Thu, 5 Sep 2019 at 09:04, Marc Zyngier <maz@...nel.org> wrote:
> > How can you tell that the access would fault? You have no idea at that
> > stage (the kernel doesn't know about the MMIO ranges that userspace
> > handles). All you know is that you're faced with a memory access that
> > you cannot emulate in the kernel. Injecting a data abort at that stage
> > is not something that the architecture allows.
>
> To be fair, locking up the whole CPU (which is effectively
> what the kvm_err/ENOSYS is going to do to the VM) isn't
> something the architecture allows either :-)
>
> > Of course, the best thing would be to actually fix the guest so that
> > it doesn't use non-emulatable MMIO accesses. In general, that's the sign
> > of a bug in low-level accessors.
>
> This is true, but the problem is that barfing out to userspace
> makes it harder to debug the guest because it means that
> the VM is immediately destroyed, whereas AIUI if we
> inject some kind of exception then (assuming you're set up
> to do kernel-debug via gdbstub) you can actually examine
> the offending guest code with a debugger because at least
> your VM is still around to inspect...
>
Is it really going to be easier to debug a guest that sees behavior
which may not be architecturally correct? For example, seeing a data
abort on an access to an MMIO region because the guest used a strange
instruction?

I appreciate that the current way we handle this is confusing and has
led many people down a rabbit hole, so we should do better.

Would a better approach not be to return to userspace saying, "we can't
handle this in the kernel, you decide", without printing the dubious
kernel error message? Userspace could then suspend the VM and print a
lengthy explanation of all the possible problems there could be, or
re-inject something back into the guest, or whatever is appropriate for
a particular environment.
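
As a rough illustration of what the userspace side of that could look
like, here is a minimal sketch. Every name in it (the exit structure,
the action enum, the policy struct, the handler) is made up for the
example and is not an existing KVM or VMM interface; it only shows the
"you decide" step being driven by a per-environment policy.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical description of the exit; not an existing KVM structure. */
struct undecoded_mmio_exit {
	uint64_t fault_ipa;	/* guest physical address of the access */
	uint64_t esr;		/* syndrome value handed to userspace */
};

/* Hypothetical choices open to the VMM. */
enum vmm_action {
	VMM_SUSPEND_AND_EXPLAIN,	/* keep the VM around for inspection */
	VMM_INJECT_ABORT,		/* reflect an exception into the guest */
};

/* Hypothetical per-environment policy, e.g. set from the command line. */
struct vmm_policy {
	enum vmm_action on_undecoded_mmio;
};

/*
 * Decide what to do with an access the kernel could not emulate.
 * The caller is expected to act on the returned value.
 */
enum vmm_action handle_undecoded_mmio(const struct vmm_policy *policy,
				      const struct undecoded_mmio_exit *exit,
				      FILE *log)
{
	if (policy->on_undecoded_mmio == VMM_SUSPEND_AND_EXPLAIN) {
		fprintf(log,
			"Guest access at IPA 0x%" PRIx64 " (ESR 0x%" PRIx64 ") "
			"could not be emulated.\n"
			"Possible causes: an MMIO access using an instruction "
			"that provides no decode information, or an access to "
			"an address that is not backed by RAM or a device.\n"
			"The VM has been suspended for inspection.\n",
			exit->fault_ipa, exit->esr);
		return VMM_SUSPEND_AND_EXPLAIN;
	}

	/* Any other policy: hand the problem back to the guest. */
	return VMM_INJECT_ABORT;
}
```

The point of the sketch is only that the decision lives in userspace
and can differ between a debugging setup and a production one.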

Thoughts?

Thanks,
Christoffer