linux-kernel - Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

Message-ID: <CAFEAcA-3ne3Z0dwz9C9kJmk36_AdNJRuqgB1jzFJ0WUB2NT_iQ@mail.gmail.com>
Date:   Thu, 5 Sep 2019 09:32:23 +0100
From:   Peter Maydell <peter.maydell@...aro.org>
To:     Christoffer Dall <christoffer.dall@....com>
Cc:     Marc Zyngier <maz@...nel.org>,
        Daniel P . Berrangé <berrange@...hat.com>,
        Heinrich Schuchardt <xypron.glpk@....de>,
        lkml - Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Stefan Hajnoczi <stefanha@...hat.com>,
        kvmarm@...ts.cs.columbia.edu,
        arm-mail-list <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH 1/1] KVM: inject data abort if instruction cannot be decoded

On Thu, 5 Sep 2019 at 09:25, Christoffer Dall <christoffer.dall@....com> wrote:
>
> On Thu, Sep 05, 2019 at 09:16:54AM +0100, Peter Maydell wrote:
> > This is true, but the problem is that barfing out to userspace
> > makes it harder to debug the guest because it means that
> > the VM is immediately destroyed, whereas AIUI if we
> > inject some kind of exception then (assuming you're set up
> > to do kernel-debug via gdbstub) you can actually examine
> > the offending guest code with a debugger because at least
> > your VM is still around to inspect...
> >
>
> Is it really going to be easier to debug a guest that sees behavior
> which may not be architecturally correct?  For example, seeing a data
> abort on an access to an MMIO region because the guest used a strange
> instruction?

Yeah, a data abort is not ideal. You could UNDEF the insn, which
probably is more likely to result in getting control in the
debugger I suppose.

As for whether it's going to be easier to debug, for the
user who reported this in the first place it certainly was.
(Consider even a simple Linux guest not under a debugger --
if we UNDEF the insn the guest kernel will print a helpful
backtrace so you can tell where the problem is; at the moment
we just print a register dump from the host kernel, which is a
lot less informative.)

> I appreciate that the current way we handle this is confusing and has
> led many people down a rabbit hole, so we should do better.
>
> Would a better approach not be to return to userspace saying, "we can't
> handle this in the kernel, you decide", without printing the dubious
> kernel error message?

Printing the message in the kernel is the best clue we give
the user at the moment that they've run into this problem;
I would be wary of removing it (even if we decide to also
do something else).

> Then user space could suspend the VM and print a
> lengthy explanation of all the possible problems there could be, or
> re-inject something back into the guest, or whatever, for a particular
> environment.

In theory I guess so. In practice that's not what userspace
currently in the wild does, and injecting an exception from
userspace is a bit awkward (I dunno if kvmtool does it,
QEMU only needs to in really obscure circumstances and
was buggy in how it tried to do it until very recently)...

thanks
-- PMM