lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAFEAcA9nGHFCuocGv+XLTStOZMONNK1SFYZ=uxhVN+KRU6BG2w@mail.gmail.com>
Date:   Thu, 10 Jan 2019 13:25:50 +0000
From:   Peter Maydell <peter.maydell@...aro.org>
To:     gengdongjiu <gengdongjiu@...wei.com>
Cc:     James Morse <james.morse@....com>,
        Radim Krčmář <rkrcmar@...hat.com>,
        Jonathan Corbet <corbet@....net>,
        Christoffer Dall <christoffer.dall@....com>,
        Marc Zyngier <marc.zyngier@....com>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will.deacon@....com>,
        kvm-devel <kvm@...r.kernel.org>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
        lkml - Kernel Mailing List <linux-kernel@...r.kernel.org>,
        arm-mail-list <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [RFC RESEND PATCH] kvm: arm64: export memory error recovery
 capability to user space

On Thu, 10 Jan 2019 at 12:09, gengdongjiu <gengdongjiu@...wei.com> wrote:
> Peter, I summarize James's main idea, James think QEMU does not needs
> to check *something* if Qemu support firmware-first.
> What do we do for your comments?

Unless I'm missing something, the code in your most recent patchset
attempts to update an ACPI table when it gets the SIGBUS from the
host kernel without doing anything to check whether it has ever
created the ACPI table (and set up the QEMU global variable that
tells the code where it is in the guest memory) in the first place.
I don't see how that can work.

> >> I think one question here which it would be good to answer is:
> >> if we are modelling a guest and we haven't specifically provided
> >> it an ACPI table to tell it about memory errors, what do we do
> >> when we get a sigbus from the host? We have basically two choices:
> >>  (1) send the guest an SError (aka asynchronous external abort)
> >>      anyway (with no further info about what the memory error is)
> >
> > For an AR signal an external abort is valid. Its up to the implementation
> > whether these are synchronous or asynchronous. Qemu can only take a signal for
> > something that was synchronous, so you can choose between the two.
> > Synchronous external abort is marginally better as an unaware OS knows its
> > affects this thread, and may be able to kill it.
> > SError with an imp-def ESR is indistinguishable from 'part of the soc fell out',
> > and should always result in a panic().
> >
> >
> >>  (2) just stop QEMU (as we would for a memory error in QEMU's
> >>      own memory)
> >
> > This is also valid. A machine may take external-abort to EL3 and then
> > reboot/crash/burn.

We should decide which of these we want to do, and have a comment
explaining what we're doing. If I'm reading your current patchset
correctly, it does neither -- if it can't record the fault in the
ACPI table it just ignores it without either stopping QEMU or
delivering an SError.

I think I favour option (2) here.

thanks
-- PMM

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ