lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 22 Jan 2018 19:32:12 +0000
From:   James Morse <james.morse@....com>
To:     gengdongjiu <gengdj.1984@...il.com>
CC:     gengdongjiu <gengdongjiu@...wei.com>,
        wuquanming <wuquanming@...wei.com>, linux-doc@...r.kernel.org,
        kvm@...r.kernel.org, Marc Zyngier <marc.zyngier@....com>,
        Catalin Marinas <catalin.marinas@....com>,
        Jonathan Corbet <corbet@....net>, rjw@...ysocki.net,
        linux@...linux.org.uk, linuxarm@...wei.com,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-acpi@...r.kernel.org, bp@...en8.de,
        arm-mail-list <linux-arm-kernel@...ts.infradead.org>,
        Huangshaoyu <huangshaoyu@...wei.com>, pbonzini@...hat.com,
        kvmarm@...ts.cs.columbia.edu, devel@...ica.org
Subject: Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization

Hi gengdongjiu,

On 21/01/18 02:45, gengdongjiu wrote:
> For the ESR_ELx_AET_UER, this exception is precise, closing the VM may
> be better[1].
> But if you think panic is better until we support kernel-first, it is
> also OK to me.

I'm not convinced SError while a guest was running means only guest memory could
be affected. Mechanisms like KSM means the error could affect multiple guests.

Both firmware-fist and kernel-first will give us the address, with which we can
know which processes are affected, isolated the memory and signal affected
processes.

Until we have one of these panic() is the only way we have to contain an error,
but its an interim fix.
Not panic()ing the host for an error that should be contained to the guest is a
fudge, we don't actually know its safe (KSM, page-table etc). I want to improve
on this with {firmware, kernel}-first support (or both!), I don't want to expose
that this is happening to user-space, as once we have one of {firmware,
kernel}-first, it shouldn't happen.


>> This is inventing something new for RAS errors not claimed by firmware-first.
>> If we have kernel-first too, this will never happen. (unless your system is
>> losing the error description).

> In fact, if we have kernel-first, I think we still need to judge the
> error type by ESR, right?

The kernel-first mechanism should consider the ESR/FAR, yes, but once the error
has been claimed and handled, KVM shouldn't care about any of these values.
(maybe we'll sanity check for uncontained errors, just in case the error escaped
to the RAS code...)

My point here was exposing 'unhandled' (ignored) RAS errors to user-space
creates an ABI: someone will complain once we start handling the error, and they
no longer get a notification via this 'unhandled' interface. Code written to use
this interface becomes useless/untested.


> If the handle_guest_sei() , may be the system does not support firmware-first,
> so we judge the ESR value,

...and panic()/ignore as appropriate.

I agree not all systems will support firmware-first, (big-endian is the obvious
example), but if we get kernel-first support this ESR guessing can disappear,
I'm against exposing it to user-space in the meantime.


Thanks,

James

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ