lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <58E7B6BD.3000401@arm.com>
Date:   Fri, 07 Apr 2017 16:56:45 +0100
From:   James Morse <james.morse@....com>
To:     Xie XiuQi <xiexiuqi@...wei.com>
CC:     christoffer.dall@...aro.org, marc.zyngier@....com,
        catalin.marinas@....com, will.deacon@....com, fu.wei@...aro.org,
        rostedt@...dmis.org, hanjun.guo@...aro.org, shiju.jose@...wei.com,
        linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.cs.columbia.edu,
        kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-acpi@...r.kernel.org, gengdongjiu@...wei.com,
        zhengqiang10@...wei.com, wuquanming@...wei.com,
        wangxiongfeng2@...wei.com,
        Wang Xiongfeng <wangxiongfengi2@...wei.com>
Subject: Re: [PATCH v3 8/8] arm64: exception: check shared writable page in
 SEI handler

Hi Xie XiuQi,

On 30/03/17 11:31, Xie XiuQi wrote:
> From: Wang Xiongfeng <wangxiongfeng2@...wei.com>
> 
> Since SEI is asynchronous, the error data has been consumed. So we must
> suppose that all the memory data current process can write are
> contaminated. If the process doesn't have shared writable pages, the
> process will be killed, and the system will continue running normally.
> Otherwise, the system must be terminated, because the error has been
> propagated to other processes running on other cores, and recursively
> the error may be propagated to several another processes.

This is pretty complicated. We can't guarantee that another CPU hasn't modified
the page tables while we do this, (so its racy). We can't guarantee that the
corrupt data hasn't been sent over the network or written to disk in the mean
time (so its not enough).

The scenario you have is a write of corrupt data to memory where another CPU
reading it doesn't know the value is corrupt.

The hardware gives us quite a lot of help containing errors. The RAS
specification (DDI 0587A) describes your scenario as error propagation in '2.1.2
Architectural error propagation', and then classifies it in '2.1.3
Architecturally infected, containable and uncontainable' as uncontained because
the value is no longer in the general-purpose registers. For uncontained errors
we should panic().

We shouldn't need to try to track errors after we get a notification as the
hardware has done this for us.


Firmware-first does complicate this if events like this are not delivered using
a synchronous external abort, as Linux may have PSTATE.A masked preventing
SError Interrupts from being taken. It looks like PSTATE.A is masked much more
often than is necessary. I will look into cleaning this up.


Thanks,

James

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ