linux-kernel - Re: [PATCH v3 8/8] arm64: exception: check shared writable page in SEI handler

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <5b914702-7262-54d5-2f64-0521a04add03@huawei.com>
Date:   Wed, 12 Apr 2017 16:35:07 +0800
From:   Xiongfeng Wang <wangxiongfeng2@...wei.com>
To:     James Morse <james.morse@....com>, Xie XiuQi <xiexiuqi@...wei.com>
CC:     <christoffer.dall@...aro.org>, <marc.zyngier@....com>,
        <catalin.marinas@....com>, <will.deacon@....com>,
        <fu.wei@...aro.org>, <rostedt@...dmis.org>,
        <hanjun.guo@...aro.org>, <shiju.jose@...wei.com>,
        <linux-arm-kernel@...ts.infradead.org>,
        <kvmarm@...ts.cs.columbia.edu>, <kvm@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, <linux-acpi@...r.kernel.org>,
        <gengdongjiu@...wei.com>, <zhengqiang10@...wei.com>,
        <wuquanming@...wei.com>,
        Wang Xiongfeng <wangxiongfengi2@...wei.com>
Subject: Re: [PATCH v3 8/8] arm64: exception: check shared writable page in
 SEI handler

Hi James,


On 2017/4/7 23:56, James Morse wrote:
> Hi Xie XiuQi,
> 
> On 30/03/17 11:31, Xie XiuQi wrote:
>> From: Wang Xiongfeng <wangxiongfeng2@...wei.com>
>>
>> Since SEI is asynchronous, the error data has been consumed. So we must
>> suppose that all the memory data current process can write are
>> contaminated. If the process doesn't have shared writable pages, the
>> process will be killed, and the system will continue running normally.
>> Otherwise, the system must be terminated, because the error has been
>> propagated to other processes running on other cores, and recursively
>> the error may be propagated to several another processes.
> 
> This is pretty complicated. We can't guarantee that another CPU hasn't modified
> the page tables while we do this, (so its racy). We can't guarantee that the
> corrupt data hasn't been sent over the network or written to disk in the mean
> time (so its not enough).
> 
> The scenario you have is a write of corrupt data to memory where another CPU
> reading it doesn't know the value is corrupt.
> 
> The hardware gives us quite a lot of help containing errors. The RAS
> specification (DDI 0587A) describes your scenario as error propagation in '2.1.2
> Architectural error propagation', and then classifies it in '2.1.3
> Architecturally infected, containable and uncontainable' as uncontained because
> the value is no longer in the general-purpose registers. For uncontained errors
> we should panic().
> 
> We shouldn't need to try to track errors after we get a notification as the
> hardware has done this for us.
> 
Thanks for your comments. I think what you said is reasonable. We will remove this
patch and use AET fields of ESR_ELx to determine whether we should kill current
process or just panic.
> 
> Firmware-first does complicate this if events like this are not delivered using
> a synchronous external abort, as Linux may have PSTATE.A masked preventing
> SError Interrupts from being taken. It looks like PSTATE.A is masked much more
> often than is necessary. I will look into cleaning this up.
> 
> 
> Thanks,
> 
> James
> 
> .
> 
Thanks,
Wang Xiongfeng