linux-kernel - Re: [PATCH v5 1/3] arm64/ras: support sea error recovery


Message-ID: <5A70C536.7040208@arm.com>
Date:   Tue, 30 Jan 2018 19:19:18 +0000
From:   James Morse <james.morse@....com>
To:     Xie XiuQi <xiexiuqi@...wei.com>
CC:     catalin.marinas@....com, will.deacon@....com, mingo@...hat.com,
        mark.rutland@....com, ard.biesheuvel@...aro.org,
        Dave.Martin@....com, takahiro.akashi@...aro.org,
        tbaicar@...eaurora.org, stephen.boyd@...aro.org, bp@...e.de,
        julien.thierry@....com, shiju.jose@...wei.com,
        zjzhang@...eaurora.org, linux-arm-kernel@...ts.infradead.org,
        linux-kernel@...r.kernel.org, linux-acpi@...r.kernel.org,
        wangxiongfeng2@...wei.com, zhengqiang10@...wei.com,
        gengdongjiu@...wei.com, huawei.libin@...wei.com,
        wangkefeng.wang@...wei.com, lijinyue@...wei.com,
        guohanjun@...wei.com, hanjun.guo@...aro.org,
        cj.chengjian@...wei.com
Subject: Re: [PATCH v5 1/3] arm64/ras: support sea error recovery

Hi Xie XiuQi,

On 26/01/18 12:31, Xie XiuQi wrote:
> With the ARMv8.2 RAS Extension, an SEA (Synchronous External Abort) is
> usually triggered when a memory error is consumed. With the existing
> handling, an error consumed in the kernel leads directly to a panic, and
> an error consumed in user-space just kills the process.
> 
> But for one class of errors it is not necessary to kill the process:
> it can recover and continue to run. One example is corrupted
> instruction data, where the memory page may be read-only and
> unmodified, so the disk still holds the correct data. The page can
> simply be dropped and reloaded when necessary.

With firmware-first support, we do all this...


> This patchset tries to solve that problem: if the error is consumed in
> user-space and occurs on a clean page, the page can be dropped
> without killing the process.
> 
> If the corrupted page is clean, it is just dropped and we return to
> user-space without side effects. If the corrupted page is dirty,
> memory_failure() sends SIGBUS with code=BUS_MCEERR_AR. Without this
> patchset, do_sea() just sends SIGBUS, so the process is killed in the
> same place.

... but this happens too. I agree it's something we should fix, but I don't think
this is the best way to do it.

This series is pulling the memory-failure-queue details back into the arch-code
to build a second list that gets processed as extra work when we return to
user-space.


The root of the issue is that ghes_notify_sea() claims the notification as
something APEI has dealt with, ... but it hasn't done so yet. The signals will be
generated by something currently stuck in a queue. (Evidently x86 doesn't handle
synchronous errors like this using firmware-first).

I think a smaller fix is to give the queues that may be holding the
memory_failure() work a kick as part of the code that calls ghes_notify_sea().
This means that by the time we return to do_sea(), ghes_notify_sea()'s claim that
APEI has dealt with it is true, as any generated signals are pending. We can then
skip the existing SIGBUS generation code.
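
To illustrate the ordering I mean, here is a rough user-space model. It is
not kernel code: every model_* name is made up for this sketch, the queue is
a plain array, and "dropping a page" is just a counter. It only shows that
draining the queue before returning makes the claim true, so the fallback
SIGBUS can be skipped.

```c
/* User-space model only: none of the model_* names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_QUEUE_LEN 8

/* One deferred memory-failure request left behind by the notification. */
struct model_mf_entry {
	unsigned long pfn;
	bool page_is_dirty;	/* a dirty page cannot simply be dropped */
};

struct model_task {
	struct model_mf_entry queue[MODEL_QUEUE_LEN];
	size_t queued;
	bool sigbus_pending;		/* stands in for a pending SIGBUS */
	unsigned int pages_dropped;	/* clean pages recovered silently */
};

/* Models the NMI-like notification: it can only queue work, never sleep. */
static bool model_ghes_notify_sea(struct model_task *t, unsigned long pfn,
				  bool dirty)
{
	assert(t != NULL);
	if (t->queued == MODEL_QUEUE_LEN)
		return false;		/* error not claimed */
	t->queue[t->queued].pfn = pfn;
	t->queue[t->queued].page_is_dirty = dirty;
	t->queued++;
	return true;			/* claimed: "APEI has dealt with it" */
}

/* Models the "kick": run the queued memory-failure work right now. */
static void model_drain_mf_queue(struct model_task *t)
{
	assert(t != NULL);
	for (size_t i = 0; i < t->queued; i++) {
		if (t->queue[i].page_is_dirty)
			t->sigbus_pending = true;	/* signal left pending */
		else
			t->pages_dropped++;		/* reload from disk later */
	}
	t->queued = 0;
}

/* Returns true only when the fallback SIGBUS path was taken. */
static bool model_do_sea(struct model_task *t, unsigned long pfn, bool dirty)
{
	if (model_ghes_notify_sea(t, pfn, dirty)) {
		/* Drain before returning, so the claim above is now true. */
		model_drain_mf_queue(t);
		return false;	/* skip the existing SIGBUS generation */
	}
	t->sigbus_pending = true;	/* nobody claimed it: existing behaviour */
	return true;
}
```

In this model a clean page leaves no signal pending and the task carries on,
while a dirty page leaves a signal pending before model_do_sea() returns.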


> Because memory_failure() may sleep, we can not call it directly in SEA

(this one is more serious, I've attempted to fix it by moving all NMI-like
GHES-notifications to use the estatus queue).


> exception context. So we save the faulting physical address associated
> with a process in the ghes handler and set __TIF_SEA_NOTIFY. When we
> return from SEA exception context and get into do_notify_resume() before
> the process runs, we can check it and call memory_failure() to do
> recovery.

> It's safe, because we are in process context.

I think this is the trick. When we take a Synchronous-external-abort out of
userspace, we're in process context too. We can add helpers to drain the
memory_failure_queue which can be called from do_sea() when we know we're
preemptible and interrupts-et-al are unmasked.
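
As a sketch of the precondition only (a user-space model; the struct, enum
and function names are invented for illustration and are not kernel APIs),
the decision about whether the queue may be drained synchronously could look
like this:

```c
/* User-space model only: these names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Snapshot of the state in which the abort was taken. */
struct model_sea_context {
	bool from_user;		/* abort taken out of user-space */
	bool irqs_unmasked;	/* interrupts-et-al are unmasked */
	bool preemptible;	/* we are allowed to sleep */
};

enum model_action {
	MODEL_DRAIN_NOW,	/* safe to run the queued work synchronously */
	MODEL_LEAVE_QUEUED,	/* leave it for whatever drains the queue later */
};

/* The queued work may sleep, so drain only when all three conditions hold. */
static enum model_action model_sea_action(const struct model_sea_context *c)
{
	assert(c != NULL);
	if (c->from_user && c->irqs_unmasked && c->preemptible)
		return MODEL_DRAIN_NOW;
	return MODEL_LEAVE_QUEUED;
}
```

Anything that fails the check is simply left on the queue, which matches
today's behaviour for that case.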


Thanks,

James


[0] https://www.spinics.net/lists/linux-acpi/msg80149.html

> ---
>  arch/arm64/Kconfig                   |  11 +++
>  arch/arm64/include/asm/ras.h         |  23 ++++++
>  arch/arm64/include/asm/thread_info.h |   4 +-
>  arch/arm64/kernel/Makefile           |   1 +
>  arch/arm64/kernel/ras.c              | 142 +++++++++++++++++++++++++++++++++++
>  arch/arm64/kernel/signal.c           |   7 ++
>  arch/arm64/mm/fault.c                |  27 +++++--
>  drivers/acpi/apei/ghes.c             |   8 +-
>  include/acpi/ghes.h                  |   3 +
>  9 files changed, 216 insertions(+), 10 deletions(-)
>  create mode 100644 arch/arm64/include/asm/ras.h
>  create mode 100644 arch/arm64/kernel/ras.c