linux-kernel - Re: [PATCH V7 04/10] arm64: exception: handle Synchronous External Abort

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <5880FD8A.2000405@arm.com>
Date:   Thu, 19 Jan 2017 17:55:22 +0000
From:   James Morse <james.morse@....com>
To:     "Baicar, Tyler" <tbaicar@...eaurora.org>
CC:     christoffer.dall@...aro.org, marc.zyngier@....com,
        pbonzini@...hat.com, rkrcmar@...hat.com, linux@...linux.org.uk,
        catalin.marinas@....com, will.deacon@....com, rjw@...ysocki.net,
        lenb@...nel.org, matt@...eblueprint.co.uk, robert.moore@...el.com,
        lv.zheng@...el.com, nkaje@...eaurora.org, zjzhang@...eaurora.org,
        mark.rutland@....com, akpm@...ux-foundation.org,
        eun.taik.lee@...sung.com, sandeepa.s.prabhu@...il.com,
        labbott@...hat.com, shijie.huang@....com, rruigrok@...eaurora.org,
        paul.gortmaker@...driver.com, tn@...ihalf.com, fu.wei@...aro.org,
        rostedt@...dmis.org, bristot@...hat.com,
        linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.cs.columbia.edu,
        kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-acpi@...r.kernel.org, linux-efi@...r.kernel.org,
        devel@...ica.org, Suzuki.Poulose@....com, punit.agrawal@....com,
        astone@...hat.com, harba@...eaurora.org, hanjun.guo@...aro.org,
        john.garry@...wei.com, shiju.jose@...wei.com
Subject: Re: [PATCH V7 04/10] arm64: exception: handle Synchronous External
 Abort

Hi Tyler,

On 18/01/17 23:26, Baicar, Tyler wrote:
> On 1/17/2017 3:31 AM, James Morse wrote:
>> On 12/01/17 18:15, Tyler Baicar wrote:
>>> SEA exceptions are often caused by an uncorrected hardware
>>> error, and are handled when data abort and instruction abort
>>> exception classes have specific values for their Fault Status
>>> Code.
>>> When SEA occurs, before killing the process, go through
>>> the handlers registered in the notification list.
>>> Update fault_info[] with specific SEA faults so that the
>>> new SEA handler is used.
>>> @@ -480,6 +496,28 @@ static int do_bad(unsigned long addr, unsigned int esr,
>>> struct pt_regs *regs)
>>>       return 1;
>>>   }
>>>   +/*
>>> + * This abort handler deals with Synchronous External Abort.
>>> + * It calls notifiers, and then returns "fault".
>>> + */
>>> +static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>>> +{
>>> +    struct siginfo info;
>>> +
>>> +    atomic_notifier_call_chain(&sea_handler_chain, 0, NULL);
>>> +
>>> +    pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>>> +         fault_name(esr), esr, addr);
>>> +
>>> +    info.si_signo = SIGBUS;
>>> +    info.si_errno = 0;
>>> +    info.si_code  = 0;
>> Half of the other do_*() functions in this file read the signo and code from the
>> fault_info table.
>>
>>
>>> +    info.si_addr  = (void __user *)addr;
>> addr here was read from FAR_EL1, but for some of the classes of exception you
>> have listed below this register isn't updated with the faulting address.
>>
>> The ARM-ARM version 'k' in D1.10.5 "Summary of registers on faults taken to an
>> Exception level that is using Aarch64" has:
>>> The architecture permits that the FAR_ELx is UNKNOWN for Synchronous External
>>> Aborts other than Synchronous External Aborts on Translation Table Walks. In
>>> this case, the ISS.FnV bit returned in ESR_ELx  indicates whether FAR_ELx is
>>> valid.
>> This is a problem if we get 'synchronous external abort' or 'synchronous parity
>> error' while a user space process was running.

> It looks like this would just cause an incorrect address to be printed in the
> above pr_err.
> Unless I'm missing something, I don't see arm64_notify_die or anything that gets
> called from
> there using the info.si_addr variable.

I may be misreading something here...

This patch has:
>	info.si_addr  = (void __user *)addr;
>	arm64_notify_die("", regs, &info, esr);

>From arch/arm64/kernel/traps.c:arm64_notify_die():
>	if (user_mode(regs)) {
>		current->thread.fault_address = 0;
>		current->thread.fault_code = err;
>		force_sig_info(info->si_signo, info, current);
>	}

So if the SEA interrupted userspace, we put maybe-unknown addr into
force_sig_info() to deliver a signal to user space. User-space then gets a copy
of the info struct containing the maybe-unknown addr.

I think this is an existing bug, but if we are separating the synchronous
external aborts from the generic do_bad handler, we should probably check the
FnV bit. (I think we should still print it out)


> What do you suggest I do here? The firmware should be reporting the physical and
> virtual
> address information if it is available in the HEST entry that the kernel will
> parse.

Its not just firmware that may trigger this, other SoCs may use it for parity or
ECC errors, and they may not always have a valid address in FAR_EL1.

I think we should check the FnV bit in the esr variable and set info.si_addr to
0 if the addr we have isn't valid:
'For some implementations, the value of si_addr may be inaccurate.' [0]


Thanks,

James


[0] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html