[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5880FD8A.2000405@arm.com>
Date: Thu, 19 Jan 2017 17:55:22 +0000
From: James Morse <james.morse@....com>
To: "Baicar, Tyler" <tbaicar@...eaurora.org>
CC: christoffer.dall@...aro.org, marc.zyngier@....com,
pbonzini@...hat.com, rkrcmar@...hat.com, linux@...linux.org.uk,
catalin.marinas@....com, will.deacon@....com, rjw@...ysocki.net,
lenb@...nel.org, matt@...eblueprint.co.uk, robert.moore@...el.com,
lv.zheng@...el.com, nkaje@...eaurora.org, zjzhang@...eaurora.org,
mark.rutland@....com, akpm@...ux-foundation.org,
eun.taik.lee@...sung.com, sandeepa.s.prabhu@...il.com,
labbott@...hat.com, shijie.huang@....com, rruigrok@...eaurora.org,
paul.gortmaker@...driver.com, tn@...ihalf.com, fu.wei@...aro.org,
rostedt@...dmis.org, bristot@...hat.com,
linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.cs.columbia.edu,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-acpi@...r.kernel.org, linux-efi@...r.kernel.org,
devel@...ica.org, Suzuki.Poulose@....com, punit.agrawal@....com,
astone@...hat.com, harba@...eaurora.org, hanjun.guo@...aro.org,
john.garry@...wei.com, shiju.jose@...wei.com
Subject: Re: [PATCH V7 04/10] arm64: exception: handle Synchronous External
Abort
Hi Tyler,
On 18/01/17 23:26, Baicar, Tyler wrote:
> On 1/17/2017 3:31 AM, James Morse wrote:
>> On 12/01/17 18:15, Tyler Baicar wrote:
>>> SEA exceptions are often caused by an uncorrected hardware
>>> error, and are handled when data abort and instruction abort
>>> exception classes have specific values for their Fault Status
>>> Code.
>>> When SEA occurs, before killing the process, go through
>>> the handlers registered in the notification list.
>>> Update fault_info[] with specific SEA faults so that the
>>> new SEA handler is used.
>>> @@ -480,6 +496,28 @@ static int do_bad(unsigned long addr, unsigned int esr,
>>> struct pt_regs *regs)
>>> return 1;
>>> }
>>> +/*
>>> + * This abort handler deals with Synchronous External Abort.
>>> + * It calls notifiers, and then returns "fault".
>>> + */
>>> +static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
>>> +{
>>> + struct siginfo info;
>>> +
>>> + atomic_notifier_call_chain(&sea_handler_chain, 0, NULL);
>>> +
>>> + pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
>>> + fault_name(esr), esr, addr);
>>> +
>>> + info.si_signo = SIGBUS;
>>> + info.si_errno = 0;
>>> + info.si_code = 0;
>> Half of the other do_*() functions in this file read the signo and code from the
>> fault_info table.
>>
>>
>>> + info.si_addr = (void __user *)addr;
>> addr here was read from FAR_EL1, but for some of the classes of exception you
>> have listed below this register isn't updated with the faulting address.
>>
>> The ARM-ARM version 'k' in D1.10.5 "Summary of registers on faults taken to an
>> Exception level that is using Aarch64" has:
>>> The architecture permits that the FAR_ELx is UNKNOWN for Synchronous External
>>> Aborts other than Synchronous External Aborts on Translation Table Walks. In
>>> this case, the ISS.FnV bit returned in ESR_ELx indicates whether FAR_ELx is
>>> valid.
>> This is a problem if we get 'synchronous external abort' or 'synchronous parity
>> error' while a user space process was running.
> It looks like this would just cause an incorrect address to be printed in the
> above pr_err.
> Unless I'm missing something, I don't see arm64_notify_die or anything that gets
> called from
> there using the info.si_addr variable.
I may be misreading something here...
This patch has:
> info.si_addr = (void __user *)addr;
> arm64_notify_die("", regs, &info, esr);
>From arch/arm64/kernel/traps.c:arm64_notify_die():
> if (user_mode(regs)) {
> current->thread.fault_address = 0;
> current->thread.fault_code = err;
> force_sig_info(info->si_signo, info, current);
> }
So if the SEA interrupted userspace, we put maybe-unknown addr into
force_sig_info() to deliver a signal to user space. User-space then gets a copy
of the info struct containing the maybe-unknown addr.
I think this is an existing bug, but if we are separating the synchronous
external aborts from the generic do_bad handler, we should probably check the
FnV bit. (I think we should still print it out)
> What do you suggest I do here? The firmware should be reporting the physical and
> virtual
> address information if it is available in the HEST entry that the kernel will
> parse.
Its not just firmware that may trigger this, other SoCs may use it for parity or
ECC errors, and they may not always have a valid address in FAR_EL1.
I think we should check the FnV bit in the esr variable and set info.si_addr to
0 if the addr we have isn't valid:
'For some implementations, the value of si_addr may be inaccurate.' [0]
Thanks,
James
[0] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html
Powered by blists - more mailing lists