Message-ID: <95ae5d1a-fcfd-9106-4b13-9978de1a3d23@huawei.com>
Date: Mon, 20 Jun 2022 09:53:26 +0800
From: Tong Tiangen <tongtiangen@...wei.com>
To: Mark Rutland <mark.rutland@....com>
CC: James Morse <james.morse@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
"Ingo Molnar" <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Robin Murphy <robin.murphy@....com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"Catalin Marinas" <catalin.marinas@....com>,
Will Deacon <will@...nel.org>,
"Alexander Viro" <viro@...iv.linux.org.uk>,
Michael Ellerman <mpe@...erman.id.au>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Paul Mackerras <paulus@...ba.org>, <x86@...nel.org>,
"H . Peter Anvin" <hpa@...or.com>, <linuxppc-dev@...ts.ozlabs.org>,
<linux-arm-kernel@...ts.infradead.org>,
<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
Kefeng Wang <wangkefeng.wang@...wei.com>,
Xie XiuQi <xiexiuqi@...wei.com>,
Guohanjun <guohanjun@...wei.com>
Subject: Re: [PATCH -next v5 6/8] arm64: add support for machine check error safe
On 2022/6/18 20:52, Mark Rutland wrote:
> On Sat, Jun 18, 2022 at 05:18:55PM +0800, Tong Tiangen wrote:
>> On 2022/6/17 16:55, Mark Rutland wrote:
>>> On Sat, May 28, 2022 at 06:50:54AM +0000, Tong Tiangen wrote:
>>>> +static bool arm64_do_kernel_sea(unsigned long addr, unsigned int esr,
>>>> +				struct pt_regs *regs, int sig, int code)
>>>> +{
>>>> +	if (!IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC))
>>>> +		return false;
>>>> +
>>>> +	if (user_mode(regs) || !current->mm)
>>>> +		return false;
>>>
>>> What's the `!current->mm` check for?
>>
>> At first, I assumed that only user processes had a chance to recover
>> when they trigger a memory error.
>>
>> But this restriction seems unreasonable. When a kernel thread triggers a
>> memory error, it can also be recovered, for instance:
>>
>> https://lore.kernel.org/linux-mm/20220527190731.322722-1-jiaqiyan@google.com/
>>
>> And I think the if (!current->mm) check should be added below:
>>
>> if (!current->mm) {
>> 	set_thread_esr(0, esr);
>> 	arm64_force_sig_fault(...);
>> }
>> return true;
>
> Why does 'current->mm' have anything to do with this, though?
Sorry, that was a typo; my original logic was:
if (current->mm) {
[...]
}
>
> There can be kernel threads with `current->mm` set in unusual circumstances
> (and there's a lot of kernel code out there which handles that wrong), so if
> you want to treat user tasks differently, we should be doing something like
> checking PF_KTHREAD, or adding something like an is_user_task() helper.
>
OK, I do want to treat user tasks differently here, and I didn't take into
account what you pointed out. I will fix this in the next version according
to your suggestion, as follows:
if (!(current->flags & PF_KTHREAD)) {
	set_thread_esr(0, esr);
	arm64_force_sig_fault(...);
}
return true;
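Mark's concern about `current->mm` can be illustrated with a small user-space model (not kernel code): a kernel thread that has temporarily adopted a user address space, as kthread_use_mm() allows, passes a `current->mm` test but is still flagged as a kernel thread. The struct `mock_task`, the flag `MOCK_PF_KTHREAD` and the helpers `has_mm()` and `is_user_task()` are hypothetical stand-ins; the real kernel uses `struct task_struct` and `PF_KTHREAD`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PF_KTHREAD task flag. */
#define MOCK_PF_KTHREAD 0x1u

/* Hypothetical, heavily reduced model of struct task_struct. */
struct mock_task {
	unsigned int flags; /* task flags, e.g. MOCK_PF_KTHREAD */
	void *mm;           /* address space; a kthread may borrow one */
};

/* Models the original check: any task with an mm counts as a user task. */
static bool has_mm(const struct mock_task *t)
{
	return t->mm != NULL;
}

/*
 * Hypothetical is_user_task() helper along the lines Mark suggests:
 * decide from the kthread flag rather than from ->mm.
 */
static bool is_user_task(const struct mock_task *t)
{
	return !(t->flags & MOCK_PF_KTHREAD);
}
```

For a kernel thread that has borrowed a user mm (`flags = MOCK_PF_KTHREAD`, `mm` non-NULL), `has_mm()` returns true while `is_user_task()` returns false, which is exactly the case the `current->mm` check gets wrong.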
> [...]
>
>>>> +
>>>> + if (apei_claim_sea(regs) < 0)
>>>> + return false;
>>>> +
>>>> + if (!fixup_exception_mc(regs))
>>>> + return false;
>>>
>>> I thought we still wanted to signal the task in this case? Or do you expect to
>>> add that into `fixup_exception_mc()` ?
>>
>> Yes; here we return false, and the task will be signalled in do_sea() ->
>> arm64_notify_die().
>
> I mean when we do the fixup.
>
> I thought the idea was to apply the fixup (to stop the kernel from crashing),
> but still to deliver a fatal signal to the user task since we can't do what the
> user task asked us to.
>
Yes, that's what I meant. :)
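The behaviour agreed on here (apply the exception fixup so the kernel survives, but still send a fatal signal to a user task) can be summarised with a small user-space model. This is a sketch under stated assumptions, not the patch itself: `struct sea_inputs`, `struct sea_outcome` and `model_do_kernel_sea()` are hypothetical names, each input flag stands in for one of the real checks, and the `!current->mm` test is replaced by the PF_KTHREAD check discussed earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the handler decided to do. */
struct sea_outcome {
	bool handled;     /* true: fixup applied, the kernel does not die */
	bool signal_task; /* true: set_thread_esr() + a fatal signal to the task */
};

/*
 * Hypothetical inputs standing in for the real checks:
 * IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC), user_mode(regs), apei_claim_sea(),
 * fixup_exception_mc() and (current->flags & PF_KTHREAD).
 */
struct sea_inputs {
	bool copy_mc_enabled;
	bool from_user_mode;
	bool apei_claimed; /* apei_claim_sea() >= 0 */
	bool fixup_found;  /* fixup_exception_mc() succeeded */
	bool is_kthread;   /* current->flags & PF_KTHREAD */
};

static struct sea_outcome model_do_kernel_sea(struct sea_inputs in)
{
	struct sea_outcome out = { .handled = false, .signal_task = false };

	/* Any failed precondition: return "not handled" so that do_sea()
	 * falls back to arm64_notify_die(). */
	if (!in.copy_mc_enabled || in.from_user_mode)
		return out;
	if (!in.apei_claimed || !in.fixup_found)
		return out;

	/* The fixup stops the kernel from crashing... */
	out.handled = true;
	/* ...but a user task still gets a fatal signal, because the
	 * operation it requested could not be completed. */
	out.signal_task = !in.is_kthread;
	return out;
}
```

With every precondition met, a user task ends up with both `handled` and `signal_task` set, a kernel thread gets the fixup without a signal, and any failed check yields neither, leaving the fault to do_sea().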
>>>> +
>>>> + set_thread_esr(0, esr);
>>>
>>> Why are we not setting the address? Is that deliberate, or an oversight?
>>
>> Here fault_address is set to 0; I followed the logic of arm64_notify_die():
>>
>> void arm64_notify_die(...)
>> {
>> 	if (user_mode(regs)) {
>> 		WARN_ON(regs != current_pt_regs());
>> 		current->thread.fault_address = 0;
>> 		current->thread.fault_code = err;
>>
>> 		arm64_force_sig_fault(signo, sicode, far, str);
>> 	} else {
>> 		die(str, regs, err);
>> 	}
>> }
>>
>> I don't know exactly why. Do you know why arm64_notify_die() does this? :)
>
> To be honest, I don't know, and that looks equally suspicious to me.
>
> Looking at the git history, that was added in commit:
>
> 9141300a5884b57c ("arm64: Provide read/write fault information in compat signal handlers")
>
> ... so maybe Catalin recalls why.
>
> Perhaps the assumption is just that this will be fatal and so unimportant? ...
> but in that case the same logic would apply to the ESR value, so it's not clear
> to me.
OK, let's proceed with setting it to 0. If this changes later, both places
should be changed together.
Thanks,
Tong.
>
> Mark.
>
> .