Message-ID: <fd051aec-9aad-a608-59d6-7bee3a340801@gmx.de>
Date: Mon, 21 Aug 2023 13:59:18 +0200
From: Helge Deller <deller@....de>
To: Will Deacon <will@...nel.org>,
Shuai Xue <xueshuai@...ux.alibaba.com>
Cc: catalin.marinas@....com, James.Bottomley@...senPartnership.com,
dave.hansen@...ux.intel.com, luto@...nel.org, peterz@...radead.org,
tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, x86@...nel.org,
hpa@...or.com, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, linux-parisc@...r.kernel.org
Subject: Re: [PATCH] HWPOISON: add a pr_err message when forcibly send a sigbus

On 8/21/23 12:50, Will Deacon wrote:
> On Sat, Aug 19, 2023 at 06:22:12PM +0800, Shuai Xue wrote:
>> When a process tries to access a page that is already offline
>
> How does this happen?
>
>> the kernel will send a sigbus signal with the BUS_MCEERR_AR code. This
>> signal is typically handled by a registered sigbus handler in the
>> process. However, if the process does not have a registered sigbus
>> handler, it is important for end users to be informed about what
>> happened.
>>
>> To address this, add an error message similar to those implemented on
>> the x86, powerpc, and parisc platforms.
>>
>> Signed-off-by: Shuai Xue <xueshuai@...ux.alibaba.com>
>> ---
>> arch/arm64/mm/fault.c | 2 ++
>> arch/parisc/mm/fault.c | 5 ++---
>> arch/x86/mm/fault.c | 3 +--
>> 3 files changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 3fe516b32577..38e2186882bd 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -679,6 +679,8 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
>>  	} else if (fault & (VM_FAULT_HWPOISON_LARGE | VM_FAULT_HWPOISON)) {
>>  		unsigned int lsb;
>>
>> +		pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
>> +		       current->comm, current->pid, far);
>>  		lsb = PAGE_SHIFT;
>>  		if (fault & VM_FAULT_HWPOISON_LARGE)
>>  			lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
>
> Hmm, I'm not convinced by this. We have 'show_unhandled_signals' already,
> and there's plenty of code in memory-failure.c for handling poisoned pages
> reported by e.g. GHES. I don't think dumping extra messages in dmesg from
> the arch code really adds anything.

I added the parisc hunk in commit 606f95e42558 because of the memory fault
injections done by the LTP test suite (madvise07). I'm not sure whether any
other kernel messages were printed when this happened.

Helge
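
For readers following the thread, the user-space side described in the patch
text (a process that has registered a SIGBUS handler and receives the
BUS_MCEERR_AR code) might look roughly like the sketch below. It is only an
illustration and is not part of the patch or of LTP: the function names
describe_sigbus_code(), sigbus_handler() and install_sigbus_handler() are
hypothetical, and the message strings are made up for the example.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: map a SIGBUS si_code to a fixed message.
 * It returns string literals only, so it is safe to call from a handler. */
static const char *describe_sigbus_code(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		return "SIGBUS: hardware memory error, action required\n";
	case BUS_MCEERR_AO:
		return "SIGBUS: hardware memory error, action optional\n";
	default:
		return "SIGBUS: other bus error\n";
	}
}

/* Hypothetical handler: report the reason and exit.
 * Only async-signal-safe calls are used here (write, _exit); printf is not safe. */
static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	const char *msg = describe_sigbus_code(info->si_code);
	ssize_t n;

	(void)sig;
	(void)ucontext;
	n = write(STDERR_FILENO, msg, strlen(msg));
	(void)n;
	_exit(1);
}

/* Register the handler with SA_SIGINFO so that si_code is delivered.
 * Returns 0 on success, -1 on failure (as sigaction() does). */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

A program would call install_sigbus_handler() early in main(). A process that
never registers such a handler is simply killed by the default SIGBUS action,
which is the case the proposed pr_err() message is meant to make visible.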