Message-ID: <2f965887-19b5-47bf-98ca-d40b3ec05e75@oracle.com>
Date: Tue, 7 May 2024 10:54:10 -0700
From: Jane Chu <jane.chu@...cle.com>
To: Oscar Salvador <osalvador@...e.de>
Cc: linmiaohe@...wei.com, nao.horiguchi@...il.com, akpm@...ux-foundation.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/3] mm/memory-failure: try to send SIGBUS even if unmap failed
On 5/7/2024 2:02 AM, Oscar Salvador wrote:
> On Wed, May 01, 2024 at 05:24:56PM -0600, Jane Chu wrote:
>> For years when it comes down to kill a process due to hwpoison,
>> a SIGBUS is delivered only if unmap has been successful.
>> Otherwise, a SIGKILL is delivered. And the reason for that is
>> to prevent the involved process from accessing the hwpoisoned
>> page again.
>>
>> Since then a lot has changed, a hwpoisoned page is marked and
>> upon being re-accessed, the process will be killed immediately.
>> So let's take out the '!unmap_success' factor and try to deliver
>> SIGBUS if possible.
> I am missing some details here.
> An unmapped hwpoison page will trigger a fault and will return
> VM_FAULT_HWPOISON all the way down and then deliver SIGBUS,
> but if the page was not unmapped, how will this be catch upon
> re-accessing? Will the system deliver a MCE event?
>
I actually managed to hit the re-access case with an older version of
Linux: an MCE occurred but unmap failed, no SIGBUS was delivered, and the
test process re-accessed the same address over and over (hence MCE after
MCE), as the CPU was unable to make forward progress. In reality, this
issue has since been fixed by kill_accessing_process(). The description
of this patch refers to a comment made about '!unmap_success' a long
time ago.
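
To make the before/after behaviour concrete, here is a rough toy model of
the signal choice described in the patch description. It is only a sketch:
signal_before_patch() and signal_after_patch() are hypothetical names, not
kernel functions, and the model ignores every other condition that can
still force a SIGKILL.

```c
#include <signal.h>
#include <stdbool.h>

/*
 * Toy model only. These helpers are hypothetical and do not exist in
 * mm/memory-failure.c; they just restate the rule from the patch
 * description.
 */

/* Before the patch: SIGBUS only if unmap succeeded, otherwise SIGKILL. */
static int signal_before_patch(bool unmap_success)
{
	return unmap_success ? SIGBUS : SIGKILL;
}

/*
 * After the patch: the unmap result is no longer a factor, so SIGBUS is
 * attempted either way. Other reasons to force SIGKILL are out of scope
 * for this model.
 */
static int signal_after_patch(bool unmap_success)
{
	(void)unmap_success; /* deliberately ignored */
	return SIGBUS;
}
```

The only case that changes is a failed unmap, which moves from SIGKILL to
SIGBUS.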
thanks,
-jane