Message-ID: <CACw3F53KmKRJyH+ajicyDUgGbPZT=U3VE4n+Jt3E62BxEiiCGA@mail.gmail.com>
Date: Thu, 21 Aug 2025 11:23:48 -0700
From: Jiaqi Yan <jiaqiyan@...gle.com>
To: Kyle Meyer <kyle.meyer@....com>
Cc: akpm@...ux-foundation.org, david@...hat.com, tony.luck@...el.com,
bp@...en8.de, linmiaohe@...wei.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-edac@...r.kernel.org,
lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com, vbabka@...e.cz,
rppt@...nel.org, surenb@...gle.com, mhocko@...e.com, nao.horiguchi@...il.com,
jane.chu@...cle.com, osalvador@...e.de
Subject: Re: [PATCH] mm/memory-failure: Do not call action_result() on already poisoned pages
On Thu, Aug 21, 2025 at 9:46 AM Kyle Meyer <kyle.meyer@....com> wrote:
>
> Calling action_result() on already poisoned pages causes issues:
>
> * The amount of hardware corrupted memory is incorrectly incremented.
> * NUMA node memory failure statistics are incorrectly updated.
> * Redundant "already poisoned" messages are printed.
All agreed.
>
> Do not call action_result() on already poisoned pages and drop unused
> MF_MSG_ALREADY_POISONED.
Hi Kyle,
The patch looks great to me; just one thought.
Alternatively, have you considered keeping MF_MSG_ALREADY_POISONED
and changing how action_result() handles it?
- don't num_poisoned_pages_inc(pfn)
- don't update_per_node_mf_stats(pfn, result)
- still pr_err("%#lx: recovery action for %s: %s\n", ...)
- remove pr_err("%#lx: already hardware poisoned\n", pfn) from
memory_failure() and try_memory_failure_hugetlb()
That way, all MF recovery result messages in the kernel log would come
from one place, action_result(), instead of being scattered across
several call sites.
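To make the suggestion concrete, here is a rough userspace sketch of the idea, not kernel code. The enum values, function names, and message format mirror the ones quoted in this thread, but the enums are trimmed, the helper bodies are stand-in counters, node_mf_stat_updates is a hypothetical name used only for illustration, and printf() stands in for pr_err().

```c
#include <errno.h>
#include <stdio.h>

/* Trimmed stand-ins for the kernel enums; only the values needed here. */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

enum mf_action_page_type {
	MF_MSG_BUDDY,
	MF_MSG_ALREADY_POISONED,
	MF_MSG_UNKNOWN,
};

static const char * const action_name[] = {
	[MF_IGNORED]   = "Ignored",
	[MF_FAILED]    = "Failed",
	[MF_DELAYED]   = "Delayed",
	[MF_RECOVERED] = "Recovered",
};

static const char * const action_page_types[] = {
	[MF_MSG_BUDDY]            = "free buddy page",
	[MF_MSG_ALREADY_POISONED] = "already poisoned",
	[MF_MSG_UNKNOWN]          = "unknown page",
};

/* Mock counters standing in for the real accounting (assumption). */
static unsigned long num_poisoned_pages;
static unsigned long node_mf_stat_updates; /* hypothetical name */

static void num_poisoned_pages_inc(unsigned long pfn)
{
	(void)pfn;
	num_poisoned_pages++;
}

static void update_per_node_mf_stats(unsigned long pfn, enum mf_result result)
{
	(void)pfn;
	(void)result;
	node_mf_stat_updates++;
}

static int action_result(unsigned long pfn, enum mf_action_page_type type,
			 enum mf_result result)
{
	/* Skip the accounting for pages that were already counted once. */
	if (type != MF_MSG_ALREADY_POISONED) {
		num_poisoned_pages_inc(pfn);
		update_per_node_mf_stats(pfn, result);
	}

	/* The single place where the recovery result is logged. */
	printf("%#lx: recovery action for %s: %s\n",
	       pfn, action_page_types[type], action_name[result]);

	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
}
```

With this shape, memory_failure() and try_memory_failure_hugetlb() would keep their action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED) calls and drop their own "already hardware poisoned" pr_err().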
>
> Fixes: b8b9488d50b7 ("mm/memory-failure: improve memory failure action_result messages")
> Signed-off-by: Kyle Meyer <kyle.meyer@....com>
> ---
> include/linux/mm.h | 1 -
> include/ras/ras_event.h | 1 -
> mm/memory-failure.c | 3 ---
> 3 files changed, 5 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1ae97a0b8ec7..09ce81ef7afc 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4005,7 +4005,6 @@ enum mf_action_page_type {
> MF_MSG_BUDDY,
> MF_MSG_DAX,
> MF_MSG_UNSPLIT_THP,
> - MF_MSG_ALREADY_POISONED,
> MF_MSG_UNKNOWN,
> };
>
> diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> index c8cd0f00c845..f62a52f5bd81 100644
> --- a/include/ras/ras_event.h
> +++ b/include/ras/ras_event.h
> @@ -374,7 +374,6 @@ TRACE_EVENT(aer_event,
> EM ( MF_MSG_BUDDY, "free buddy page" ) \
> EM ( MF_MSG_DAX, "dax page" ) \
> EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" ) \
> - EM ( MF_MSG_ALREADY_POISONED, "already poisoned" ) \
> EMe ( MF_MSG_UNKNOWN, "unknown page" )
>
> /*
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index e2e685b971bb..7839ec83bc1d 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -948,7 +948,6 @@ static const char * const action_page_types[] = {
> [MF_MSG_BUDDY] = "free buddy page",
> [MF_MSG_DAX] = "dax page",
> [MF_MSG_UNSPLIT_THP] = "unsplit thp",
> - [MF_MSG_ALREADY_POISONED] = "already poisoned",
> [MF_MSG_UNKNOWN] = "unknown page",
> };
>
> @@ -2090,7 +2089,6 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
> if (flags & MF_ACTION_REQUIRED) {
> folio = page_folio(p);
> res = kill_accessing_process(current, folio_pfn(folio), flags);
> - action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
> }
> return res;
> } else if (res == -EBUSY) {
> @@ -2283,7 +2281,6 @@ int memory_failure(unsigned long pfn, int flags)
> res = kill_accessing_process(current, pfn, flags);
> if (flags & MF_COUNT_INCREASED)
> put_page(p);
> - action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
> goto unlock_mutex;
> }
>
> --
> 2.50.1
>
>