Message-ID: <aKd1K3ueTacGTf1W@hpe.com>
Date: Thu, 21 Aug 2025 14:36:11 -0500
From: Kyle Meyer <kyle.meyer@....com>
To: Jiaqi Yan <jiaqiyan@...gle.com>
Cc: akpm@...ux-foundation.org, david@...hat.com, tony.luck@...el.com,
        bp@...en8.de, linmiaohe@...wei.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, linux-edac@...r.kernel.org,
        lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com, vbabka@...e.cz,
        rppt@...nel.org, surenb@...gle.com, mhocko@...e.com,
        nao.horiguchi@...il.com, jane.chu@...cle.com, osalvador@...e.de
Subject: Re: [PATCH] mm/memory-failure: Do not call action_result() on
 already poisoned pages

On Thu, Aug 21, 2025 at 11:23:48AM -0700, Jiaqi Yan wrote:
> On Thu, Aug 21, 2025 at 9:46 AM Kyle Meyer <kyle.meyer@....com> wrote:
> >
> > Calling action_result() on already poisoned pages causes issues:
> >
> > * The amount of hardware corrupted memory is incorrectly incremented.
> > * NUMA node memory failure statistics are incorrectly updated.
> > * Redundant "already poisoned" messages are printed.
> 
> All agreed.
> 
> >
> > Do not call action_result() on already poisoned pages and drop unused
> > MF_MSG_ALREADY_POISONED.
> 
> Hi Kyle,
> 
> Patch looks great to me, just one thought...
> 
> Alternatively, have you thought about keeping MF_MSG_ALREADY_POISONED
> but changing action_result for MF_MSG_ALREADY_POISONED?
> - don't num_poisoned_pages_inc(pfn)
> - don't update_per_node_mf_stats(pfn, result)
> - still pr_err("%#lx: recovery action for %s: %s\n", ...)
> - meanwhile remove "pr_err("%#lx: already hardware poisoned\n", pfn)"
> in memory_failure and try_memory_failure_hugetlb

I did consider that approach, but I was concerned about passing
MF_MSG_ALREADY_POISONED to action_result() with MF_FAILED. The resulting
"recovery action for already poisoned: Failed" message is a bit misleading.

How about introducing a new MF action result? Maybe MF_NONE? The message could
look something like:

Memory failure: 0xXXXXXXXX: recovery action for already poisoned page: None

> This way, all the MF recovery result kernel logs out will be sitting
> in one place, action_result, instead of scattering around all over the
> place.

That sounds better to me.
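
To make the idea concrete, here is a rough userspace sketch (not kernel code,
and not a proposed patch) of an action_result() that keeps
MF_MSG_ALREADY_POISONED, skips the accounting for it, and still prints the one
log line. MF_NONE, the "already poisoned page" string, and the plain counter
standing in for num_poisoned_pages_inc() are assumptions for illustration only;
the enums are trimmed to a few entries.

```c
#include <assert.h>
#include <stdio.h>

/* Trimmed stand-ins for the kernel enums; MF_NONE is hypothetical. */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED, MF_NONE };
enum mf_action_page_type {
	MF_MSG_BUDDY,
	MF_MSG_ALREADY_POISONED,
	MF_MSG_UNKNOWN,
};

static const char * const action_name[] = {
	[MF_IGNORED]	= "Ignored",
	[MF_FAILED]	= "Failed",
	[MF_DELAYED]	= "Delayed",
	[MF_RECOVERED]	= "Recovered",
	[MF_NONE]	= "None",	/* assumed wording */
};

static const char * const action_page_types[] = {
	[MF_MSG_BUDDY]			= "free buddy page",
	[MF_MSG_ALREADY_POISONED]	= "already poisoned page", /* assumed */
	[MF_MSG_UNKNOWN]		= "unknown page",
};

/* Userspace stand-in for the global poisoned-page accounting. */
static long num_poisoned_pages;

static void action_result(unsigned long pfn, enum mf_action_page_type type,
			  enum mf_result result)
{
	assert(type <= MF_MSG_UNKNOWN && result <= MF_NONE);

	/* Already poisoned pages were counted on the first failure. */
	if (type != MF_MSG_ALREADY_POISONED)
		num_poisoned_pages++;	/* per-node stats would go here too */

	/* Single place where the result is logged. */
	printf("Memory failure: %#lx: recovery action for %s: %s\n",
	       pfn, action_page_types[type], action_name[result]);
}
```

With something like this, the callers in memory_failure() and
try_memory_failure_hugetlb() would call
action_result(pfn, MF_MSG_ALREADY_POISONED, MF_NONE) and drop their own
"already hardware poisoned" pr_err().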
 
> >
> > Fixes: b8b9488d50b7 ("mm/memory-failure: improve memory failure action_result messages")
> > Signed-off-by: Kyle Meyer <kyle.meyer@....com>
> > ---
> >  include/linux/mm.h      | 1 -
> >  include/ras/ras_event.h | 1 -
> >  mm/memory-failure.c     | 3 ---
> >  3 files changed, 5 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 1ae97a0b8ec7..09ce81ef7afc 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -4005,7 +4005,6 @@ enum mf_action_page_type {
> >         MF_MSG_BUDDY,
> >         MF_MSG_DAX,
> >         MF_MSG_UNSPLIT_THP,
> > -       MF_MSG_ALREADY_POISONED,
> >         MF_MSG_UNKNOWN,
> >  };
> >
> > diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
> > index c8cd0f00c845..f62a52f5bd81 100644
> > --- a/include/ras/ras_event.h
> > +++ b/include/ras/ras_event.h
> > @@ -374,7 +374,6 @@ TRACE_EVENT(aer_event,
> >         EM ( MF_MSG_BUDDY, "free buddy page" )                          \
> >         EM ( MF_MSG_DAX, "dax page" )                                   \
> >         EM ( MF_MSG_UNSPLIT_THP, "unsplit thp" )                        \
> > -       EM ( MF_MSG_ALREADY_POISONED, "already poisoned" )              \
> >         EMe ( MF_MSG_UNKNOWN, "unknown page" )
> >
> >  /*
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index e2e685b971bb..7839ec83bc1d 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -948,7 +948,6 @@ static const char * const action_page_types[] = {
> >         [MF_MSG_BUDDY]                  = "free buddy page",
> >         [MF_MSG_DAX]                    = "dax page",
> >         [MF_MSG_UNSPLIT_THP]            = "unsplit thp",
> > -       [MF_MSG_ALREADY_POISONED]       = "already poisoned",
> >         [MF_MSG_UNKNOWN]                = "unknown page",
> >  };
> >
> > @@ -2090,7 +2089,6 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
> >                 if (flags & MF_ACTION_REQUIRED) {
> >                         folio = page_folio(p);
> >                         res = kill_accessing_process(current, folio_pfn(folio), flags);
> > -                       action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
> >                 }
> >                 return res;
> >         } else if (res == -EBUSY) {
> > @@ -2283,7 +2281,6 @@ int memory_failure(unsigned long pfn, int flags)
> >                         res = kill_accessing_process(current, pfn, flags);
> >                 if (flags & MF_COUNT_INCREASED)
> >                         put_page(p);
> > -               action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
> >                 goto unlock_mutex;
> >         }
> >
> > --
> > 2.50.1
> >
> >
