Message-ID: <CAGsJ_4wE9pZoGoWoJD6R09uAppNqeycUdFUy0E8ZSUi4VcwaMQ@mail.gmail.com>
Date: Tue, 9 Jul 2024 09:34:43 +1200
From: Barry Song <baohua@...nel.org>
To: zhiguojiang <justinjiang@...o.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
LKML <linux-kernel@...r.kernel.org>, opensource.kernel@...o.com,
David Hildenbrand <david@...hat.com>, Matthew Wilcox <willy@...radead.org>
Subject: Re: [PATCH v5] mm: shrink skip folio mapped by an exiting process
On Tue, Jul 9, 2024 at 1:11 AM zhiguojiang <justinjiang@...o.com> wrote:
>
>
>
> On 2024/7/8 20:41, Barry Song wrote:
> >
> >
> > On Tue, Jul 9, 2024 at 00:25, zhiguojiang <justinjiang@...o.com> wrote:
> >
> >
> >
> > On 2024/7/8 20:17, zhiguojiang wrote:
> > >
> > >
> > > On 2024/7/8 19:02, Barry Song wrote:
> > >> On Mon, Jul 8, 2024 at 9:04 PM Zhiguo Jiang <justinjiang@...o.com> wrote:
> > >>> Releasing a non-shared anonymous folio mapped solely by an
> > >>> exiting process may go through two flows: 1) the anonymous folio
> > >>> is first swapped out into swapspace and transformed into a
> > >>> swp_entry in shrink_folio_list; 2) the swp_entry is then released
> > >>> in the process-exit flow. This increases the CPU load of
> > >>> releasing such a folio, because it goes through both swap-out and
> > >>> the release of the swapspace and swp_entry.
> > >>>
> > >>> This is more likely to occur when the system is low on memory,
> > >>> because more background applications will be killed.
> > >>>
> > >>> With this change, shrink skips the non-shared anonymous folio
> > >>> mapped solely by an exiting process; the folio is instead
> > >>> released directly in the process-exit flow, which saves swap-out
> > >>> time and lightens the exit path.
> > >>>
> > >>> Signed-off-by: Zhiguo Jiang <justinjiang@...o.com>
> > >>> ---
> > >>>
> > >>> Change log:
> > >>> v4->v5:
> > >>> 1.Modify to skip non-shared anonymous folio only.
> > >>> 2.Update comments for pra->referenced = -1.
> > >>> v3->v4:
> > >>> 1.Skip unshared folios mapped only by an exiting task.
> > >>> v2->v3:
> > >>> Nothing.
> > >>> v1->v2:
> > >>> 1.The VM_EXITING flag added in v1 is removed, because it failed
> > >>> to compile on 32-bit systems.
> > >>>
> > >>> mm/rmap.c | 13 +++++++++++++
> > >>> mm/vmscan.c | 7 ++++++-
> > >>> 2 files changed, 19 insertions(+), 1 deletion(-)
> > >>>
> > >>> diff --git a/mm/rmap.c b/mm/rmap.c
> > >>> index 26806b49a86f..5b5281d71dbb
> > >>> --- a/mm/rmap.c
> > >>> +++ b/mm/rmap.c
> > >>> @@ -843,6 +843,19 @@ static bool folio_referenced_one(struct folio *folio,
> > >>>         int referenced = 0;
> > >>>         unsigned long start = address, ptes = 0;
> > >>>
> > >>> +       /*
> > >>> +        * Skip the non-shared anonymous folio mapped solely by
> > >>> +        * the single exiting process, and release it directly
> > >>> +        * in the process exiting.
> > >>> +        */
> > >>> +       if ((!atomic_read(&vma->vm_mm->mm_users) ||
> > >>> +           test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
> > >>> +           folio_test_anon(folio) && folio_test_swapbacked(folio) &&
> > >>> +           !folio_likely_mapped_shared(folio)) {
> > >>> +               pra->referenced = -1;
> > >>> +               return false;
> > >>> +       }
> > >>> +
> > >>> while (page_vma_mapped_walk(&pvmw)) {
> > >>> address = pvmw.address;
> > > Sure, I agree with your suggested modification. This way, using the
> > > PTL indeed ensures that the folio is mapped by this process.
> > > Thanks
> > >> As David suggested, what about the below?
> > >>
> > >> @@ -883,6 +870,21 @@ static bool folio_referenced_one(struct folio *folio,
> > >>                         continue;
> > >>                 }
> > >>
> > >> +               /*
> > >> +                * Skip the non-shared anonymous folio mapped solely by
> > >> +                * the single exiting process, and release it directly
> > >> +                * in the process exiting.
> > >> +                */
> > >> +               if ((!atomic_read(&vma->vm_mm->mm_users) ||
> > >> +                   test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags)) &&
> > >> +                   folio_test_anon(folio) && folio_test_swapbacked(folio) &&
> > >> +                   !folio_likely_mapped_shared(folio)) {
> > >> +                       pra->referenced = -1;
> > >> +                       page_vma_mapped_walk_done(&pvmw);
> > >> +                       return false;
> > >> +               }
> > >> +
> > >>                 if (pvmw.pte) {
> > >>                         if (lru_gen_enabled() &&
> > >>                             pte_young(ptep_get(pvmw.pte))) {
> > >>
> > >>
> > >> By the way, I am not convinced that using
> > >> test_bit(MMF_OOM_SKIP, &vma->vm_mm->flags) is correct (I think it
> > >> is wrong). For example, global_init can have it set directly:
> > >>
> > >>                 if (is_global_init(p)) {
> > >>                         can_oom_reap = false;
> > >>                         set_bit(MMF_OOM_SKIP, &mm->flags);
> > >>                         pr_info("oom killer %d (%s) has mm pinned by %d (%s)\n",
> > >>                                 task_pid_nr(victim), victim->comm,
> > >>                                 task_pid_nr(p), p->comm);
> > >>                         continue;
> > >>                 }
> > >>
> > >> And exit_mmap() automatically sets MMF_OOM_SKIP.
> > >>
> > >> What is the purpose of this check? Is there a better way to
> > >> determine if a process is an OOM target? What about
> > >> check_stable_address_space()?
> > > 1.Sorry, I overlooked the is_global_init(p) case; MMF_OOM_SKIP is
> > > indeed not suitable.
> > >
> > > 2.check_stable_address_space() can indicate the oom_reaper, but it
> > > seems unable to identify the case where the process exits normally.
> > > What about task_is_dying()?
> > >
> > > static inline bool task_is_dying(void)
> > > {
> > >         return tsk_is_oom_victim(current) ||
> > >                 fatal_signal_pending(current) ||
> > >                 (current->flags & PF_EXITING);
> > > }
> > >
> > > Thanks
> > We can migrate task_is_dying() from mm/memcontrol.c to
> > include/linux/oom.h:
> >
> > static inline bool task_is_dying(void)
> > {
> >         return tsk_is_oom_victim(current) ||
> >                 fatal_signal_pending(current) ||
> >                 (current->flags & PF_EXITING);
> > }
> >
> >
> > no. current is kswapd.
> Hi Barry,
>
> It seems feasible to replace MMF_OOM_SKIP with check_stable_address_space().
> check_stable_address_space() can indicate an OOM kill, and
> !atomic_read(&vma->vm_mm->mm_users) can indicate a normal process exit.
>
>         /*
>          * Skip the non-shared anonymous folio mapped solely by
>          * the single exiting process, and release it directly
>          * in the process exiting.
>          */
>         if ((!atomic_read(&vma->vm_mm->mm_users) ||
>             check_stable_address_space(vma->vm_mm)) &&
>             folio_test_anon(folio) && folio_test_swapbacked(folio) &&
>             !folio_likely_mapped_shared(folio)) {
>                 pra->referenced = -1;
>                 page_vma_mapped_walk_done(&pvmw);
>                 return false;
>         }
>
Yes. (+ David, Willy: when you send a new version, please CC the people
who have participated and describe how you have addressed their comments.)

I also think we can actually remove "folio_test_anon(folio)", so it could be:
@@ -883,6 +871,21 @@ static bool folio_referenced_one(struct folio *folio,
                        continue;
                }

+               /*
+                * Skip the non-shared swapbacked folio mapped solely by
+                * the exiting or OOM-reaped process. This avoids redundant
+                * swap-out followed by an immediate unmap.
+                */
+               if ((!atomic_read(&vma->vm_mm->mm_users) ||
+                   check_stable_address_space(vma->vm_mm)) &&
+                   folio_test_swapbacked(folio) &&
+                   !folio_likely_mapped_shared(folio)) {
+                       pra->referenced = -1;
+                       page_vma_mapped_walk_done(&pvmw);
+                       return false;
+               }
+
                if (pvmw.pte) {
                        if (lru_gen_enabled() &&
                            pte_young(ptep_get(pvmw.pte))) {
> Thanks
> Zhiguo
> >
> >
> > >>
> > >>
> > >>> diff --git a/mm/vmscan.c b/mm/vmscan.c
> > >>> index 0761f91b407f..bae7a8bf6b3d
> > >>> --- a/mm/vmscan.c
> > >>> +++ b/mm/vmscan.c
> > >>> @@ -863,7 +863,12 @@ static enum folio_references folio_check_references(struct folio *folio,
> > >>>         if (vm_flags & VM_LOCKED)
> > >>>                 return FOLIOREF_ACTIVATE;
> > >>>
> > >>> -       /* rmap lock contention: rotate */
> > >>> +       /*
> > >>> +        * There are two cases to consider.
> > >>> +        * 1) Rmap lock contention: rotate.
> > >>> +        * 2) Skip the non-shared anonymous folio mapped solely by
> > >>> +        *    the single exiting process.
> > >>> +        */
> > >>>         if (referenced_ptes == -1)
> > >>>                 return FOLIOREF_KEEP;
> > >>>
> > >>> --
> > >>> 2.39.0
> > >>>
> > >> Thanks
> > >> Barry
> > >
> >
>
Thanks
Barry