Message-ID: <7rvwvuifkav5oz4ftfuziq23wek2bn6ygvrfotpaweypuy7obv@hjuf3eknscii>
Date: Wed, 3 Sep 2025 15:02:34 -0400
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Michal Hocko <mhocko@...e.com>
Cc: zhongjinji <zhongjinji@...or.com>, rientjes@...gle.com,
shakeel.butt@...ux.dev, akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, tglx@...utronix.de,
lorenzo.stoakes@...cle.com, surenb@...gle.com, liulu.liu@...or.com,
feng.han@...or.com
Subject: Re: [PATCH v7 2/2] mm/oom_kill: The OOM reaper traverses the VMA
maple tree in reverse order
* Michal Hocko <mhocko@...e.com> [250903 08:58]:
> On Wed 03-09-25 17:27:29, zhongjinji wrote:
> > Although the oom_reaper is delayed and it gives the oom victim chance to
> > clean up its address space this might take a while especially for
> > processes with a large address space footprint. In those cases
> > oom_reaper might start racing with the dying task and compete for shared
> > resources - e.g. page table lock contention has been observed.
> >
> > Reduce those races by reaping the oom victim from the other end of the
> > address space.
> >
> > It is also a significant improvement for process_mrelease(). When a process
> > is killed, process_mrelease is used to reap the killed process and often
> > runs concurrently with the dying task. The test data shows that after
> > applying the patch, lock contention is greatly reduced during the procedure
> > of reaping the killed process.
>
> Thank you this is much better!
>
> > Without the patch:
> > |--99.74%-- oom_reaper
> > | |--76.67%-- unmap_page_range
> > | | |--33.70%-- __pte_offset_map_lock
> > | | | |--98.46%-- _raw_spin_lock
> > | | |--27.61%-- free_swap_and_cache_nr
> > | | |--16.40%-- folio_remove_rmap_ptes
> > | | |--12.25%-- tlb_flush_mmu
> > | |--12.61%-- tlb_finish_mmu
> >
> > With the patch:
> > |--98.84%-- oom_reaper
> > | |--53.45%-- unmap_page_range
> > | | |--24.29%-- [hit in function]
> > | | |--48.06%-- folio_remove_rmap_ptes
> > | | |--17.99%-- tlb_flush_mmu
> > | | |--1.72%-- __pte_offset_map_lock
> > | |--30.43%-- tlb_finish_mmu
>
> Just curious. Do I read this correctly that the overall speedup is
> mostly eaten by contention over tlb_finish_mmu?
The tlb_finish_mmu() taking less time indicates that it's probably not
doing much work, afaict. These numbers would be better if exit_mmap()
was also added to show a more complete view of how the system is
affected - I suspect the tlb_finish_mmu time will have disappeared from
that side of things.
The comments in this area of the code have many arch-specific statements,
which makes me wonder whether this is safe (probably?) and beneficial for
everyone. At the least, it would be worth mentioning which arch was
used for the benchmark - I am guessing arm64 considering the talk of
android; coincidentally, arm64 would benefit the most, fwiu.
mmu_notifier_release(mm) is called early in the exit_mmap() path, which
should cause the mmu notifiers to be non-blocking (according to the
comment in the v6.0 source of exit_mmap() [1]).
>
> > Signed-off-by: zhongjinji <zhongjinji@...or.com>
>
> Anyway, the change on its own makes sense to me
> Acked-by: Michal Hocko <mhocko@...e.com>
>
> Thanks for working on the changelog improvements.
[1]. https://elixir.bootlin.com/linux/v6.0.19/source/mm/mmap.c#L3089
...
Thanks,
Liam