Message-ID: <20250825133855.30229-1-zhongjinji@honor.com>
Date: Mon, 25 Aug 2025 21:38:53 +0800
From: zhongjinji <zhongjinji@...or.com>
To: <mhocko@...e.com>
CC: <rientjes@...gle.com>, <shakeel.butt@...ux.dev>,
	<akpm@...ux-foundation.org>, <linux-mm@...ck.org>,
	<linux-kernel@...r.kernel.org>, <tglx@...utronix.de>,
	<liam.howlett@...cle.com>, <lorenzo.stoakes@...cle.com>,
	<liulu.liu@...or.com>, <feng.han@...or.com>, <zhongjinji@...or.com>
Subject: [PATCH v5 0/2] Do not delay oom reaper when the victim is frozen

Patch 1 stops delaying the OOM reaper when the victim is frozen. Patch 2
makes the OOM reaper and exit_mmap() traverse the maple tree in opposite
orders to reduce PTE lock contention caused by unmapping the same VMA.

About patch 1:
Patch 1 uses frozen() to check the frozen state of a single thread to
determine if a process is frozen, rather than checking all threads,
because the frozen state of all threads in a process will eventually be
consistent. There is no need to strictly confirm that all threads are
frozen; it is only necessary to check whether the process has been frozen
or is about to be frozen.
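
The single-thread check described above can be modelled in user space
roughly as follows. This is only a sketch, not the code from patch 1:
every model_* name and MODEL_REAP_DELAY_MS are hypothetical, and the
2000 ms value is simply the two-second delay mentioned in this letter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two-second OOM reaper delay. */
#define MODEL_REAP_DELAY_MS 2000u

/* Hypothetical user-space model of one thread of the victim. */
struct model_thread {
	bool frozen;
};

/* Hypothetical model of the whole victim process. */
struct model_process {
	const struct model_thread *threads;
	size_t nr_threads;
};

/* Look at one thread only, as the cover letter describes for frozen(). */
static bool model_thread_frozen(const struct model_thread *t)
{
	return t->frozen;
}

/* No delay when the checked thread is frozen, the full delay otherwise. */
static unsigned int model_reap_delay_ms(const struct model_thread *victim)
{
	return model_thread_frozen(victim) ? 0u : MODEL_REAP_DELAY_MS;
}

/* Strict variant, shown only for comparison: requires every thread frozen. */
static bool model_all_threads_frozen(const struct model_process *p)
{
	for (size_t i = 0; i < p->nr_threads; i++) {
		if (!p->threads[i].frozen)
			return false;
	}
	return p->nr_threads > 0;
}
```

In this model, a process that is halfway through freezing (one thread
frozen, another not yet) already gets a zero delay from the single-thread
check, while the strict variant would still report it as not frozen.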

If a frozen process cannot be thawed promptly, the two-second delay of the
OOM reaper cannot guarantee that its robust futexes will not be reaped.
Processes holding robust futexes should therefore not be frozen.
This patch does not make issue [1] worse.

About patch 2:
I tested the changes of patch 2 on Android. The reproduction steps are as
follows: start a process, kill it the same way the OOM killer does, and
explicitly queue it for the OOM reaper.

Perf data with patch 1 applied but not patch 2:
|--99.74%-- oom_reaper
|  |--76.67%-- unmap_page_range
|  |  |--33.70%-- __pte_offset_map_lock
|  |  |  |--98.46%-- _raw_spin_lock
|  |  |--27.61%-- free_swap_and_cache_nr
|  |  |--16.40%-- folio_remove_rmap_ptes
|  |  |--12.25%-- tlb_flush_mmu
|  |--12.61%-- tlb_finish_mmu

Perf data with both patch 1 and patch 2 applied:
|--98.84%-- oom_reaper
|  |--53.45%-- unmap_page_range
|  |  |--24.29%-- [hit in function]
|  |  |--48.06%-- folio_remove_rmap_ptes
|  |  |--17.99%-- tlb_flush_mmu
|  |  |--1.72%-- __pte_offset_map_lock
|  |  
|  |--30.43%-- tlb_finish_mmu

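The data shows that contention on the PTE spinlock is intense when the OOM
reaper and exit_mmap() traverse the tree in the same direction: the share
of __pte_offset_map_lock within unmap_page_range drops from 33.70% to
1.72% once they traverse in opposite orders.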
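
The effect can be illustrated with a toy user-space model, assuming two
walkers that advance in lock-step, one VMA per step, and that contend
whenever they are on the same VMA. The names walk_dir, walker_pos() and
count_collisions() are hypothetical, and the model ignores real timing; it
is only meant to show why opposite directions meet at most once.

```c
#include <stddef.h>

/* Direction in which a walker visits VMA indices 0..nr_vmas-1. */
enum walk_dir { WALK_FORWARD, WALK_REVERSE };

/* Index of the VMA a walker is on at a given step. */
static size_t walker_pos(enum walk_dir dir, size_t step, size_t nr_vmas)
{
	return dir == WALK_FORWARD ? step : nr_vmas - 1 - step;
}

/*
 * Count the steps at which both walkers are on the same VMA, i.e. the
 * steps at which they would compete for the same PTE locks in this model.
 */
static size_t count_collisions(size_t nr_vmas, enum walk_dir a,
			       enum walk_dir b)
{
	size_t collisions = 0;

	for (size_t step = 0; step < nr_vmas; step++) {
		if (walker_pos(a, step, nr_vmas) ==
		    walker_pos(b, step, nr_vmas))
			collisions++;
	}
	return collisions;
}
```

With both walkers going forward, every step is a collision. With one going
forward and one in reverse, they meet once in the middle when the number of
VMAs is odd and never when it is even.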

On low-memory Android devices, high memory pressure often requires killing
processes to free memory, which is generally accepted on Android. lmkd, a
user-space program that actively kills processes, needs to call
process_mrelease() asynchronously to release memory from killed processes,
much like the OOM reaper does. OOM events are also not rare. Reducing lock
contention in __oom_reap_task_mm() is therefore worthwhile.

Link: https://lore.kernel.org/all/20220414144042.677008-1-npache@redhat.com/T/#u [1]

---
v4 -> v5:
1. Detect the frozen state of the process instead of checking the futex state,
   as special handling of futex locks should be avoided during OOM kill [2].
2. Use mas_find_rev() to traverse the VMA tree instead of vma_prev(), because
   vma_prev() may skip the first VMA and should not be used here. [3]
3. Just check should_delay_oom_reap() in queue_oom_reaper(), since it is
   not a hot path. [4]

v4 link:
https://lore.kernel.org/linux-mm/20250814135555.17493-1-zhongjinji@honor.com/

v3 -> v4:
1. Rename check_robust_futex() to process_has_robust_futex() for clearer
   intent.
2. Because the delay_reap parameter was added to task_will_free_mem(),
   the function is renamed to should_reap_task() to better clarify
   its purpose.
3. Add should_delay_oom_reap() to decide whether to delay OOM reap.
4. Modify the OOM reaper to traverse the maple tree in reverse order; see patch
   3 for details.
These changes improve code readability and enhance OOM reaper behavior.

v3 link:  
https://lore.kernel.org/all/20250804030341.18619-1-zhongjinji@honor.com/
https://lore.kernel.org/all/20250804030341.18619-2-zhongjinji@honor.com/

v2 -> v3:
1. Fix the Subject prefix, changing it from futex to mm/oom_kill.
v2 link:
https://lore.kernel.org/linux-mm/20250801153649.23244-1-zhongjinji@honor.com/
https://lore.kernel.org/linux-mm/20250801153649.23244-2-zhongjinji@honor.com/

v1 -> v2:
1. Check the robust_list of all threads instead of just a single thread.
v1 link:
https://lore.kernel.org/linux-mm/20250731102904.8615-1-zhongjinji@honor.com/

Reference:
https://lore.kernel.org/linux-mm/aKRWtjRhE_HgFlp2@tiehlicka/ [2]
https://lore.kernel.org/linux-mm/26larxehoe3a627s4fxsqghriwctays4opm4hhme3uk7ybjc5r@pmwh4s4yv7lm/
[3]
https://lore.kernel.org/linux-mm/d5013a33-c08a-44c5-a67f-9dc8fd73c969@lucifer.local/ [4]

zhongjinji (2):
  mm/oom_kill: Do not delay oom reaper when the victim is frozen
  mm/oom_kill: Have the OOM reaper and exit_mmap() traverse the maple
    tree in opposite order

 mm/oom_kill.c | 49 ++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 46 insertions(+), 3 deletions(-)

-- 
2.17.1

