Open Source and information security mailing list archives
 
Message-ID: <20250829065550.29571-1-zhongjinji@honor.com>
Date: Fri, 29 Aug 2025 14:55:48 +0800
From: zhongjinji <zhongjinji@...or.com>
To: <mhocko@...e.com>
CC: <rientjes@...gle.com>, <shakeel.butt@...ux.dev>,
	<akpm@...ux-foundation.org>, <linux-mm@...ck.org>,
	<linux-kernel@...r.kernel.org>, <tglx@...utronix.de>,
	<liam.howlett@...cle.com>, <lorenzo.stoakes@...cle.com>, <surenb@...gle.com>,
	<liulu.liu@...or.com>, <feng.han@...or.com>, <tianxiaobin@...or.com>,
	<fengbaopeng@...or.com>, <zhongjinji@...or.com>
Subject: [PATCH v6 0/2] Do not delay OOM reaper when the victim is frozen

An overview of the relationship between patch 1 and patch 2:
With patch 1 applied, the OOM reaper is no longer delayed when the victim
process is frozen. If the victim is thawed shortly afterwards, the OOM
reaper and the exiting thread running exit_mmap() may run concurrently,
which can lead to significant pte spinlock contention. Patch 2 mitigates
this by having the OOM reaper traverse the VMA maple tree in reverse order.
Because exit_mmap() unmaps VMAs from low to high addresses, the two walkers
then start at opposite ends of the address space instead of competing for
the same page tables.

The test data below was collected on Android. It shows that when the OOM
reaper and exit_mmap() run concurrently, pte spinlock contention becomes
more intense. This increases the running time of both threads, which in
turn raises system load. It also shows that reverse-order traversal of the
VMA maple tree by the OOM reaper significantly reduces pte spinlock
contention.

The test data indicate that reverse traversal significantly reduces
spinlock contention and decreases the combined load (measured as running
time) of oom_reaper and exit_mmap() by about 30%.

Perf data with patch 1 applied but not patch 2:
|--99.74%-- oom_reaper
|  |--76.67%-- unmap_page_range
|  |  |--33.70%-- __pte_offset_map_lock
|  |  |  |--98.46%-- _raw_spin_lock
|  |  |--27.61%-- free_swap_and_cache_nr
|  |  |--16.40%-- folio_remove_rmap_ptes
|  |  |--12.25%-- tlb_flush_mmu
|  |--12.61%-- tlb_finish_mmu

Perf data with both patch 1 and patch 2 applied:
|--98.84%-- oom_reaper
|  |--53.45%-- unmap_page_range
|  |  |--24.29%-- [hit in function]
|  |  |--48.06%-- folio_remove_rmap_ptes
|  |  |--17.99%-- tlb_flush_mmu
|  |  |--1.72%-- __pte_offset_map_lock
|  |  
|  |--30.43%-- tlb_finish_mmu

Running-time data for oom_reaper and the victim's last exiting thread:

With oom reaper (reverse traversal):
  Thread            TID     State     Wall duration (ms)
  RxComputationT   13708    Running   60.69
  oom_reaper        81      Running   46.49
  Total (ms): 107.18

With oom reaper (forward traversal):
  Thread            TID     State     Wall duration (ms)
  vdp:vidtask:m    14040    Running   81.85
  oom_reaper        81      Running   69.32
  Total (ms): 151.17

Without oom reaper:
  Thread            TID     State     Wall duration (ms)
  tp-background     12424   Running   106.02
  Total (ms): 106.02

Note: RxComputationT, vdp:vidtask:m, and tp-background are threads of the
same process, and they are the last threads to exit.

---
v5 -> v6:
- Use mas_for_each_rev() for VMA traversal [6]
- Simplify the check for whether to delay in queue_oom_reaper() [7]
- Refine changelog to better capture the essence of the changes [8]
- Check READ_ONCE(tsk->frozen) instead of checking mm and performing
  additional checks inside for_each_process(), as that is sufficient [9]
- Add Reported-by tags for fengbaopeng and tianxiaobin, who reported the
  reaper's high-load issue

v4 -> v5:
- Detect frozen state directly, avoid special futex handling. [3]
- Use mas_find_rev() for VMA traversal to avoid skipping entries. [4]
- Only check should_delay_oom_reap() in queue_oom_reaper(). [5]

v3 -> v4:
- Renamed functions and parameters for clarity. [2]
- Added should_delay_oom_reap() for OOM reap decisions.
- Traverse the maple tree in reverse to reduce contention with exit_mmap().

v2 -> v3:
- Fixed Subject prefix error.

v1 -> v2:
- Check robust_list for all threads, not just one. [1]

References:
[1] https://lore.kernel.org/linux-mm/u3mepw3oxj7cywezna4v72y2hvyc7bafkmsbirsbfuf34zpa7c@b23sc3rvp2gp/
[2] https://lore.kernel.org/linux-mm/87cy99g3k6.ffs@tglx/
[3] https://lore.kernel.org/linux-mm/aKRWtjRhE_HgFlp2@tiehlicka/
[4] https://lore.kernel.org/linux-mm/26larxehoe3a627s4fxsqghriwctays4opm4hhme3uk7ybjc5r@pmwh4s4yv7lm/
[5] https://lore.kernel.org/linux-mm/d5013a33-c08a-44c5-a67f-9dc8fd73c969@lucifer.local/
[6] https://lore.kernel.org/linux-mm/nwh7gegmvoisbxlsfwslobpbqku376uxdj2z32owkbftvozt3x@4dfet73fh2yy/
[7] https://lore.kernel.org/linux-mm/af4edeaf-d3c9-46a9-a300-dbaf5936e7d6@lucifer.local/
[8] https://lore.kernel.org/linux-mm/aK71W1ITmC_4I_RY@tiehlicka/
[9] https://lore.kernel.org/linux-mm/jzzdeczuyraup2zrspl6b74muf3bly2a3acejfftcldfmz4ekk@s5mcbeim34my/

Earlier versions:
v5: https://lore.kernel.org/linux-mm/20250825133855.30229-1-zhongjinji@honor.com/
v4: https://lore.kernel.org/linux-mm/20250814135555.17493-1-zhongjinji@honor.com/
v3: https://lore.kernel.org/linux-mm/20250804030341.18619-1-zhongjinji@honor.com/
v2: https://lore.kernel.org/linux-mm/20250801153649.23244-1-zhongjinji@honor.com/
v1: https://lore.kernel.org/linux-mm/20250731102904.8615-1-zhongjinji@honor.com/

zhongjinji (2):
  mm/oom_kill: Do not delay oom reaper when the victim is frozen
  mm/oom_kill: The OOM reaper traverses the VMA maple tree in reverse
    order

 mm/oom_kill.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

-- 
2.17.1
