Message-ID: <20250829065550.29571-1-zhongjinji@honor.com>
Date: Fri, 29 Aug 2025 14:55:48 +0800
From: zhongjinji <zhongjinji@...or.com>
To: <mhocko@...e.com>
CC: <rientjes@...gle.com>, <shakeel.butt@...ux.dev>,
<akpm@...ux-foundation.org>, <linux-mm@...ck.org>,
<linux-kernel@...r.kernel.org>, <tglx@...utronix.de>,
<liam.howlett@...cle.com>, <lorenzo.stoakes@...cle.com>, <surenb@...gle.com>,
<liulu.liu@...or.com>, <feng.han@...or.com>, <tianxiaobin@...or.com>,
<fengbaopeng@...or.com>, <zhongjinji@...or.com>
Subject: [PATCH v6 0/2] Do not delay OOM reaper when the victim is frozen
An overview of the relationship between patch 1 and patch 2:
With patch 1 applied, the OOM reaper is no longer delayed when the victim
process is frozen. If the victim process is thawed in time, the OOM reaper
and the exit_mmap() thread may run concurrently, which can lead to
significant spinlock contention. Patch 2 mitigates this issue by traversing
the maple tree in reverse order, reducing the likelihood of such lock
contention.
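A rough sketch of how the two changes fit together (pseudocode in kernel C style, not the actual patch; apart from READ_ONCE(tsk->frozen) and mas_for_each_rev(), which the changelog mentions, all names are illustrative):

```
/* Patch 1: a frozen victim cannot run exit_mmap(), so delaying the
 * reaper on its behalf gains nothing -- queue it without the delay.
 * Per the changelog, the check is simply READ_ONCE(tsk->frozen). */
queue_oom_reaper(tsk):
        delay = READ_ONCE(tsk->frozen) ? 0 : REAPER_DELAY
        schedule reaping of tsk->mm after delay

/* Patch 2: exit_mmap() unmaps VMAs from low to high addresses, so the
 * reaper walks the maple tree in reverse (mas_for_each_rev()) and works
 * from the opposite end of the address space, rarely taking the same
 * pte spinlock as the exiting thread. */
oom_reap_task_mm(mm):
        for each vma in mm's maple tree, highest address first:
                unmap_page_range(vma)
```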
The attached test data was collected on Android. It shows that when the OOM
reaper and exit_mmap() run concurrently, pte spinlock contention intensifies,
increasing the running time of both and therefore the system load. It also
shows that reverse-order traversal of the VMA maple tree by the OOM reaper
significantly reduces that contention: in the measurements below, it cuts
the combined load (measured as process running time) of oom_reaper and
exit_mmap by about 30%.
Perf data with patch 1 applied but not patch 2:
|--99.74%-- oom_reaper
| |--76.67%-- unmap_page_range
| | |--33.70%-- __pte_offset_map_lock
| | | |--98.46%-- _raw_spin_lock
| | |--27.61%-- free_swap_and_cache_nr
| | |--16.40%-- folio_remove_rmap_ptes
| | |--12.25%-- tlb_flush_mmu
| |--12.61%-- tlb_finish_mmu
Perf data with both patch 1 and patch 2 applied:
|--98.84%-- oom_reaper
| |--53.45%-- unmap_page_range
| | |--24.29%-- [hit in function]
| | |--48.06%-- folio_remove_rmap_ptes
| | |--17.99%-- tlb_flush_mmu
| | |--1.72%-- __pte_offset_map_lock
| |
| |--30.43%-- tlb_finish_mmu
Test data for process running time:

With oom reaper (reverse traversal):
  Thread           TID     State     Wall duration (ms)
  RxComputationT   13708   Running   60.69
  oom_reaper       81      Running   46.49
  Total (ms):                        107.18

With oom reaper:
  Thread           TID     State     Wall duration (ms)
  vdp:vidtask:m    14040   Running   81.85
  oom_reaper       81      Running   69.32
  Total (ms):                        151.17

Without oom reaper:
  Thread           TID     State     Wall duration (ms)
  tp-background    12424   Running   106.02
  Total (ms):                        106.02
Note: RxComputationT, vdp:vidtask:m, and tp-background are threads of the
same process, and they are the last threads to exit.
---
v5 -> v6:
- Use mas_for_each_rev() for VMA traversal [6]
- Simplify the judgment of whether to delay in queue_oom_reaper() [7]
- Refine changelog to better capture the essence of the changes [8]
- Use READ_ONCE(tsk->frozen) instead of checking mm and additional
checks inside for_each_process(), as it is sufficient [9]
- Add Reported-by tags, as fengbaopeng and tianxiaobin reported the
reaper's high-load issue
v4 -> v5:
- Detect frozen state directly, avoid special futex handling. [3]
- Use mas_find_rev() for VMA traversal to avoid skipping entries. [4]
- Only check should_delay_oom_reap() in queue_oom_reaper(). [5]
v3 -> v4:
- Renamed functions and parameters for clarity. [2]
- Added should_delay_oom_reap() for OOM reap decisions.
- Traverse the maple tree in reverse to reduce lock contention.
v2 -> v3:
- Fixed Subject prefix error.
v1 -> v2:
- Check robust_list for all threads, not just one. [1]
Reference:
[1] https://lore.kernel.org/linux-mm/u3mepw3oxj7cywezna4v72y2hvyc7bafkmsbirsbfuf34zpa7c@b23sc3rvp2gp/
[2] https://lore.kernel.org/linux-mm/87cy99g3k6.ffs@tglx/
[3] https://lore.kernel.org/linux-mm/aKRWtjRhE_HgFlp2@tiehlicka/
[4] https://lore.kernel.org/linux-mm/26larxehoe3a627s4fxsqghriwctays4opm4hhme3uk7ybjc5r@pmwh4s4yv7lm/
[5] https://lore.kernel.org/linux-mm/d5013a33-c08a-44c5-a67f-9dc8fd73c969@lucifer.local/
[6] https://lore.kernel.org/linux-mm/nwh7gegmvoisbxlsfwslobpbqku376uxdj2z32owkbftvozt3x@4dfet73fh2yy/
[7] https://lore.kernel.org/linux-mm/af4edeaf-d3c9-46a9-a300-dbaf5936e7d6@lucifer.local/
[8] https://lore.kernel.org/linux-mm/aK71W1ITmC_4I_RY@tiehlicka/
[9] https://lore.kernel.org/linux-mm/jzzdeczuyraup2zrspl6b74muf3bly2a3acejfftcldfmz4ekk@s5mcbeim34my/
Earlier posts:
v5: https://lore.kernel.org/linux-mm/20250825133855.30229-1-zhongjinji@honor.com/
v4: https://lore.kernel.org/linux-mm/20250814135555.17493-1-zhongjinji@honor.com/
v3: https://lore.kernel.org/linux-mm/20250804030341.18619-1-zhongjinji@honor.com/
v2: https://lore.kernel.org/linux-mm/20250801153649.23244-1-zhongjinji@honor.com/
v1: https://lore.kernel.org/linux-mm/20250731102904.8615-1-zhongjinji@honor.com/
zhongjinji (2):
mm/oom_kill: Do not delay oom reaper when the victim is frozen
mm/oom_kill: The OOM reaper traverses the VMA maple tree in reverse
order
mm/oom_kill.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
--
2.17.1