Message-ID: <eee7d740-cf71-40d3-a037-543ae28c187a@vivo.com>
Date: Wed, 10 Sep 2025 22:01:35 +0800
From: Lei Liu <liulei.rjpt@...o.com>
To: Kairui Song <ryncsn@...il.com>
Cc: Michal Hocko <mhocko@...e.com>, David Rientjes <rientjes@...gle.com>,
Shakeel Butt <shakeel.butt@...ux.dev>,
Andrew Morton <akpm@...ux-foundation.org>,
Kemeng Shi <shikemeng@...weicloud.com>, Nhat Pham <nphamcs@...il.com>,
Baoquan He <bhe@...hat.com>, Barry Song <baohua@...nel.org>,
Chris Li <chrisl@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Muchun Song <muchun.song@...ux.dev>, David Hildenbrand <david@...hat.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>, Vlastimil Babka
<vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, Brendan Jackman
<jackmanb@...gle.com>, Zi Yan <ziy@...dia.com>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
Chen Yu <yu.c.chen@...el.com>, Hao Jia <jiahao1@...iang.com>,
"Kirill A. Shutemov" <kas@...nel.org>, Usama Arif <usamaarif642@...il.com>,
Oleg Nesterov <oleg@...hat.com>, Christian Brauner <brauner@...nel.org>,
Mateusz Guzik <mjguzik@...il.com>, Steven Rostedt <rostedt@...dmis.org>,
Andrii Nakryiko <andrii@...nel.org>, Al Viro <viro@...iv.linux.org.uk>,
Fushuai Wang <wangfushuai@...du.com>,
"open list:MEMORY MANAGEMENT - OOM KILLER" <linux-mm@...ck.org>,
open list <linux-kernel@...r.kernel.org>,
"open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)"
<cgroups@...r.kernel.org>
Subject: Re: [PATCH v0 0/2] mm: swap: Gather swap entries and batch async
release
On 2025/9/9 15:30, Kairui Song wrote:
> On Tue, Sep 9, 2025 at 3:04 PM Lei Liu <liulei.rjpt@...o.com> wrote:
> Hi Lei,
>
>> 1. Problem Scenario
>> On systems with ZRAM and swap enabled, simultaneous process exits create
>> contention. The primary bottleneck occurs during swap entry release
>> operations, causing exiting processes to monopolize CPU resources. This
>> leads to scheduling delays for high-priority processes.
>>
>> 2. Android Use Case
>> During camera launch, LMKD terminates background processes to free memory.
>> Exiting processes compete for CPU cycles, delaying the camera preview
>> thread and causing visible stuttering - directly impacting user
>> experience.
>>
>> 3. Root Cause Analysis
>> When background applications heavily utilize swap space, process exit
>> profiling reveals 55% of time spent in free_swap_and_cache_nr():
>>
>> Function Duration (ms) Percentage
>> do_signal 791.813 **********100%
>> do_group_exit 791.813 **********100%
>> do_exit 791.813 **********100%
>> exit_mm 577.859 *******73%
>> exit_mmap 577.497 *******73%
>> zap_pte_range 558.645 *******71%
>> free_swap_and_cache_nr 433.381 *****55%
>> free_swap_slot 403.568 *****51%
> Thanks for sharing this case.
>
> One problem is that now the free_swap_slot function no longer exists
> after 0ff67f990bd4. Have you tested the latest kernel? Or what is the
> actual overhead here?
>
> Some batch freeing optimizations are introduced. And we have reworked
> the whole locking mechanism for swap, so even on a system with 96t the
> contention seems barely observable with common workloads.
>
> And another series is further reducing the contention and the overall
> overhead (24% faster freeing for phase 1):
> https://lore.kernel.org/linux-mm/20250905191357.78298-1-ryncsn@gmail.com/
>
> Will these be helpful for you? I think optimizing the root problem is
> better than just deferring the overhead with async workers, which may
> increase the overall overhead and complexity.
Hi Kairui,

Thank you for the optimization suggestions. We believe your patches may
help our scenario, and we will try integrating them to evaluate the
benefit. However, they may not fully solve our issue. Below is a
description of our problem:
Flame graph of time distribution for TikTok process exit (~400MB swapped):
do_notify_resume 3.89%
get_signal 3.89%
do_signal_exit 3.88%
do_exit 3.88%
mmput 3.22%
exit_mmap 3.22%
unmap_vmas 3.08%
unmap_page_range 3.07%
free_swap_and_cache_nr 1.31%****
swap_entry_range_free 1.17%****
zram_slot_free_notify 1.11%****
zram_free_hw_entry_dc 0.43%
free_zspage[zsmalloc] 0.09%
CPU: 8-core ARM64 (1*4.21GHz + 3*3.5GHz + 4*2.7GHz), 12GB RAM
Exit of a process with ~400MB in swap:
Exit takes 200-300ms, ~4% CPU load
With more zram compression/swap, exit time increases to 400-500ms
free_swap_and_cache_nr avg: 0.5ms, max: ~1.5ms (running time)
free_swap_and_cache_nr dominates exit time (33%, up to 50% in the worst
cases). Most of that time is spent freeing zram resources (0.25ms per
operation). With dozens of simultaneous exits, the cumulative time
becomes significant.
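As a quick cross-check of the 33% figure against the flame graph above,
the share of a callee within its caller is simply the ratio of their
sampled percentages. A small sketch follows; the helper names
share_percent and print_exit_shares are made up for illustration, and
the inputs are the flame graph values quoted above:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: share of `part` within `whole`, in percent. */
double share_percent(double part, double whole)
{
	assert(whole > 0.0);
	return 100.0 * part / whole;
}

/* Prints the two ratios, using the flame graph samples quoted above. */
void print_exit_shares(void)
{
	const double do_exit = 3.88;                /* do_exit */
	const double free_swap = 1.31;              /* free_swap_and_cache_nr */
	const double zram_notify = 1.11;            /* zram_slot_free_notify */

	printf("free_swap_and_cache_nr / do_exit: %.1f%%\n",
	       share_percent(free_swap, do_exit));
	printf("zram_slot_free_notify / free_swap_and_cache_nr: %.1f%%\n",
	       share_percent(zram_notify, free_swap));
}
```

The first ratio comes out at roughly a third of do_exit, and the second
shows that most of free_swap_and_cache_nr is the zram slot free.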
Optimization approach:
The focus is not on optimizing the hot functions themselves, which have
limited room for improvement. The high load comes from too many
simultaneous exits. We make the time-consuming interfaces in do_exit
asynchronous so that exit completes sooner, while still letting the
non-swap (file/anonymous) pages be freed for other processes to use.
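To illustrate the general idea only, here is a user-space sketch, not
the patch code. It assumes a single exiting thread that gathers entries
into fixed-size batches and a pthread worker standing in for the
kworker. Every name in it (entry_batch, async_freer, BATCH_MAX,
slow_free_entry, and so on) is hypothetical, and the threshold of 64 is
an arbitrary assumption:

```c
/* User-space sketch only; every name below is hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define BATCH_MAX 64 /* assumed gather threshold, not taken from the patch */

struct entry_batch {
	unsigned long entries[BATCH_MAX];
	size_t nr;
	struct entry_batch *next;
};

struct async_freer {
	pthread_mutex_t lock;
	pthread_cond_t wake;
	struct entry_batch *head; /* batches awaiting release (order is irrelevant) */
	struct entry_batch *cur;  /* batch being filled by the exit path */
	size_t freed;             /* entries released by the worker */
	int stopping;
	pthread_t worker;
};

/* Stand-in for the expensive per-entry release (e.g. the zram slot free). */
static void slow_free_entry(unsigned long entry) { (void)entry; }

/* Caller must hold f->lock. */
static void queue_batch(struct async_freer *f, struct entry_batch *b)
{
	b->next = f->head;
	f->head = b;
	pthread_cond_signal(&f->wake);
}

static void *freer_worker(void *arg)
{
	struct async_freer *f = arg;

	pthread_mutex_lock(&f->lock);
	for (;;) {
		while (!f->head && !f->stopping)
			pthread_cond_wait(&f->wake, &f->lock);
		if (!f->head)
			break; /* stopping and nothing left to release */
		struct entry_batch *b = f->head;
		f->head = b->next;
		pthread_mutex_unlock(&f->lock);
		for (size_t i = 0; i < b->nr; i++)
			slow_free_entry(b->entries[i]); /* slow part, off the exit path */
		size_t n = b->nr;
		free(b);
		pthread_mutex_lock(&f->lock);
		f->freed += n;
	}
	pthread_mutex_unlock(&f->lock);
	return NULL;
}

int async_freer_start(struct async_freer *f)
{
	pthread_mutex_init(&f->lock, NULL);
	pthread_cond_init(&f->wake, NULL);
	f->head = f->cur = NULL;
	f->freed = 0;
	f->stopping = 0;
	return pthread_create(&f->worker, NULL, freer_worker, f);
}

/* Exit path: only gathers; a full batch is handed to the worker. */
int async_freer_add(struct async_freer *f, unsigned long entry)
{
	if (!f->cur && !(f->cur = calloc(1, sizeof(*f->cur))))
		return -1;
	assert(f->cur->nr < BATCH_MAX);
	f->cur->entries[f->cur->nr++] = entry;
	if (f->cur->nr == BATCH_MAX) {
		pthread_mutex_lock(&f->lock);
		queue_batch(f, f->cur);
		pthread_mutex_unlock(&f->lock);
		f->cur = NULL;
	}
	return 0;
}

/* Flush the partial batch, wait for the worker, return entries released. */
size_t async_freer_stop(struct async_freer *f)
{
	pthread_mutex_lock(&f->lock);
	if (f->cur)
		queue_batch(f, f->cur);
	f->cur = NULL;
	f->stopping = 1;
	pthread_cond_signal(&f->wake);
	pthread_mutex_unlock(&f->lock);
	pthread_join(f->worker, NULL);
	pthread_mutex_destroy(&f->lock);
	pthread_cond_destroy(&f->wake);
	return f->freed;
}
```

In this sketch the gathering side takes the lock once per BATCH_MAX
entries and never calls the slow release itself, which is the property
we are after on the exit path. How that maps onto workqueues and swap
locking in the kernel is a separate question that the sketch does not
answer.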
Camera startup scenario:
20-30 background apps, with anonymous pages compressed to zram
(200-500MB). Camera launch triggers lmkd to kill 10+ apps, and their
exits consume 25%+ CPU. System services and third-party processes use
60%+ CPU, leaving the camera startup process CPU-starved and delayed.
Sincere wishes,
Lei
>
>
>> swap_entry_free 393.863 *****50%
>> swap_range_free 372.602 ****47%
>>
>> 4. Optimization Approach
>> a) For processes exceeding swap entry threshold: aggregate and isolate
>> swap entries to enable fast exit
>> b) Asynchronously release batched entries when isolation reaches
>> configured threshold
>>
>> 5. Performance Gains (User Scenario: Camera Cold Launch)
>> a) 74% reduction in process exit latency (>500ms cases)
>> b) ~4% lower peak CPU load during concurrent process exits
>> c) ~70MB additional free memory during camera preview initialization
>> d) 40% reduction in camera preview stuttering probability
>>
>> 6. Prior Art & Improvements
>> Reference: Zhiguo Jiang's patch
>> (https://lore.kernel.org/all/20240805153639.1057-1-justinjiang@vivo.com/)
>>
>> Key enhancements:
>> a) Reimplemented logic moved from mmu_gather.c to swapfile.c for clarity
>> b) Async release delegated to workqueue kworkers with configurable
>> max_active for NUMA-optimized concurrency
>>
>> Lei Liu (2):
>> mm: swap: Gather swap entries and batch async release core
>> mm: swap: Forced swap entries release under memory pressure
>>
>> include/linux/oom.h | 23 ++++++
>> include/linux/swapfile.h | 2 +
>> include/linux/vm_event_item.h | 1 +
>> kernel/exit.c | 2 +
>> mm/memcontrol.c | 6 --
>> mm/memory.c | 4 +-
>> mm/page_alloc.c | 4 +
>> mm/swapfile.c | 134 ++++++++++++++++++++++++++++++++++
>> mm/vmstat.c | 1 +
>> 9 files changed, 170 insertions(+), 7 deletions(-)
>>
>> --
>> 2.34.1
>>
>>