Message-ID: <CAF8kJuMJCHJwNtnPiBKz=OJg8L7m7MVWydWRoQMK1HpiFvLDpQ@mail.gmail.com>
Date: Wed, 10 Sep 2025 09:05:02 -0700
From: Chris Li <chrisl@...nel.org>
To: Lei Liu <liulei.rjpt@...o.com>
Cc: Suren Baghdasaryan <surenb@...gle.com>, Shakeel Butt <shakeel.butt@...ux.dev>, 
	Michal Hocko <mhocko@...e.com>, David Rientjes <rientjes@...gle.com>, 
	Andrew Morton <akpm@...ux-foundation.org>, Kemeng Shi <shikemeng@...weicloud.com>, 
	Kairui Song <kasong@...cent.com>, Nhat Pham <nphamcs@...il.com>, Baoquan He <bhe@...hat.com>, 
	Barry Song <baohua@...nel.org>, Johannes Weiner <hannes@...xchg.org>, 
	Roman Gushchin <roman.gushchin@...ux.dev>, Muchun Song <muchun.song@...ux.dev>, 
	David Hildenbrand <david@...hat.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, 
	"Liam R. Howlett" <Liam.Howlett@...cle.com>, Vlastimil Babka <vbabka@...e.cz>, 
	Mike Rapoport <rppt@...nel.org>, Brendan Jackman <jackmanb@...gle.com>, Zi Yan <ziy@...dia.com>, 
	"Peter Zijlstra (Intel)" <peterz@...radead.org>, Chen Yu <yu.c.chen@...el.com>, 
	Hao Jia <jiahao1@...iang.com>, "Kirill A. Shutemov" <kas@...nel.org>, 
	Usama Arif <usamaarif642@...il.com>, Oleg Nesterov <oleg@...hat.com>, 
	Christian Brauner <brauner@...nel.org>, Mateusz Guzik <mjguzik@...il.com>, 
	Steven Rostedt <rostedt@...dmis.org>, Andrii Nakryiko <andrii@...nel.org>, 
	Al Viro <viro@...iv.linux.org.uk>, Fushuai Wang <wangfushuai@...du.com>, 
	"open list:MEMORY MANAGEMENT - OOM KILLER" <linux-mm@...ck.org>, open list <linux-kernel@...r.kernel.org>, 
	"open list:CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)" <cgroups@...r.kernel.org>
Subject: Re: [PATCH v0 0/2] mm: swap: Gather swap entries and batch async release

On Wed, Sep 10, 2025 at 7:14 AM Lei Liu <liulei.rjpt@...o.com> wrote:
> >> On Android, I suppose most of the memory is associated with single or
> >> small set of processes and parallelizing memory freeing would be
> >> challenging. BTW is LMKD using process_mrelease() to release the killed
> >> process memory?
> > Yes, LMKD has a reaper thread which wakes up and calls
> > process_mrelease() after the main LMKD thread issued SIGKILL.
>
> Hi Suren
>
> our current issue is that after lmkd kills a process, exit_mm takes
> considerable time. The interface you provided might help quickly free
> memory, potentially allowing us to release some memory from processes
> before lmkd kills them. This could be a good idea.
>
> We will take your suggestion into consideration.

Hi Lei,

I do want to help with your use case. According to my previous
analysis of the swap fault time breakdown, the amount of time spent on
batch freeing of swap entries is not that much. Yes, it has a long
tail, but that affects only a very small percentage of page faults. It
shouldn't have such a huge impact on the global average time.

https://services.google.com/fh/files/misc/zswap-breakdown.png
https://services.google.com/fh/files/misc/zswap-breakdown-detail.png

That is the point I am trying to make: the batch freeing of swap
entries is just the surface level. By itself it does not contribute
much. Your exit latency is largely a different issue.
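
To put rough numbers on the long-tail argument, here is a small sketch.
The latencies and the tail fraction are made up for illustration; they
are not taken from the breakdown graphs above.

```python
def mean_with_tail(typical_us, tail_us, tail_fraction):
    """Mean latency when tail_fraction of events take tail_us
    and the remaining events take typical_us."""
    return (1.0 - tail_fraction) * typical_us + tail_fraction * tail_us


# Hypothetical numbers: 10 us for a typical fault, 1000 us for a
# long-tail fault, and 0.1% of faults landing in the tail.
no_tail = mean_with_tail(10.0, 1000.0, 0.0)
with_tail = mean_with_tail(10.0, 1000.0, 0.001)

print(f"mean without tail: {no_tail:.2f} us")
print(f"mean with tail:    {with_tail:.2f} us")
```

With these assumed values, a tail that is 100x slower but hits only
0.1% of the faults moves the mean from 10 us to about 10.99 us.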

However, the approach you take (I briefly went over your patch) is to
add another batching layer for swap entry freeing. That impacts not
only the exit() path but other, non-exit() freeing of swap entries as
well. The swap entry is a resource best managed by the swap allocator.
The swap allocator knows best when to cache an entry and when to free
it under pressure. With the extra batching, freed swap entries just
sit in the batch queue until the threshold is triggered. The allocator
has no knowledge of this batching behavior, and it interferes with the
allocator's global view of swap entries. You need to address this
before your patch can be reconsidered.
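
A toy model of the concern, as a sketch only. This is not kernel code
and not the patch's code; ToySwapAllocator, ToyBatchFree and the
threshold value are hypothetical names and numbers made up to show how
entries parked in a side queue are invisible to the allocator.

```python
class ToySwapAllocator:
    """Hypothetical allocator: it only knows about slots in free_slots."""

    def __init__(self, nr_slots):
        self.free_slots = set(range(nr_slots))

    def alloc(self):
        # Returns a slot number, or None when the allocator sees no free slot.
        return self.free_slots.pop() if self.free_slots else None

    def free(self, slot):
        self.free_slots.add(slot)


class ToyBatchFree:
    """Hypothetical extra layer: holds freed slots until a threshold."""

    def __init__(self, allocator, threshold):
        self.allocator = allocator
        self.threshold = threshold
        self.pending = []

    def free(self, slot):
        self.pending.append(slot)
        if len(self.pending) >= self.threshold:
            self.flush()

    def flush(self):
        for slot in self.pending:
            self.allocator.free(slot)
        self.pending.clear()


def demo():
    allocator = ToySwapAllocator(nr_slots=4)
    batch = ToyBatchFree(allocator, threshold=8)

    slots = [allocator.alloc() for _ in range(4)]
    for slot in slots:
        batch.free(slot)  # below the threshold, so nothing reaches the allocator

    # All four slots are unused, yet the allocator cannot hand any out.
    alloc_failed = allocator.alloc() is None

    batch.flush()
    alloc_ok_after_flush = allocator.alloc() is not None
    return alloc_failed, alloc_ok_after_flush
```

In this model the allocation fails while the freed slots sit in the
queue and succeeds only after the flush, which is the kind of hidden
state the allocator cannot account for.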

It is like a CFO who needs to do a company-wide budget and revenue
projection while the sales department keeps a side-pocket account to
defer revenue and sandbag the sales numbers, which can jeopardize the
CFO's ability to budget and project. BTW, what I describe is probably
illegal for public companies. Kids, don't try this at home.

I think you can do some of the following:
1) Redo the test with the latest kernel, which no longer has the swap
slot cache batching. Report back what you get.
2) Try out process_mrelease().
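
For 2), a minimal userspace sketch of the kill-then-release sequence,
assuming Linux 5.15 or later and Python 3.9 or later. kill_and_release
is a hypothetical helper name. The syscall number 448 is an assumption
that holds on x86_64 and arm64; check unistd.h for your architecture.
This is not how LMKD itself is implemented, only an illustration of
the call order.

```python
import ctypes
import os
import signal

# Assumption: process_mrelease is syscall 448 (x86_64, arm64).
SYS_PROCESS_MRELEASE = 448


def kill_and_release(pid):
    """SIGKILL the target through a pidfd, then ask the kernel to
    release its memory from the caller's context.

    Returns 0 on success, or the errno reported by process_mrelease()
    (for example ESRCH if the target already dropped its mm, or ENOSYS
    on kernels without the syscall)."""
    libc = ctypes.CDLL(None, use_errno=True)
    pidfd = os.pidfd_open(pid)
    try:
        # process_mrelease() only works on a process that is already dying,
        # so the SIGKILL has to be sent first.
        signal.pidfd_send_signal(pidfd, signal.SIGKILL)
        ctypes.set_errno(0)
        rc = libc.syscall(
            ctypes.c_long(SYS_PROCESS_MRELEASE),
            ctypes.c_long(pidfd),
            ctypes.c_long(0),  # flags must be 0
        )
        return 0 if rc == 0 else ctypes.get_errno()
    finally:
        os.close(pidfd)
```

Timing the victim's exit with and without the process_mrelease() call
would show how much of the exit_mm cost can be moved off the dying
process.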

Please share your findings. I am happy to work with you to address the
problem you encounter.

Chris
