linux-kernel - Re: [PATCH v2] mm: optimize the redundant loop of mm_update_next

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20240626064359.79119-1-alexjlzheng@tencent.com>
Date: Wed, 26 Jun 2024 14:43:59 +0800
From: Jinliang Zheng <alexjlzheng@...il.com>
To: mhocko@...e.com
Cc: akpm@...ux-foundation.org,
	alexjlzheng@...il.com,
	alexjlzheng@...cent.com,
	axboe@...nel.dk,
	brauner@...nel.org,
	ebiederm@...ssion.com,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org,
	mjguzik@...il.com,
	oleg@...hat.com,
	tandersen@...flix.com,
	willy@...radead.org
Subject: Re: [PATCH v2] mm: optimize the redundant loop of mm_update_next_owner()

On Fri, 21 Jun 2024 10:50:10 +0200, Michal Hocko wrote:
> On Thu 20-06-24 19:30:19, Oleg Nesterov wrote:
> > Can't review, I forgot everything about mm_update_next_owner().
> > So I am sorry for the noise I am going to add, feel free to ignore.
> > Just in case, I see nothing wrong in this patch.
> > 
> > On 06/20, alexjlzheng@...il.com wrote:
> > >
> > > When mm_update_next_owner() is racing with swapoff (try_to_unuse()) or /proc or
> > > ptrace or page migration (get_task_mm()), it is impossible to find an
> > > appropriate task_struct in the loop whose mm_struct is the same as the target
> > > mm_struct.
> > >
> > > If the above race condition is combined with the stress-ng-zombie and
> > > stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in
> > > write_lock_irq() for tasklist_lock.
> > >
> > > Recognize this situation in advance and exit early.
> > 
> > But this patch won't help if (say) ptrace_access_vm() sleeps while
> > for_each_process() tries to find another owner, right?
> > 
> > > @@ -484,6 +484,8 @@ void mm_update_next_owner(struct mm_struct *mm)
> > >  	 * Search through everything else, we should not get here often.
> > >  	 */
> > >  	for_each_process(g) {
> > > +		if (atomic_read(&mm->mm_users) <= 1)
> > > +			break;
> > 
> > I think this deserves a comment to explain that this is optimization
> > for the case we race with the pending mmput(). mm_update_next_owner()
> > checks mm_users at the start.
> > 
> > And. Can we drop tasklist and use rcu_read_lock() before for_each_process?
> > Yes, this will probably need more changes even if possible...
> > 
> > 
> > Or even better. Can't we finally kill mm_update_next_owner() and turn the
> > ugly mm->owner into mm->mem_cgroup ?
> 
> Yes, dropping the mm->owner should be a way to go. Replacing that by
> mem_cgroup sounds like an improvemnt. I have a vague recollection that

Sorry for the late reply.

Replacing that by mem_cgroup maybe a good idea, a rcu lock looks good, too.
But before the above optimization is implemented, I recommend using this
patch to alleviate it. Both [PATCH] and [PATCH v2] are acceptable, they only
differ in the commit log.

Thanks for your reply. :)
Jinliang Zheng

> this has some traps on the way. E.g. tasks sharing the mm but living in
> different cgroups. Things have changes since the last time I've checked
> and for example memcg charge migration on task move will be deprecated
> soon so chances are that there are less roadblocks on the way.
> -- 
> Michal Hocko
> SUSE Labs