Message-ID: <20240620172958.GA2058@redhat.com>
Date: Thu, 20 Jun 2024 19:30:19 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: alexjlzheng@...il.com, Michal Hocko <mhocko@...nel.org>,
	"Eric W. Biederman" <ebiederm@...ssion.com>
Cc: akpm@...ux-foundation.org, brauner@...nel.org, axboe@...nel.dk,
	tandersen@...flix.com, willy@...radead.org, mjguzik@...il.com,
	alexjlzheng@...cent.com, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org
Subject: Re: [PATCH v2] mm: optimize the redundant loop of
 mm_update_next_owner()

I can't really review this, I have forgotten everything about mm_update_next_owner(),
so sorry in advance for the noise I am going to add, feel free to ignore.
Just in case: I see nothing wrong with this patch.

On 06/20, alexjlzheng@...il.com wrote:
>
> When mm_update_next_owner() races with swapoff (try_to_unuse()) or with /proc,
> ptrace or page migration (get_task_mm()), the loop cannot find any task_struct
> whose mm_struct matches the target mm_struct.
>
> If the above race is combined with the stress-ng-zombie and stress-ng-dup
> tests, such a long loop can easily cause a hard lockup in write_lock_irq()
> on tasklist_lock.
>
> Recognize this situation in advance and exit early.

But this patch won't help if (say) ptrace_access_vm() sleeps while
for_each_process() tries to find another owner, right? The reference it took
via get_task_mm() keeps mm_users above 1, so the new check never fires.

> @@ -484,6 +484,8 @@ void mm_update_next_owner(struct mm_struct *mm)
>  	 * Search through everything else, we should not get here often.
>  	 */
>  	for_each_process(g) {
> +		if (atomic_read(&mm->mm_users) <= 1)
> +			break;

I think this deserves a comment explaining that it is an optimization for the
case where we race with a pending mmput(); mm_update_next_owner() already
checks mm_users at the start.
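
To make that concrete, something like this (sketch only, wording is up to you):

	for_each_process(g) {
		/*
		 * Optimization: if the other mm_users went away while we were
		 * scanning (we raced with a pending mmput()), only our own
		 * reference is left and no task with c->mm == mm can be found,
		 * so stop walking the process list.  This mirrors the mm_users
		 * check at the start of mm_update_next_owner().
		 */
		if (atomic_read(&mm->mm_users) <= 1)
			break;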

And can we drop tasklist_lock and use rcu_read_lock() around for_each_process()?
Yes, this will probably need more changes even if it is possible...
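
Something in this direction, completely untested and certainly incomplete
(assuming the "goto assign_new_owner" is replaced by the new_owner variable
below); afaics for_each_process() itself is fine under rcu_read_lock(), but
the assign_new_owner path would still need task_lock() and care to keep the
candidate from going away, hence the get_task_struct():

	struct task_struct *new_owner = NULL;

	rcu_read_lock();
	for_each_process(g) {
		if (atomic_read(&mm->mm_users) <= 1)
			break;
		if (g->flags & PF_KTHREAD)
			continue;
		for_each_thread(g, c) {
			if (c->mm == mm) {
				/* pin the candidate before leaving the RCU section */
				get_task_struct(c);
				new_owner = c;
				break;
			}
			if (c->mm)
				break;
		}
		if (new_owner)
			break;
	}
	rcu_read_unlock();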


Or, even better: can't we finally kill mm_update_next_owner() and turn the
ugly mm->owner into mm->mem_cgroup?
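
That is, something like this, purely to illustrate what I mean; there is no
mm->mem_cgroup today, and the real conversion is of course much more work:

#ifdef CONFIG_MEMCG
	/*
	 * Hypothetical: a direct memcg pointer in mm_struct instead of the
	 * owner task.  Lookups that currently go through
	 * rcu_dereference(mm->owner) + mem_cgroup_from_task() would become a
	 * plain pointer read, and exiting tasks would no longer need
	 * mm_update_next_owner() to hand the mm over to a new owner.
	 */
	struct mem_cgroup *mem_cgroup;	/* would replace task_struct __rcu *owner */
#endif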

Michal, Eric, iirc you had the patch(es) which do this?

Oleg.

