Message-ID: <aLVOICSkyvVRKD94@tiehlicka>
Date: Mon, 1 Sep 2025 09:41:20 +0200
From: Michal Hocko <mhocko@...e.com>
To: zhongjinji <zhongjinji@...or.com>
Cc: rientjes@...gle.com, shakeel.butt@...ux.dev, akpm@...ux-foundation.org,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	tglx@...utronix.de, liam.howlett@...cle.com,
	lorenzo.stoakes@...cle.com, surenb@...gle.com, liulu.liu@...or.com,
	feng.han@...or.com, tianxiaobin@...or.com, fengbaopeng@...or.com
Subject: Re: [PATCH v6 2/2] mm/oom_kill: The OOM reaper traverses the VMA
 maple tree in reverse order

On Fri 29-08-25 14:55:50, zhongjinji wrote:
> When a process is OOM killed without reaper delay, the oom reaper and the
> exit_mmap() thread likely run simultaneously. They traverse the vma's maple
> tree along the same path and may easily unmap the same vma, causing them to
> compete for the pte spinlock.
> 
> When a process exits, exit_mmap() traverses the vma's maple tree from low
> to high addresses. To reduce the chance of unmapping the same vma
> simultaneously, the OOM reaper should traverse the vma's tree from high to
> low address.
> 
> Reported-by: tianxiaobin <tianxiaobin@...or.com>
> Reported-by: fengbaopeng <fengbaopeng@...or.com>
> 
> Signed-off-by: zhongjinji <zhongjinji@...or.com>

The changelog could be improved because it is a bit confusing at this
stage. I haven't paid close attention to the previous discussion (sorry)
but there are two Reported-bys without any actual problem statement
(sure, contention could happen, but so what? What was the observed
behavior?). Also the first paragraph states that there is a problem
"without reaper delay", but the only situation where we do not have a
delay is when the task is frozen, and there is no racing there.

As already said in my previous response, I think this makes conceptual
sense, especially for oom victims with large address spaces which take
more than OOM_REAPER_DELAY to die. Maybe you want to use that as a
justification. My wording would be
"
Although the oom_reaper is delayed and it gives the oom victim a chance to
clean up its address space, this might take a while, especially for
processes with a large address space footprint. In those cases
oom_reaper might start racing with the dying task and compete for shared
resources - e.g. page table lock contention has been observed.

Reduce those races by reaping the oom victim from the other end of the
address space.
"

Anyway, with the changelog clarified:
Acked-by: Michal Hocko <mhocko@...e.com>

> ---
>  mm/oom_kill.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index a5e9074896a1..01665a666bf1 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -516,7 +516,7 @@ static bool __oom_reap_task_mm(struct mm_struct *mm)
>  {
>  	struct vm_area_struct *vma;
>  	bool ret = true;
> -	VMA_ITERATOR(vmi, mm, 0);
> +	MA_STATE(mas, &mm->mm_mt, ULONG_MAX, ULONG_MAX);
>  
>  	/*
>  	 * Tell all users of get_user/copy_from_user etc... that the content
> @@ -526,7 +526,12 @@ static bool __oom_reap_task_mm(struct mm_struct *mm)
>  	 */
>  	set_bit(MMF_UNSTABLE, &mm->flags);
>  
> -	for_each_vma(vmi, vma) {
> +	/*
> +	 * When two tasks unmap the same vma at the same time, they may contend
> +	 * for the pte spinlock. To reduce the probability of unmapping the same vma
> +	 * as exit_mmap, the OOM reaper traverses the vma maple tree in reverse order.
> +	 */
> +	mas_for_each_rev(&mas, vma, 0) {
>  		if (vma->vm_flags & (VM_HUGETLB|VM_PFNMAP))
>  			continue;
>  
> -- 
> 2.17.1

-- 
Michal Hocko
SUSE Labs
