linux-kernel - Re: [PATCH v2] memcg: allow exiting tasks to write back data to swap


Message-ID: <Z4aXU-piAytmZpbs@tiehlicka>
Date: Tue, 14 Jan 2025 17:56:51 +0100
From: Michal Hocko <mhocko@...e.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Yosry Ahmed <yosryahmed@...gle.com>, Rik van Riel <riel@...riel.com>,
	Balbir Singh <balbirs@...dia.com>,
	Roman Gushchin <roman.gushchin@...ux.dev>,
	Shakeel Butt <shakeel.butt@...ux.dev>,
	Muchun Song <muchun.song@...ux.dev>,
	Andrew Morton <akpm@...ux-foundation.org>, cgroups@...r.kernel.org,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	kernel-team@...a.com, Nhat Pham <nphamcs@...il.com>
Subject: Re: [PATCH v2] memcg: allow exiting tasks to write back data to swap

On Tue 14-01-25 17:54:17, Michal Hocko wrote:
> On Tue 14-01-25 17:46:39, Michal Hocko wrote:
> > On Tue 14-01-25 11:09:55, Johannes Weiner wrote:
> > > Hi,
> > > 
> > > On Mon, Dec 16, 2024 at 04:39:12PM +0100, Michal Hocko wrote:
> > > > On Thu 12-12-24 13:30:12, Johannes Weiner wrote:
> > [...]
> > > > > If we return -ENOMEM to an OOM victim in a fault, the fault handler
> > > > > will re-trigger OOM, which will find the existing OOM victim and do
> > > > > nothing, then restart the fault.
> > > > 
> > > > IIRC the task will handle the pending SIGKILL if the #PF fails. If the
> > > > charge happens from the exit path then we rely on ENOMEM returned from
> > > > gup as a signal to back off. Do we have any caller that keeps retrying
> > > > on ENOMEM?
> > > 
> > > We managed to extract a stack trace of the livelocked task:
> > > 
> > > obj_cgroup_may_swap
> > > zswap_store
> > > swap_writepage
> > > shrink_folio_list
> > > shrink_lruvec
> > > shrink_node
> > > do_try_to_free_pages
> > > try_to_free_mem_cgroup_pages
> > 
> > OK, so this is the reclaim path, and it fails for the reasons you
> > mention below. It will retry several times until it hits
> > mem_cgroup_oom, which will bail out in mem_cgroup_out_of_memory because
> > task_is_dying returns true. The charge + reclaim is then retried (as
> > the oom killer hasn't done anything), this time with passed_oom = true,
> > and eventually gets to the nomem path and returns ENOMEM.
> 
> Btw. is there any actual reason why we cannot go to nomem directly,
> rather than going to the oom killer (just to bail out) and then through
> the whole cycle again? That seems arbitrary and simply burns a lot of
> cycles without much chance of a better outcome.
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 7b3503d12aaf..eb45eaf0acfc 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2268,8 +2268,7 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  	if (gfp_mask & __GFP_RETRY_MAYFAIL)
>  		goto nomem;
>  
> -	/* Avoid endless loop for tasks bypassed by the oom killer */
> -	if (passed_oom && task_is_dying())
> +	if (task_is_dying())
>  		goto nomem;
>  
>  	/*

Just to clarify, this is only worth doing if we have strong reasons to
keep the bail out in the oom killer path. If we go with the change
proposed in the other email, this patch doesn't make sense.
-- 
Michal Hocko
SUSE Labs