Message-ID: <28af480e-b99a-447d-999e-1dcb961d2709@nvidia.com>
Date: Fri, 13 Dec 2024 13:54:10 +1100
From: Balbir Singh <balbirs@...dia.com>
To: Rik van Riel <riel@...riel.com>, Roman Gushchin <roman.gushchin@...ux.dev>
Cc: Yosry Ahmed <yosryahmed@...gle.com>, Johannes Weiner
<hannes@...xchg.org>, Michal Hocko <mhocko@...nel.org>,
Shakeel Butt <shakeel.butt@...ux.dev>, Muchun Song <muchun.song@...ux.dev>,
Andrew Morton <akpm@...ux-foundation.org>, cgroups@...r.kernel.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org, kernel-team@...a.com,
Nhat Pham <nphamcs@...il.com>
Subject: Re: [PATCH v2] memcg: allow exiting tasks to write back data to swap
On 12/13/24 07:00, Rik van Riel wrote:
> On Thu, 12 Dec 2024 18:31:57 +0000
> Roman Gushchin <roman.gushchin@...ux.dev> wrote:
>
>> Is it about a single task or groups of tasks or the entire cgroup?
>> If the former, why is it a problem? A tight memcg limit can slow things
>> down in general, and I don't see why we should treat the exit() path
>> differently.
>>
> I think the exit path does need to be treated a little differently,
> since this exit may be the only way such a cgroup can free up memory.
>
>> If it's about the entire cgroup and we have essentially a deadlock,
>> I feel like we need to look into the oom reaper side.
>
> You mean something like the below?
>
> I have not tested it yet, because we don't have any stuck
> cgroups right now among the workloads that I'm monitoring.
>
> ---8<---
>
> From c0e545fd45bd3ee24cd79b3d3e9b375e968ef460 Mon Sep 17 00:00:00 2001
> From: Rik van Riel <riel@...riel.com>
> Date: Thu, 12 Dec 2024 14:50:49 -0500
> Subject: [PATCH] memcg,oom: speed up reclaim for exiting tasks
>
> When a memcg reaches its memory limit and reclaim becomes unavailable
> or slow for some reason (for example, only zswap is available but zswap
> writeback is disabled), it can take a long time for tasks to exit and
> for the cgroup to get back to normal (or be cleaned up).
>
> Speed up memcg reclaim for exiting tasks by limiting how much work
> reclaim does, and by invoking the OOM reaper if reclaim does not
> free up enough memory to allow the task to make progress.
>
> Signed-off-by: Rik van Riel <riel@...riel.com>
> ---
> include/linux/oom.h | 8 ++++++++
> mm/memcontrol.c | 11 +++++++++++
> mm/oom_kill.c | 6 +-----
> 3 files changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index 1e0fc6931ce9..b2d9cf936664 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -111,4 +111,12 @@ extern void oom_killer_enable(void);
>
> extern struct task_struct *find_lock_task_mm(struct task_struct *p);
>
> +#ifdef CONFIG_MMU
> +extern void queue_oom_reaper(struct task_struct *tsk);
> +#else
> +static inline void queue_oom_reaper(struct task_struct *tsk)
> +{
> +}
> +#endif
> +
> #endif /* _INCLUDE_LINUX_OOM_H */
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 7b3503d12aaf..21f42758d430 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2231,6 +2231,9 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
> if (!gfpflags_allow_blocking(gfp_mask))
> goto nomem;
>
> + if (unlikely(current->flags & PF_EXITING))
> + gfp_mask |= __GFP_NORETRY;
> +
	if (task_is_dying())
		gfp_mask |= __GFP_NORETRY;
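
(For reference, task_is_dying() in mm/memcontrol.c is roughly the
following, from memory, so treat it as a sketch; it already covers the
PF_EXITING case along with OOM victims and pending fatal signals:)

	static bool task_is_dying(void)
	{
		return tsk_is_oom_victim(current) ||
		       fatal_signal_pending(current) ||
		       (current->flags & PF_EXITING);
	}
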
> memcg_memory_event(mem_over_limit, MEMCG_MAX);
> raised_max_event = true;
>
> @@ -2284,6 +2287,14 @@ int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
> goto retry;
> }
> nomem:
> + /*
> + * We ran out of memory while inside exit. Maybe the OOM
> + * reaper can help reduce cgroup memory use and get things
> + * moving again?
> + */
> + if (unlikely(current->flags & PF_EXITING))
> + queue_oom_reaper(current);
> +
I am not sure this is helpful without task_will_free_mem(); the dying
task shouldn't be sent to the OOM killer when we run out of memory.
I did notice that we have heuristics around task_is_dying() and
passed_oom, and it sounds like the end result of your changes would be
to ignore the passed_oom heuristic.
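
The heuristic I have in mind is the check near the end of the retry loop
in try_charge_memcg(); roughly, from memory, so a sketch rather than the
exact current code:

	/* Avoid endless loop for tasks bypassed by the oom killer */
	if (passed_oom && task_is_dying())
		goto nomem;

If I read the patch right, __GFP_NORETRY would send the exiting task to
nomem before mem_cgroup_oom() ever runs, so passed_oom never gets set
and the check above no longer applies.
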
Balbir Singh.