[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z4aU7dn_TKeeTmP_@tiehlicka>
Date: Tue, 14 Jan 2025 17:46:37 +0100
From: Michal Hocko <mhocko@...e.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Yosry Ahmed <yosryahmed@...gle.com>, Rik van Riel <riel@...riel.com>,
Balbir Singh <balbirs@...dia.com>,
Roman Gushchin <roman.gushchin@...ux.dev>,
hakeel Butt <shakeel.butt@...ux.dev>,
Muchun Song <muchun.song@...ux.dev>,
Andrew Morton <akpm@...ux-foundation.org>, cgroups@...r.kernel.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
kernel-team@...a.com, Nhat Pham <nphamcs@...il.com>
Subject: Re: [PATCH v2] memcg: allow exiting tasks to write back data to swap
On Tue 14-01-25 11:09:55, Johannes Weiner wrote:
> Hi,
>
> On Mon, Dec 16, 2024 at 04:39:12PM +0100, Michal Hocko wrote:
> > On Thu 12-12-24 13:30:12, Johannes Weiner wrote:
[...]
> > > If we return -ENOMEM to an OOM victim in a fault, the fault handler
> > > will re-trigger OOM, which will find the existing OOM victim and do
> > > nothing, then restart the fault.
> >
> > IIRC the task will handle the pending SIGKILL if the #PF fails. If the
> > charge happens from the exit path then we rely on ENOMEM returned from
> > gup as a signal to back off. Do we have any caller that keeps retrying
> > on ENOMEM?
>
> We managed to extract a stack trace of the livelocked task:
>
> obj_cgroup_may_swap
> zswap_store
> swap_writepage
> shrink_folio_list
> shrink_lruvec
> shrink_node
> do_try_to_free_pages
> try_to_free_mem_cgroup_pages
OK, so this is the reclaim path and it fails due to reasons you mention
below. This will retry several times until it hits mem_cgroup_oom which
will bail in mem_cgroup_out_of_memory because of task_is_dying (returns
true) and retry the charge + reclaim (as the oom killer hasn't done
anything) with passed_oom = true this time and eventually got to nomem
path and returns ENOMEM. This should propaged -ENOMEM down the path
> charge_memcg
> mem_cgroup_swapin_charge_folio
> __read_swap_cache_async
> swapin_readahead
> do_swap_page
> handle_mm_fault
> do_user_addr_fault
> exc_page_fault
> asm_exc_page_fault
> __get_user
All the way here and return the failure to futex_cleanup which doesn't
retry __get_user on the failure AFAICS (exit_robust_list). But I might
be missing something, it's been quite some time since I've looked into
futex code.
> futex_cleanup
> fuxtex_exit_release
> do_exit
> do_group_exit
> get_signal
> arch_do_signal_or_restart
> exit_to_user_mode_prepare
> syscall_exit_to_user_mode
> do_syscall
> entry_SYSCALL_64
> syscall
>
> Both memory.max and memory.zswap.max are hit. I don't see how this
> could ever make forward progress - the futex fault will retry until it
> succeeds.
I must be missing something but I do not see the retry, could you point
me where this is happening please?
--
Michal Hocko
SUSE Labs
Powered by blists - more mailing lists