Message-ID: <Z4aYSdEamukBGAZi@tiehlicka>
Date: Tue, 14 Jan 2025 18:00:57 +0100
From: Michal Hocko <mhocko@...e.com>
To: Rik van Riel <riel@...riel.com>
Cc: Johannes Weiner <hannes@...xchg.org>,
Yosry Ahmed <yosryahmed@...gle.com>,
Balbir Singh <balbirs@...dia.com>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Shakeel Butt <shakeel.butt@...ux.dev>,
Muchun Song <muchun.song@...ux.dev>,
Andrew Morton <akpm@...ux-foundation.org>, cgroups@...r.kernel.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
kernel-team@...a.com, Nhat Pham <nphamcs@...il.com>
Subject: Re: [PATCH v2] memcg: allow exiting tasks to write back data to swap
On Tue 14-01-25 11:51:18, Rik van Riel wrote:
> On Tue, 2025-01-14 at 17:46 +0100, Michal Hocko wrote:
> > On Tue 14-01-25 11:09:55, Johannes Weiner wrote:
> >
> > >
> > > We managed to extract a stack trace of the livelocked task:
> > >
> > > obj_cgroup_may_swap
> > > zswap_store
> > > swap_writepage
> > > shrink_folio_list
> > > shrink_lruvec
> > > shrink_node
> > > do_try_to_free_pages
> > > try_to_free_mem_cgroup_pages
> >
> > OK, so this is the reclaim path and it fails due to reasons you mention
> > below. This will retry several times until it hits mem_cgroup_oom, which
> > will bail in mem_cgroup_out_of_memory because of task_is_dying (returns
> > true) and retry the charge + reclaim (as the oom killer hasn't done
> > anything) with passed_oom = true this time, eventually get to the nomem
> > path and return ENOMEM. This should propagate -ENOMEM down the path
> >
> > > charge_memcg
> > > mem_cgroup_swapin_charge_folio
> > > __read_swap_cache_async
> > > swapin_readahead
> > > do_swap_page
> > > handle_mm_fault
> > > do_user_addr_fault
> > > exc_page_fault
> > > asm_exc_page_fault
> > > __get_user
> >
> > All the way here and return the failure to futex_cleanup which doesn't
> > retry __get_user on the failure AFAICS (exit_robust_list). But I might
> > be missing something, it's been quite some time since I've looked into
> > futex code.
>
> Can you explain how -ENOMEM would get propagated down
> past the page fault handler?
>
> This isn't get_user_pages(), which can just pass
> -ENOMEM on to the caller.
>
> If there is code to pass -ENOMEM on past the page
> fault exception handler, I have not been able to
> find it. How does this work?

This might be me misunderstanding the get_user machinery, but doesn't it
return a failure when the PF handler returns ENOMEM?
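
To make the question concrete, here is a toy userspace model of the
behaviour I am assuming. Every name in it (toy_*) is hypothetical and it
is not kernel code. The assumption it encodes is that a kernel-mode
fault which ends in an OOM result takes the exception-table fixup rather
than being retried, so the uaccess helper returns an error (modelled as
-EFAULT here) to its caller. Whether the real fault path behaves this
way is exactly what is being asked.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum toy_fault { TOY_FAULT_NONE = 0, TOY_FAULT_OOM = 1 };

struct toy_regs {
	bool user_mode;	/* did the fault come from user mode? */
	bool fixed_up;	/* set once an exception-table fixup was applied */
};

/* Models the swapin charge failing (ENOMEM) for a dying task. */
static enum toy_fault toy_handle_mm_fault(bool charge_fails)
{
	return charge_fails ? TOY_FAULT_OOM : TOY_FAULT_NONE;
}

/* Models the OOM branch of the arch fault handler (assumed behaviour). */
static void toy_do_user_addr_fault(struct toy_regs *regs, bool charge_fails)
{
	enum toy_fault fault = toy_handle_mm_fault(charge_fails);

	/* Kernel-mode access: assume we jump to the uaccess fixup. */
	if (fault == TOY_FAULT_OOM && !regs->user_mode)
		regs->fixed_up = true;
	/* User-mode access would return and retry the instruction instead. */
}

/* Models __get_user(): 0 on success, -EFAULT if the fixup was taken. */
static int toy_get_user(int *dst, const int *src, bool charge_fails)
{
	struct toy_regs regs = { .user_mode = false, .fixed_up = false };

	toy_do_user_addr_fault(&regs, charge_fails);
	if (regs.fixed_up)
		return -EFAULT;	/* caller sees a failure, no retry here */

	*dst = *src;
	return 0;
}

/* Small self-check a reader can call from main(). */
static void toy_selfcheck(void)
{
	int val = 0, src = 42;

	assert(toy_get_user(&val, &src, true) == -EFAULT);
	assert(toy_get_user(&val, &src, false) == 0 && val == 42);
}
```

If the model matches reality, a caller like exit_robust_list would see
the error from __get_user and give up instead of looping. If the fault
is retried for kernel-mode accesses as well, the model is wrong and the
livelock follows.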
--
Michal Hocko
SUSE Labs