[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YC+B2KvJVSgfVDTe@dhcp22.suse.cz>
Date: Fri, 19 Feb 2021 10:16:08 +0100
From: Michal Hocko <mhocko@...e.com>
To: Tim Chen <tim.c.chen@...ux.intel.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
Vladimir Davydov <vdavydov.dev@...il.com>,
Dave Hansen <dave.hansen@...el.com>,
Ying Huang <ying.huang@...el.com>, linux-mm@...ck.org,
cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 3/3] mm: Fix missing mem cgroup soft limit tree updates
On Wed 17-02-21 12:41:36, Tim Chen wrote:
> On a per node basis, the mem cgroup soft limit tree on each node tracks
> how much a cgroup has exceeded its soft limit memory limit and sorts
> the cgroup by its excess usage. On page release, the trees are not
> updated right away, until we have gathered a batch of pages belonging to
> the same cgroup. This reduces the frequency of updating the soft limit tree
> and locking of the tree and associated cgroup.
>
> However, the batch of pages could contain pages from multiple nodes but
> only the soft limit tree from one node would get updated. Change the
> logic so that we update the tree in batch of pages, with each batch of
> pages all in the same mem cgroup and memory node. An update is issued for
> the batch of pages of a node collected till now whenever we encounter
> a page belonging to a different node. Note that this batching for
> the same node logic is only relevant for v1 cgroup that has a memory
> soft limit.
Let me paste the discussion related to this patch from other reply:
> >> For patch 3 regarding the uncharge_batch, it
> >> is more of an observation that we should uncharge in batch of same node
> >> and not prompted by actual workload.
> >> Thinking more about this, the worst that could happen
> >> is we could have some entries in the soft limit tree that overestimate
> >> the memory used. The worst that could happen is a soft page reclaim
> >> on that cgroup. The overhead from extra memcg event update could
> >> be more than a soft page reclaim pass. So let's drop patch 3
> >> for now.
> >
> > I would still prefer to handle that in the soft limit reclaim path and
> > check each memcg for the soft limit reclaim excess before the reclaim.
> >
>
> Something like this?
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8bddee75f5cb..b50cae3b2a1a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3472,6 +3472,14 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
> if (!mz)
> break;
>
> + /*
> + * Soft limit tree is updated based on memcg events sampling.
> + * We could have missed some updates on page uncharge and
> + * the cgroup is below soft limit. Skip useless soft reclaim.
> + */
> + if (!soft_limit_excess(mz->memcg))
> + continue;
> +
> nr_scanned = 0;
> reclaimed = mem_cgroup_soft_reclaim(mz->memcg, pgdat,
Yes I meant something like this but then I have looked more closely and
this shouldn't be needed afterall. __mem_cgroup_largest_soft_limit_node
already does all the work
if (!soft_limit_excess(mz->memcg) ||
!css_tryget(&mz->memcg->css))
goto retry;
so this shouldn't really happen.
--
Michal Hocko
SUSE Labs
Powered by blists - more mailing lists