Message-ID: <20200710122917.GB3022@dhcp22.suse.cz>
Date: Fri, 10 Jul 2020 14:29:17 +0200
From: Michal Hocko <mhocko@...nel.org>
To: Roman Gushchin <guro@...com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
Shakeel Butt <shakeelb@...gle.com>, linux-mm@...ck.org,
kernel-team@...com, linux-kernel@...r.kernel.org,
Domas Mituzas <domas@...com>, Tejun Heo <tj@...nel.org>,
Chris Down <chris@...isdown.name>
Subject: Re: [PATCH] mm: memcontrol: avoid workload stalls when lowering
memory.high
On Thu 09-07-20 12:47:18, Roman Gushchin wrote:
> The memory.high limit is implemented such that the kernel
> penalizes all threads which are allocating memory over the limit.
> Forcing all threads into synchronous reclaim and adding some
> artificial delays slows down the memory consumption and
> potentially gives userspace oom handlers/resource control
> agents some time to react.
>
> It works nicely if the memory usage is hitting the limit from below,
> but it works sub-optimally if a user adjusts memory.high to a value
> way below the current memory usage. It basically forces all workload
> threads (doing any memory allocations) into synchronous reclaim
> and sleep. This makes the workload completely unresponsive for
> a long period of time and can also lead to system-wide contention on
> lru locks. It can happen even if the workload is not actually tight on
> memory and has, for example, a ton of cold pagecache.
>
> In the current implementation writing to memory.high causes an atomic
> update of the page counter's high value, followed by an attempt to
> reclaim enough memory to fit into the new limit. To fix the problem
> described above, all we need is to change the order of execution: try
> to push the memory usage under the limit first, and only then set the
> new high limit.
Shakeel, would this help with your pro-active reclaim usecase? It would
require resetting the high limit right after the reclaim returns, which
is quite ugly, but it would at least not require a completely new
interface. You would simply do
You would simply do
high = current - to_reclaim
echo $high > memory.high
  echo infinity > memory.high  # to prevent direct reclaim allocation stalls
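That sequence could be packaged as a small helper, sketched below under
the assumption of a cgroup v2 hierarchy (the cgroup directory path and
byte count are illustrative; note the v2 interface spells the no-limit
value "max" rather than "infinity"):

```shell
# reclaim_once <cgroup-dir> <bytes-to-reclaim>
# Lower memory.high below the current usage, then reset it, so the
# write itself does the reclaim (with this patch, before the new
# limit is published) and allocators are not throttled afterwards.
reclaim_once() {
    cg=$1
    to_reclaim=$2
    current=$(cat "$cg/memory.current")
    echo $((current - to_reclaim)) > "$cg/memory.high"
    echo max > "$cg/memory.high"
}
```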
The primary reason to set the high limit in advance was to catch
potential runaways more effectively, because they would just get
throttled while memory_high_write is reclaiming. With this change the
reclaim here might just be playing a never-ending game of catch-up. On
the plus side, breaking out of the reclaim loop would still enforce the
limit, so if the operation takes too long the reclaim burden will
eventually move over to the consumers. So I do not see any real danger.
> Signed-off-by: Roman Gushchin <guro@...com>
> Reported-by: Domas Mituzas <domas@...com>
> Cc: Johannes Weiner <hannes@...xchg.org>
> Cc: Michal Hocko <mhocko@...nel.org>
> Cc: Tejun Heo <tj@...nel.org>
> Cc: Shakeel Butt <shakeelb@...gle.com>
> Cc: Chris Down <chris@...isdown.name>
Acked-by: Michal Hocko <mhocko@...e.com>
> ---
> mm/memcontrol.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b8424aa56e14..4b71feee7c42 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6203,8 +6203,6 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
> if (err)
> return err;
>
> - page_counter_set_high(&memcg->memory, high);
> -
> for (;;) {
> unsigned long nr_pages = page_counter_read(&memcg->memory);
> unsigned long reclaimed;
> @@ -6228,6 +6226,8 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
> break;
> }
>
> + page_counter_set_high(&memcg->memory, high);
> +
> return nbytes;
> }
>
> --
> 2.26.2
--
Michal Hocko
SUSE Labs