Message-ID: <CALvZod5BKXs52A2R-d=aOsjB7idBejsMDgQUKc1H_6y=PuBsew@mail.gmail.com>
Date: Mon, 13 Jul 2020 07:50:51 -0700
From: Shakeel Butt <shakeelb@...gle.com>
To: Chris Down <chris@...isdown.name>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Linux MM <linux-mm@...ck.org>,
Cgroups <cgroups@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Kernel Team <kernel-team@...com>
Subject: Re: [PATCH v2 1/2] mm, memcg: reclaim more aggressively before high allocator throttling

On Mon, Jul 13, 2020 at 4:42 AM Chris Down <chris@...isdown.name> wrote:
>
> In Facebook production, we've seen cases where cgroups have been put
> into allocator throttling even when they appear to have a lot of slack
> file caches which should be trivially reclaimable.
>
> Looking more closely, the problem is that we only try a single cgroup
> reclaim walk for each return to usermode before calculating whether or
> not we should throttle. For cgroups with a rapidly growing amount of
> file cache, this single attempt doesn't produce enough pressure to
> shrink them before they enter allocator throttling.
>
> As an example, we see that threads in an affected cgroup are stuck in
> allocator throttling:
>
> # for i in $(cat cgroup.threads); do
> > grep over_high "/proc/$i/stack"
> > done
> [<0>] mem_cgroup_handle_over_high+0x10b/0x150
> [<0>] mem_cgroup_handle_over_high+0x10b/0x150
> [<0>] mem_cgroup_handle_over_high+0x10b/0x150
>
> ...however, there is no I/O pressure reported by PSI, despite a lot of
> slack file pages:
>
> # cat memory.pressure
> some avg10=78.50 avg60=84.99 avg300=84.53 total=5702440903
> full avg10=78.50 avg60=84.99 avg300=84.53 total=5702116959
> # cat io.pressure
> some avg10=0.00 avg60=0.00 avg300=0.00 total=78051391
> full avg10=0.00 avg60=0.00 avg300=0.00 total=78049640
> # grep _file memory.stat
> inactive_file 1370939392
> active_file 661635072
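
For illustration only, the pressure lines quoted above can be parsed with a
few lines of user-space C. This is a minimal sketch, assuming the
"some|full avg10=... avg60=... avg300=... total=..." layout shown in the
output; struct psi_line and parse_psi_line() are hypothetical names and are
not part of the patch or of any kernel interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One line of a cgroup v2 pressure file (memory.pressure, io.pressure). */
struct psi_line {
	char kind[8];                 /* "some" or "full" */
	double avg10, avg60, avg300;  /* percent of time stalled */
	unsigned long long total;     /* cumulative stall time, microseconds */
};

/*
 * Hypothetical helper: parse one pressure line into *out.
 * Returns 0 on success, -1 if the line does not match the expected layout.
 */
static int parse_psi_line(const char *line, struct psi_line *out)
{
	int n;

	assert(line != NULL && out != NULL);

	n = sscanf(line, "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
		   out->kind, &out->avg10, &out->avg60, &out->avg300,
		   &out->total);
	if (n != 5)
		return -1;

	/* Only the two documented line kinds are accepted. */
	if (strcmp(out->kind, "some") != 0 && strcmp(out->kind, "full") != 0)
		return -1;

	return 0;
}
```

Feeding it the "some" lines from memory.pressure and io.pressure makes the
mismatch in the report easy to check mechanically: memory avg10 is 78.50
while io avg10 is 0.00.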
>
> This patch changes the behaviour to retry reclaim either until the
> current task goes below the 10ms grace period, or we are making no
> reclaim progress at all. In the latter case, we enter reclaim throttling
> as before.
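
The control flow described above can be sketched as a small user-space toy
model. This is not the kernel code from the patch: struct toy_cgroup,
toy_reclaim(), toy_penalty_ms() and the "1 ms of penalty per page over
high" rule are all assumptions made up for illustration. Only the loop
shape follows the description: keep reclaiming until the penalty is within
the 10ms grace period, or until a pass frees nothing, in which case the
task is throttled.

```c
#include <assert.h>
#include <stdbool.h>

/* Grace period from the description above; the rest is hypothetical. */
#define GRACE_PERIOD_MS 10

struct toy_cgroup {
	long usage;        /* pages currently charged */
	long high;         /* memory.high, in pages */
	long reclaimable;  /* pages reclaim can still free */
};

/* Toy reclaim pass: free up to 'batch' pages and report how many. */
static long toy_reclaim(struct toy_cgroup *cg, long batch)
{
	long n = cg->reclaimable < batch ? cg->reclaimable : batch;

	cg->usage -= n;
	cg->reclaimable -= n;
	return n;
}

/* Toy penalty (assumption): 1 ms per page above memory.high. */
static long toy_penalty_ms(const struct toy_cgroup *cg)
{
	long over = cg->usage - cg->high;

	return over > 0 ? over : 0;
}

/*
 * Retry reclaim until the penalty is within the grace period (no
 * throttling, returns false) or a pass makes no progress (throttle,
 * returns true). '*passes' reports how many reclaim passes ran.
 */
static bool handle_over_high(struct toy_cgroup *cg, long batch, int *passes)
{
	assert(batch > 0);

	*passes = 0;
	for (;;) {
		long freed = toy_reclaim(cg, batch);

		(*passes)++;
		if (toy_penalty_ms(cg) <= GRACE_PERIOD_MS)
			return false;
		if (freed == 0)
			return true;
	}
}
```

With usage=200, high=100 and plenty of reclaimable pages, a batch of 32
gets the toy cgroup within the grace period after three passes and avoids
throttling. With only 40 reclaimable pages, the third pass frees nothing
and the model falls through to throttling, as in the second case above.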
>
> To a user, there's no intuitive reason for the reclaim behaviour to
> differ between hitting memory.high as part of a new allocation and
> hitting memory.high because someone lowered its value. As such, this
> also brings an added benefit: it unifies the reclaim behaviour between
> the two.
>
> There's precedent for this behaviour: we already do reclaim retries when
> writing to memory.{high,max}, in max reclaim, and in the page allocator
> itself.
>
> Signed-off-by: Chris Down <chris@...isdown.name>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Johannes Weiner <hannes@...xchg.org>
> Cc: Tejun Heo <tj@...nel.org>
> Cc: Michal Hocko <mhocko@...nel.org>
Reviewed-by: Shakeel Butt <shakeelb@...gle.com>