Message-ID: <20231129032154.3710765-1-yosryahmed@google.com>
Date: Wed, 29 Nov 2023 03:21:48 +0000
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Shakeel Butt <shakeelb@...gle.com>,
Muchun Song <muchun.song@...ux.dev>,
Ivan Babrou <ivan@...udflare.com>, Tejun Heo <tj@...nel.org>,
"Michal Koutný" <mkoutny@...e.com>,
Waiman Long <longman@...hat.com>, kernel-team@...udflare.com,
Wei Xu <weixugc@...gle.com>, Greg Thelen <gthelen@...gle.com>,
Domenico Cerasuolo <cerasuolodomenico@...il.com>,
linux-mm@...ck.org, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, Yosry Ahmed <yosryahmed@...gle.com>
Subject: [mm-unstable v4 0/5] mm: memcg: subtree stats flushing and thresholds
This series attempts to address shortcomings in today's approach to memcg
stats flushing, namely occasionally stale or expensive stat reads. The
series does so by changing the threshold that we use to decide whether
to trigger a flush to be per-memcg instead of global (patch 3), and then
changing flushing to be per-memcg (i.e. subtree flushes) instead of
global (patch 5).
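
To make the idea concrete, below is a minimal userspace sketch of the
scheme. It is not the actual kernel code: the struct layout, names, and
threshold value are all illustrative, and the flush is just a stand-in
for an rstat flush rooted at the memcg's subtree.

#include <stdbool.h>
#include <stdio.h>

#define STATS_FLUSH_THRESHOLD 64	/* illustrative, not the kernel's value */

struct memcg {
	struct memcg *parent;
	long nr_pending;	/* stat updates since this memcg's last flush */
	long stat_pending;	/* updates not yet folded into ->stat */
	long stat;		/* the aggregated value readers see */
};

/* A stat update records the delta and bumps the update count up the tree. */
static void memcg_stat_update(struct memcg *memcg, long delta)
{
	memcg->stat_pending += delta;
	for (; memcg; memcg = memcg->parent)
		memcg->nr_pending++;
}

/* The flush decision is per-memcg: look only at this memcg's own counter. */
static bool memcg_needs_flush(struct memcg *memcg)
{
	return memcg->nr_pending > STATS_FLUSH_THRESHOLD;
}

/* Stand-in for a flush scoped to this memcg's subtree. */
static void memcg_flush(struct memcg *memcg)
{
	memcg->stat += memcg->stat_pending;
	memcg->stat_pending = 0;
	memcg->nr_pending = 0;
}

static long memcg_read_stat(struct memcg *memcg)
{
	if (memcg_needs_flush(memcg))
		memcg_flush(memcg);
	return memcg->stat;
}

int main(void)
{
	struct memcg root = { 0 };
	struct memcg child = { .parent = &root };

	for (int i = 0; i < 100; i++)
		memcg_stat_update(&child, 1);

	/* Reading the child only needs the child's subtree to be flushed. */
	printf("child stat: %ld\n", memcg_read_stat(&child));
	return 0;
}

The point of the sketch is that both the "should we flush?" check and the
flush itself are scoped to the memcg being read, instead of one global
counter gating a flush of the entire hierarchy.
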
Patches 3 & 5 are the core of the series, and they include more details
and testing results. The rest are either cleanups or prep work.
This series replaces the "memcg: more sophisticated stats flushing"
series [1], which itself replaced another series, in a long list of
attempts to improve memcg stats flushing. It is not a new version of
the same patchset; it takes a completely different approach, based on
feedback collected from the lkml discussions of all previous attempts.
Hopefully, this is the final attempt.
There was a regression reported in v2 [2] for the will-it-scale::fallocate
benchmark. I believe this regression should not affect production
workloads. This specific benchmark allocates and frees memory (using
fallocate/ftruncate) at a rate far too fast for the memory to be put to
any actual use. Testing this series on 100+ machines running production
workloads did not show any practical regressions in page fault latency or
allocation latency, but it showed great improvements in stats read time.
I do not have numbers for the exact improvements from this series alone,
but combined with another optimization for cgroup v1 [3] we see 5-10x
improvements. A significant chunk of that comes from the cgroup v1
optimization, but this series also made an improvement, as reported by
Domenico [4].
v3 -> v4:
- Rebased on top of mm-unstable + "workload-specific and memory
pressure-driven zswap writeback" series to fix conflicts [5].
v3: https://lore.kernel.org/all/20231116022411.2250072-1-yosryahmed@google.com/
[1] https://lore.kernel.org/lkml/20230913073846.1528938-1-yosryahmed@google.com/
[2] https://lore.kernel.org/lkml/202310202303.c68e7639-oliver.sang@intel.com/
[3] https://lore.kernel.org/lkml/20230803185046.1385770-1-yosryahmed@google.com/
[4] https://lore.kernel.org/lkml/CAFYChMv_kv_KXOMRkrmTN-7MrfgBHMcK3YXv0dPYEL7nK77e2A@mail.gmail.com/
[5] https://lore.kernel.org/all/20231127234600.2971029-1-nphamcs@gmail.com/
Yosry Ahmed (5):
mm: memcg: change flush_next_time to flush_last_time
mm: memcg: move vmstats structs definition above flushing code
mm: memcg: make stats flushing threshold per-memcg
mm: workingset: move the stats flush into workingset_test_recent()
mm: memcg: restore subtree stats flushing
include/linux/memcontrol.h | 8 +-
mm/memcontrol.c | 272 +++++++++++++++++++++----------------
mm/vmscan.c | 2 +-
mm/workingset.c | 42 ++++--
4 files changed, 188 insertions(+), 136 deletions(-)
--
2.43.0.rc1.413.gea7ed67945-goog