Message-Id: <20200601032204.124624-1-gthelen@google.com>
Date: Sun, 31 May 2020 20:22:04 -0700
From: Greg Thelen <gthelen@...gle.com>
To: Hugh Dickins <hughd@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Kirill Tkhai <ktkhai@...tuozzo.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Greg Thelen <gthelen@...gle.com>, stable@...r.kernel.org
Subject: [PATCH] shmem, memcg: enable memcg aware shrinker
Since v4.19 commit b0dedc49a2da ("mm/vmscan.c: iterate only over charged
shrinkers during memcg shrink_slab()") a memcg-aware shrinker is only
called when the per-memcg per-node shrinker_map indicates that the
shrinker may have objects to release to that memcg and node.
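For reference, the gating that commit added looks roughly like the sketch
below. It is paraphrased from memory of mm/vmscan.c of that era, not an
exact quote: locking, barriers and the re-check after SHRINK_EMPTY are
omitted, and names such as shrinker_idr, shrinker_nr_max and
do_shrink_slab are recalled rather than copied.

/* Paraphrased sketch of shrink_slab_memcg(), not the exact code. */
static unsigned long shrink_slab_memcg(gfp_t gfp_mask, int nid,
				       struct mem_cgroup *memcg, int priority)
{
	struct memcg_shrinker_map *map;
	unsigned long freed = 0;
	int i;

	map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map,
					true);
	if (unlikely(!map))
		return 0;

	/* Only shrinkers whose bit is set in the map are ever considered. */
	for_each_set_bit(i, map->map, shrinker_nr_max) {
		struct shrink_control sc = {
			.gfp_mask = gfp_mask,
			.nid = nid,
			.memcg = memcg,
		};
		struct shrinker *shrinker = idr_find(&shrinker_idr, i);
		unsigned long ret;

		if (!shrinker)
			continue;

		ret = do_shrink_slab(&sc, shrinker, priority);
		if (ret == SHRINK_EMPTY)
			clear_bit(i, map->map);	/* skip it next time too */
		else
			freed += ret;
	}
	return freed;
}

A shrinker whose bit is never set in shrinker_map is therefore never
invoked at all during memcg reclaim.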
shmem_unused_huge_count and shmem_unused_huge_scan support the per-tmpfs
shrinker, which advertises both memcg and NUMA awareness. The shmem
shrinker releases memory by splitting hugepages that extend beyond
i_size.
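To make the numbers in the repro below concrete, here is a small
standalone userspace illustration (values assumed, not taken from the
kernel) of how much of a 2MB huge page backing a tiny tmpfs file lies
beyond i_size and becomes freeable once the huge page is split:

#include <stdio.h>

int main(void)
{
	const long page_size = 4096;
	const long hpage_size = 2L * 1024 * 1024;	/* PMD-sized huge page */
	const long i_size = 2;			/* "echo . > file" writes 2 bytes */
	long used = (i_size + page_size - 1) / page_size;
	long total = hpage_size / page_size;

	/* prints "511 of 512 subpages lie beyond i_size" */
	printf("%ld of %ld subpages lie beyond i_size\n", total - used, total);
	return 0;
}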
Shmem does not currently set bits in shrinker_map. So, starting with
b0dedc49a2da, memcg reclaim avoids calling the shmem shrinker under
pressure. This leads to undeserved memcg OOM kills.
Example that reliably triggers a memcg OOM kill on an unpatched kernel:
FS=/tmp/fs
CONTAINER=/cgroup/memory/tmpfs_shrinker
mkdir -p $FS
mount -t tmpfs -o huge=always nodev $FS
# Create 1000 MB container, which shouldn't suffer OOM.
mkdir $CONTAINER
echo 1000M > $CONTAINER/memory.limit_in_bytes
echo $BASHPID >> $CONTAINER/cgroup.procs
# Create 4000 files. Ideally each file uses a 4k data page + a little
# metadata. Assuming 8k total per file, 32MB (4000*8k) should easily
# fit within the container's 1000 MB. But if the data pages use 2MB
# hugepages (due to the aggressive huge=always) then the files consume
# 8GB, which hits the memcg's 1000 MB limit.
for i in {1..4000}; do
	echo . > $FS/$i
done
v5.4 commit 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg
aware") maintains the per-node per-memcg shrinker bitmap for the THP
shrinker, but there is no such logic in shmem. Make shmem set the
per-memcg per-node shrinker bits when it modifies inodes to have
shrinkable pages.
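For reference, the existing helper the patch calls to publish the bit,
memcg_set_shrinker_bit(), looks approximately like this in
mm/memcontrol.c of that era (paraphrased from memory, not an exact
quote):

/* Paraphrased sketch of memcg_set_shrinker_bit(); details may differ. */
void memcg_set_shrinker_bit(struct mem_cgroup *memcg, int nid, int shrinker_id)
{
	if (shrinker_id >= 0 && memcg && !mem_cgroup_is_root(memcg)) {
		struct memcg_shrinker_map *map;

		rcu_read_lock();
		map = rcu_dereference(memcg->nodeinfo[nid]->shrinker_map);
		/* pairs with the reader in shrink_slab_memcg() */
		smp_mb__before_atomic();
		set_bit(shrinker_id, map->map);
		rcu_read_unlock();
	}
}

Once the bit is set, the next memcg reclaim pass over that node will
find it and call the tmpfs superblock shrinker (s_shrink), which can
then split the under-used huge pages.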
Fixes: b0dedc49a2da ("mm/vmscan.c: iterate only over charged shrinkers during memcg shrink_slab()")
Cc: <stable@...r.kernel.org> # 4.19+
Signed-off-by: Greg Thelen <gthelen@...gle.com>
---
mm/shmem.c | 61 +++++++++++++++++++++++++++++++-----------------------
1 file changed, 35 insertions(+), 26 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index bd8840082c94..e11090f78cb5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1002,6 +1002,33 @@ static int shmem_getattr(const struct path *path, struct kstat *stat,
 	return 0;
 }
 
+/*
+ * Expose inode and optional page to shrinker as having a possibly splittable
+ * hugepage that reaches beyond i_size.
+ */
+static void shmem_shrinker_add(struct shmem_sb_info *sbinfo,
+			       struct inode *inode, struct page *page)
+{
+	struct shmem_inode_info *info = SHMEM_I(inode);
+
+	spin_lock(&sbinfo->shrinklist_lock);
+	/*
+	 * _careful to defend against unlocked access to ->shrink_list in
+	 * shmem_unused_huge_shrink()
+	 */
+	if (list_empty_careful(&info->shrinklist)) {
+		list_add_tail(&info->shrinklist, &sbinfo->shrinklist);
+		sbinfo->shrinklist_len++;
+	}
+	spin_unlock(&sbinfo->shrinklist_lock);
+
+#ifdef CONFIG_MEMCG
+	if (page && PageTransHuge(page))
+		memcg_set_shrinker_bit(page->mem_cgroup, page_to_nid(page),
+				       inode->i_sb->s_shrink.id);
+#endif
+}
+
 static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 {
 	struct inode *inode = d_inode(dentry);
@@ -1048,17 +1075,13 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 			 * to shrink under memory pressure.
 			 */
 			if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
-				spin_lock(&sbinfo->shrinklist_lock);
-				/*
-				 * _careful to defend against unlocked access to
-				 * ->shrink_list in shmem_unused_huge_shrink()
-				 */
-				if (list_empty_careful(&info->shrinklist)) {
-					list_add_tail(&info->shrinklist,
-							&sbinfo->shrinklist);
-					sbinfo->shrinklist_len++;
-				}
-				spin_unlock(&sbinfo->shrinklist_lock);
+				struct page *page;
+
+				page = find_get_page(inode->i_mapping,
+					(newsize & HPAGE_PMD_MASK) >> PAGE_SHIFT);
+				shmem_shrinker_add(sbinfo, inode, page);
+				if (page)
+					put_page(page);
 			}
 		}
 	}
@@ -1889,21 +1912,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 		if (PageTransHuge(page) &&
 		    DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE) <
 				hindex + HPAGE_PMD_NR - 1) {
-			/*
-			 * Part of the huge page is beyond i_size: subject
-			 * to shrink under memory pressure.
-			 */
-			spin_lock(&sbinfo->shrinklist_lock);
-			/*
-			 * _careful to defend against unlocked access to
-			 * ->shrink_list in shmem_unused_huge_shrink()
-			 */
-			if (list_empty_careful(&info->shrinklist)) {
-				list_add_tail(&info->shrinklist,
-						&sbinfo->shrinklist);
-				sbinfo->shrinklist_len++;
-			}
-			spin_unlock(&sbinfo->shrinklist_lock);
+			shmem_shrinker_add(sbinfo, inode, page);
 		}
 
 		/*
--
2.27.0.rc0.183.gde8f92d652-goog