[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20210402191638.3249835-3-schatzberg.dan@gmail.com>
Date: Fri, 2 Apr 2021 12:16:33 -0700
From: Dan Schatzberg <schatzberg.dan@...il.com>
To: unlisted-recipients:; (no To-header on input)
Cc: Jens Axboe <axboe@...nel.dk>, Tejun Heo <tj@...nel.org>,
Zefan Li <lizefan.x@...edance.com>,
Johannes Weiner <hannes@...xchg.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...nel.org>,
Vladimir Davydov <vdavydov.dev@...il.com>,
Hugh Dickins <hughd@...gle.com>,
Shakeel Butt <shakeelb@...gle.com>,
Roman Gushchin <guro@...com>,
Muchun Song <songmuchun@...edance.com>,
Yang Shi <shy828301@...il.com>,
Alex Shi <alex.shi@...ux.alibaba.com>,
Alexander Duyck <alexander.h.duyck@...ux.intel.com>,
Wei Yang <richard.weiyang@...il.com>,
linux-block@...r.kernel.org (open list:BLOCK LAYER),
linux-kernel@...r.kernel.org (open list),
cgroups@...r.kernel.org (open list:CONTROL GROUP (CGROUP)),
linux-mm@...ck.org (open list:MEMORY MANAGEMENT),
Chris Down <chris@...isdown.name>
Subject: [PATCH 2/3] mm: Charge active memcg when no mm is set
set_active_memcg() worked for kernel allocations but was silently
ignored for user pages.
This patch establishes a precedence order for who gets charged:
1. If there is a memcg associated with the page already, that memcg is
charged. This happens during swapin.
2. If an explicit mm is passed, mm->memcg is charged. This happens
during page faults, which can be triggered in remote VMs (eg gup).
3. Otherwise consult the current process context. If there is an
active_memcg, use that. Otherwise, current->mm->memcg.
Previously, if a NULL mm was passed to mem_cgroup_charge (case 3) it
would always charge the root cgroup. Now it looks up the active_memcg
first (falling back to charging the root cgroup if not set).
Signed-off-by: Dan Schatzberg <schatzberg.dan@...il.com>
Acked-by: Johannes Weiner <hannes@...xchg.org>
Acked-by: Tejun Heo <tj@...nel.org>
Acked-by: Chris Down <chris@...isdown.name>
Reviewed-by: Shakeel Butt <shakeelb@...gle.com>
---
mm/filemap.c | 2 +-
mm/memcontrol.c | 48 +++++++++++++++++++++++++++++++-----------------
mm/shmem.c | 4 ++--
3 files changed, 34 insertions(+), 20 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index c03463cb72d6..38648f7d2106 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -872,7 +872,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
page->index = offset;
if (!huge) {
- error = mem_cgroup_charge(page, current->mm, gfp);
+ error = mem_cgroup_charge(page, NULL, gfp);
if (error)
goto error;
charged = true;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c0b83a396299..d2939d6602b3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -886,13 +886,24 @@ struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
}
EXPORT_SYMBOL(mem_cgroup_from_task);
+static __always_inline struct mem_cgroup *active_memcg(void)
+{
+ if (in_interrupt())
+ return this_cpu_read(int_active_memcg);
+ else
+ return current->active_memcg;
+}
+
/**
* get_mem_cgroup_from_mm: Obtain a reference on given mm_struct's memcg.
* @mm: mm from which memcg should be extracted. It can be NULL.
*
- * Obtain a reference on mm->memcg and returns it if successful. Otherwise
- * root_mem_cgroup is returned. However if mem_cgroup is disabled, NULL is
- * returned.
+ * Obtain a reference on mm->memcg and returns it if successful. If mm
+ * is NULL, then the memcg is chosen as follows:
+ * 1) The active memcg, if set.
+ * 2) current->mm->memcg, if available
+ * 3) root memcg
+ * If mem_cgroup is disabled, NULL is returned.
*/
struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
{
@@ -901,13 +912,23 @@ struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
if (mem_cgroup_disabled())
return NULL;
+ /*
+ * Page cache insertions can happen without an
+ * actual mm context, e.g. during disk probing
+ * on boot, loopback IO, acct() writes etc.
+ */
+ if (unlikely(!mm)) {
+ memcg = active_memcg();
+ if (unlikely(memcg)) {
+ /* remote memcg must hold a ref */
+ css_get(&memcg->css);
+ return memcg;
+ }
+ mm = current->mm;
+ }
+
rcu_read_lock();
do {
- /*
- * Page cache insertions can happen without an
- * actual mm context, e.g. during disk probing
- * on boot, loopback IO, acct() writes etc.
- */
if (unlikely(!mm))
memcg = root_mem_cgroup;
else {
@@ -921,14 +942,6 @@ struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
}
EXPORT_SYMBOL(get_mem_cgroup_from_mm);
-static __always_inline struct mem_cgroup *active_memcg(void)
-{
- if (in_interrupt())
- return this_cpu_read(int_active_memcg);
- else
- return current->active_memcg;
-}
-
static __always_inline bool memcg_kmem_bypass(void)
{
/* Allow remote memcg charging from any context. */
@@ -6537,7 +6550,8 @@ static int __mem_cgroup_charge(struct page *page, struct mem_cgroup *memcg,
* @gfp_mask: reclaim mode
*
* Try to charge @page to the memcg that @mm belongs to, reclaiming
- * pages according to @gfp_mask if necessary.
+ * pages according to @gfp_mask if necessary. if @mm is NULL, try to
+ * charge to the active memcg.
*
* Do not use this for pages allocated for swapin.
*
diff --git a/mm/shmem.c b/mm/shmem.c
index 5cfd2fb6e52b..524fa5aa0459 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1694,7 +1694,7 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index,
{
struct address_space *mapping = inode->i_mapping;
struct shmem_inode_info *info = SHMEM_I(inode);
- struct mm_struct *charge_mm = vma ? vma->vm_mm : current->mm;
+ struct mm_struct *charge_mm = vma ? vma->vm_mm : NULL;
struct page *page;
swp_entry_t swap;
int error;
@@ -1815,7 +1815,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
}
sbinfo = SHMEM_SB(inode->i_sb);
- charge_mm = vma ? vma->vm_mm : current->mm;
+ charge_mm = vma ? vma->vm_mm : NULL;
page = pagecache_get_page(mapping, index,
FGP_ENTRY | FGP_HEAD | FGP_LOCK, 0);
--
2.30.2
Powered by blists - more mailing lists