Open Source and information security mailing list archives
 
Message-ID: <20200512215813.GA487759@cmpxchg.org>
Date:   Tue, 12 May 2020 17:58:13 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     Qian Cai <cai@....pw>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Alex Shi <alex.shi@...ux.alibaba.com>,
        Joonsoo Kim <js1304@...il.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        Hugh Dickins <hughd@...gle.com>,
        Michal Hocko <mhocko@...e.com>,
        "Kirill A. Shutemov" <kirill@...temov.name>,
        Roman Gushchin <guro@...com>, Linux-MM <linux-mm@...ck.org>,
        cgroups@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
        kernel-team@...com
Subject: Re: [PATCH 12/19] mm: memcontrol: convert anon and file-thp to new
 mem_cgroup_charge() API

On Tue, May 12, 2020 at 10:38:54AM -0400, Qian Cai wrote:
> > On May 8, 2020, at 2:30 PM, Johannes Weiner <hannes@...xchg.org> wrote:
> > 
> > With the page->mapping requirement gone from memcg, we can charge anon
> > and file-thp pages in one single step, right after they're allocated.
> > 
> > This removes two out of three API calls - especially the tricky commit
> > step that needed to happen at just the right time between when the
> > page is "set up" and when it's "published" - somewhat vague and fluid
> > concepts that varied by page type. All we need is a freshly allocated
> > page and a memcg context to charge.
> > 
> > v2: prevent double charges on pre-allocated hugepages in khugepaged
> > 
> > Signed-off-by: Johannes Weiner <hannes@...xchg.org>
> > Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@....com>
> > ---
> > include/linux/mm.h      |  4 +---
> > kernel/events/uprobes.c | 11 +++--------
> > mm/filemap.c            |  2 +-
> > mm/huge_memory.c        |  9 +++------
> > mm/khugepaged.c         | 35 ++++++++++-------------------------
> > mm/memory.c             | 36 ++++++++++--------------------------
> > mm/migrate.c            |  5 +----
> > mm/swapfile.c           |  6 +-----
> > mm/userfaultfd.c        |  5 +----
> > 9 files changed, 31 insertions(+), 82 deletions(-)
> []
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > 
> > @@ -1198,10 +1193,11 @@ static void collapse_huge_page(struct mm_struct *mm,
> > out_up_write:
> > 	up_write(&mm->mmap_sem);
> > out_nolock:
> > +	if (*hpage)
> > +		mem_cgroup_uncharge(*hpage);
> > 	trace_mm_collapse_huge_page(mm, isolated, result);
> > 	return;
> > out:
> > -	mem_cgroup_cancel_charge(new_page, memcg);
> > 	goto out_up_write;
> > }
> []
> 
> Some memory pressure will crash this new code. It looks somewhat racy.
> 
> The crash is on the
> 
> if (!page->mem_cgroup)
> 
> check in mem_cgroup_uncharge(), where page == NULL.

Thanks for the report, sorry about the inconvenience.

Hm, the page should be exclusive at this point; nobody else should be
touching it. After all, khugepaged might reuse the preallocated page
for another pmd if this one fails to collapse.

Looking at the code, I think it's the page pointer itself that's
garbage, not page->mem_cgroup changing under us. With CONFIG_NUMA, if
the hugepage allocation fails, *hpage holds an ERR_PTR instead of
NULL, so the `if (*hpage)` check passes and we dereference the error
cookie.

I think we need the following fixlet:

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index f2e0a5e5cfbb..f6161e17da26 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1193,7 +1193,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 out_up_write:
 	up_write(&mm->mmap_sem);
 out_nolock:
-	if (*hpage)
+	if (!IS_ERR_OR_NULL(*hpage))
 		mem_cgroup_uncharge(*hpage);
 	trace_mm_collapse_huge_page(mm, isolated, result);
 	return;
@@ -1928,7 +1928,7 @@ static void collapse_file(struct mm_struct *mm,
 	unlock_page(new_page);
 out:
 	VM_BUG_ON(!list_empty(&pagelist));
-	if (*hpage)
+	if (!IS_ERR_OR_NULL(*hpage))
 		mem_cgroup_uncharge(*hpage);
 	/* TODO: tracepoints */
 }
