Message-ID: <20171110122635.q26xdxytgdfjy5q3@dhcp22.suse.cz>
Date: Fri, 10 Nov 2017 13:26:35 +0100
From: Michal Hocko <mhocko@...nel.org>
To: Minchan Kim <minchan@...nel.org>
Cc: Wang Nan <wangnan0@...wei.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, will.deacon@....com,
Bob Liu <liubo95@...wei.com>,
Andrew Morton <akpm@...ux-foundation.org>,
David Rientjes <rientjes@...gle.com>,
Ingo Molnar <mingo@...nel.org>, Roman Gushchin <guro@...com>,
Konstantin Khlebnikov <khlebnikov@...dex-team.ru>,
Andrea Arcangeli <aarcange@...hat.com>
Subject: [PATCH] arch, mm: introduce arch_tlb_gather_mmu_lazy (was: Re:
 [RESEND PATCH] mm, oom_reaper: gather each vma to prevent leaking TLB entry)
On Fri 10-11-17 11:15:29, Michal Hocko wrote:
> On Fri 10-11-17 09:19:33, Minchan Kim wrote:
> > On Tue, Nov 07, 2017 at 09:54:53AM +0000, Wang Nan wrote:
> > > tlb_gather_mmu(&tlb, mm, 0, -1) means gathering the whole virtual memory
> > > space. In this case, tlb->fullmm is true. Some archs like arm64 don't
> > > flush the TLB when tlb->fullmm is true:
> > >
> > > commit 5a7862e83000 ("arm64: tlbflush: avoid flushing when fullmm == 1").
> > >
> > > This results in leaked TLB entries.
> >
> > That means soft-dirty, which has used tlb_gather_mmu with fullmm, could be
> > broken by losing the write-protection bit once it supports arm64 in the future?
> >
> > If so, it would be better to use TASK_SIZE rather than -1 in tlb_gather_mmu.
> > Of course, it's off-topic.
>
> I wouldn't play tricks like that. And maybe the API itself could be more
> explicit. E.g. add a lazy parameter which would allow arch-specific code
> to not flush if it is sure that nobody can actually stumble over a missed
> flush. E.g. the following?
This one has a changelog and even compiles in my cross-compile test.
---
>From 7f0fcd2cab379ddac5611b2a520cdca8a77a235b Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@...e.com>
Date: Fri, 10 Nov 2017 11:27:17 +0100
Subject: [PATCH] arch, mm: introduce arch_tlb_gather_mmu_lazy
Commit 5a7862e83000 ("arm64: tlbflush: avoid flushing when fullmm == 1")
introduced an optimization to not flush the TLB when tearing down the
whole address space. Will goes on to explain:
: Basically, we tag each address space with an ASID (PCID on x86) which
: is resident in the TLB. This means we can elide TLB invalidation when
: pulling down a full mm because we won't ever assign that ASID to
: another mm without doing TLB invalidation elsewhere (which actually
: just nukes the whole TLB).
This is all nice, but tlb_gather users are not aware of that, and it can
actually cause some real problems. E.g. the oom_reaper tries to reap the
whole address space, but it might race with threads accessing the memory [1].
It is possible that soft-dirty handling might suffer from the same
problem [2].
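
Schematically, the race from [1] looks like this (a simplified sketch of
the interleaving, not the actual oom_reaper code):

	oom_reaper				victim thread
	----------				-------------
	tlb_gather_mmu(&tlb, mm, 0, -1)
	  /* tlb->fullmm == 1 */
	unmap_page_range(...)
	  /* PTEs cleared, pages queued */
	tlb_finish_mmu(&tlb, 0, -1)
	  /* arm64: fullmm -> flush elided */
	/* gathered pages freed and reused */
						access through a stale TLB entry
						  -> reads/writes a reused page
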
Introduce an explicit lazy variant, tlb_gather_mmu_lazy, which allows the
behavior arm64 implements for the fullmm case, and replace the fullmm check
with an explicit lazy flag in the mmu_gather structure. The exit_mmap path
is then converted to the explicit lazy variant. Other architectures simply
ignore the flag.
[1] http://lkml.kernel.org/r/20171106033651.172368-1-wangnan0@huawei.com
[2] http://lkml.kernel.org/r/20171110001933.GA12421@bbox
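
For illustration, the resulting calling convention (a sketch only; the
oom_reaper line shows a non-lazy caller for contrast and is not touched
by this patch):

	/* exit_mmap(): no thread can run in this mm anymore */
	tlb_gather_mmu_lazy(&tlb, mm, 0, -1);	/* arch may elide the final flush */

	/* oom_reaper and all other users: threads might still be running */
	tlb_gather_mmu(&tlb, mm, 0, -1);	/* arm64 no longer skips the flush */
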
Signed-off-by: Michal Hocko <mhocko@...e.com>
---
arch/arm/include/asm/tlb.h | 3 ++-
arch/arm64/include/asm/tlb.h | 2 +-
arch/ia64/include/asm/tlb.h | 3 ++-
arch/s390/include/asm/tlb.h | 3 ++-
arch/sh/include/asm/tlb.h | 2 +-
arch/um/include/asm/tlb.h | 2 +-
include/asm-generic/tlb.h | 6 ++++--
include/linux/mm_types.h | 2 ++
mm/memory.c | 17 +++++++++++++++--
mm/mmap.c | 2 +-
10 files changed, 31 insertions(+), 11 deletions(-)
diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index d5562f9ce600..fe9042aee8e9 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -149,7 +149,8 @@ static inline void tlb_flush_mmu(struct mmu_gather *tlb)
static inline void
arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end,
+ bool lazy)
{
tlb->mm = mm;
tlb->fullmm = !(start | (end+1));
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index ffdaea7954bb..7adde19b2bcc 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -43,7 +43,7 @@ static inline void tlb_flush(struct mmu_gather *tlb)
* The ASID allocator will either invalidate the ASID or mark
* it as used.
*/
- if (tlb->fullmm)
+ if (tlb->lazy)
return;
/*
diff --git a/arch/ia64/include/asm/tlb.h b/arch/ia64/include/asm/tlb.h
index cbe5ac3699bf..50c440f5b7bc 100644
--- a/arch/ia64/include/asm/tlb.h
+++ b/arch/ia64/include/asm/tlb.h
@@ -169,7 +169,8 @@ static inline void __tlb_alloc_page(struct mmu_gather *tlb)
static inline void
arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end,
+ bool lazy)
{
tlb->mm = mm;
tlb->max = ARRAY_SIZE(tlb->local);
diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
index 2eb8ff0d6fca..2310657b64c4 100644
--- a/arch/s390/include/asm/tlb.h
+++ b/arch/s390/include/asm/tlb.h
@@ -49,7 +49,8 @@ extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
static inline void
arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end,
+ bool lazy)
{
tlb->mm = mm;
tlb->start = start;
diff --git a/arch/sh/include/asm/tlb.h b/arch/sh/include/asm/tlb.h
index 51a8bc967e75..ae4c50a7c1ec 100644
--- a/arch/sh/include/asm/tlb.h
+++ b/arch/sh/include/asm/tlb.h
@@ -37,7 +37,7 @@ static inline void init_tlb_gather(struct mmu_gather *tlb)
static inline void
arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end, bool lazy)
{
tlb->mm = mm;
tlb->start = start;
diff --git a/arch/um/include/asm/tlb.h b/arch/um/include/asm/tlb.h
index 344d95619d03..f24af66d07a4 100644
--- a/arch/um/include/asm/tlb.h
+++ b/arch/um/include/asm/tlb.h
@@ -46,7 +46,7 @@ static inline void init_tlb_gather(struct mmu_gather *tlb)
static inline void
arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end, bool lazy)
{
tlb->mm = mm;
tlb->start = start;
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index faddde44de8c..e6f0b8715e52 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -101,7 +101,8 @@ struct mmu_gather {
unsigned int fullmm : 1,
/* we have performed an operation which
* requires a complete flush of the tlb */
- need_flush_all : 1;
+ need_flush_all : 1,
+ lazy : 1;
struct mmu_gather_batch *active;
struct mmu_gather_batch local;
@@ -113,7 +114,8 @@ struct mmu_gather {
#define HAVE_GENERIC_MMU_GATHER
void arch_tlb_gather_mmu(struct mmu_gather *tlb,
- struct mm_struct *mm, unsigned long start, unsigned long end);
+ struct mm_struct *mm, unsigned long start, unsigned long end,
+ bool lazy);
void tlb_flush_mmu(struct mmu_gather *tlb);
void arch_tlb_finish_mmu(struct mmu_gather *tlb,
unsigned long start, unsigned long end, bool force);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 2a728317cba0..3208bea0356f 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -523,6 +523,8 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
struct mmu_gather;
extern void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
unsigned long start, unsigned long end);
+extern void tlb_gather_mmu_lazy(struct mmu_gather *tlb, struct mm_struct *mm,
+ unsigned long start, unsigned long end);
extern void tlb_finish_mmu(struct mmu_gather *tlb,
unsigned long start, unsigned long end);
diff --git a/mm/memory.c b/mm/memory.c
index 590709e84a43..7dfdd4d8224f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -218,13 +218,15 @@ static bool tlb_next_batch(struct mmu_gather *tlb)
}
void arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end,
+ bool lazy)
{
tlb->mm = mm;
/* Is it from 0 to ~0? */
tlb->fullmm = !(start | (end+1));
tlb->need_flush_all = 0;
+ tlb->lazy = lazy;
tlb->local.next = NULL;
tlb->local.nr = 0;
tlb->local.max = ARRAY_SIZE(tlb->__pages);
@@ -408,7 +410,18 @@ void tlb_remove_table(struct mmu_gather *tlb, void *table)
void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
unsigned long start, unsigned long end)
{
- arch_tlb_gather_mmu(tlb, mm, start, end);
+ arch_tlb_gather_mmu(tlb, mm, start, end, false);
+ inc_tlb_flush_pending(tlb->mm);
+}
+
+/* tlb_gather_mmu_lazy
+ * Basically same as tlb_gather_mmu except it allows architectures to
+ * skip tlb flushing if they can ensure that nobody will reuse tlb entries
+ */
+void tlb_gather_mmu_lazy(struct mmu_gather *tlb, struct mm_struct *mm,
+ unsigned long start, unsigned long end)
+{
+ arch_tlb_gather_mmu(tlb, mm, start, end, true);
inc_tlb_flush_pending(tlb->mm);
}
diff --git a/mm/mmap.c b/mm/mmap.c
index 680506faceae..43594a6a2eac 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2997,7 +2997,7 @@ void exit_mmap(struct mm_struct *mm)
lru_add_drain();
flush_cache_mm(mm);
- tlb_gather_mmu(&tlb, mm, 0, -1);
+ tlb_gather_mmu_lazy(&tlb, mm, 0, -1);
/* update_hiwater_rss(mm) here? but nobody should be looking */
/* Use -1 here to ensure all VMAs in the mm are unmapped */
unmap_vmas(&tlb, vma, 0, -1);
--
2.14.2
--
Michal Hocko
SUSE Labs