lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20240813-fix_fencei_optimization-v1-2-2aadc2cdde95@rivosinc.com>
Date: Tue, 13 Aug 2024 16:02:18 -0700
From: Charlie Jenkins <charlie@...osinc.com>
To: Paul Walmsley <paul.walmsley@...ive.com>, 
 Palmer Dabbelt <palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>, 
 Alexandre Ghiti <alexghiti@...osinc.com>, Atish Patra <atishp@...osinc.com>, 
 Samuel Holland <samuel.holland@...ive.com>, 
 Andrea Parri <parri.andrea@...il.com>
Cc: Palmer Dabbelt <palmer@...osinc.com>, linux-riscv@...ts.infradead.org, 
 linux-kernel@...r.kernel.org, Charlie Jenkins <charlie@...osinc.com>
Subject: [PATCH 2/2] riscv: Eagerly flush in flush_icache_deferred()

It is possible for mm->context.icache_stale_mask to be set between
switch_mm() and switch_to() when there are two threads on different CPUs
executing in the same mm:

<---- Thread 1 starts running here on CPU1

<---- Thread 2 starts running here with same mm

T2: prctl(PR_RISCV_SET_ICACHE_FLUSH_CTX, PR_RISCV_CTX_SW_FENCEI_ON, PR_RISCV_SCOPE_PER_PROCESS);
T2: <-- kernel sets current->mm->context.force_icache_flush to true
T2: <modification of instructions>
T2: fence.i

T1: fence.i (to synchronize with other thread, has some logic to
             determine when to do this)
T1: <-- thread 1 is preempted
T1: <-- thread 1 is placed onto CPU3 and starts context switch sequence
T1 (kernel): flush_icache_deferred() -> skips flush because switch_to_should_flush_icache() returns true
				     -> thread has migrated and task->mm->context.force_icache_flush is true

T2: prctl(PR_RISCV_SET_ICACHE_FLUSH_CTX, PR_RISCV_CTX_SW_FENCEI_OFF, PR_RISCV_SCOPE_PER_PROCESS);
T2 (kernel): kernel sets current->mm->context.force_icache_flush = false

T1 (kernel): switch_to() calls switch_to_should_flush_icache() which now
	     returns false because task->mm->context.force_icache_flush
	     is false due to the other thread emitting
	     PR_RISCV_CTX_SW_FENCEI_OFF.
T1 (back in userspace): Instruction cache was never flushed on context
			switch to CPU3, and thus may execute incorrect
			instructions.

This commit fixes this issue by also flushing in switch_to() when
task->mm->context.icache_stale_mask is set, and always flushing in
flush_icache_deferred().

It is possible for switch_mm() to be called without being followed by
switch_to() such as with the riscv efi driver in efi_virtmap_load(), so
we cannot do all of the flushing in switch_to().

To avoid flushing multiple times in a single context switch, clear
mm->context.icache_stale_mask in switch_mm() and only flush in
switch_to() if it was set again in the meantime. Set icache_stale_mask
when handling prctl and in switch_to() if
task->mm->context.force_icache_flush is set to prepare for next context
switch.

Signed-off-by: Charlie Jenkins <charlie@...osinc.com>
Fixes: 6b9391b581fd ("riscv: Include riscv_set_icache_flush_ctx prctl")
---
 arch/riscv/include/asm/switch_to.h | 19 ++++++++++++++++---
 arch/riscv/mm/cacheflush.c         |  1 +
 arch/riscv/mm/context.c            |  6 +-----
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index 7594df37cc9f..5701cfcf2b68 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -76,11 +76,24 @@ extern struct task_struct *__switch_to(struct task_struct *,
 static inline bool switch_to_should_flush_icache(struct task_struct *task)
 {
 #ifdef CONFIG_SMP
-	bool stale_mm = task->mm && task->mm->context.force_icache_flush;
-	bool stale_thread = task->thread.force_icache_flush;
+	bool stale_mm = false;
 	bool thread_migrated = smp_processor_id() != task->thread.prev_cpu;
+	bool stale_thread = thread_migrated && task->thread.force_icache_flush;
+
+	if (task->mm) {
+		/*
+		 * The mm is only stale if the respective CPU bit in
+		 * icache_stale_mask is set. force_icache_flush indicates that
+		 * icache_stale_mask should be set again before returning to userspace.
+		 */
+		stale_mm = cpumask_test_cpu(smp_processor_id(),
+					    &task->mm->context.icache_stale_mask);
+		cpumask_assign_cpu(smp_processor_id(),
+				   &task->mm->context.icache_stale_mask,
+				   task->mm->context.force_icache_flush);
+	}
 
-	return thread_migrated && (stale_mm || stale_thread);
+	return stale_mm || stale_thread;
 #else
 	return false;
 #endif
diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index b81672729887..c4a9ac2ad7ab 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -230,6 +230,7 @@ int riscv_set_icache_flush_ctx(unsigned long ctx, unsigned long scope)
 		switch (scope) {
 		case PR_RISCV_SCOPE_PER_PROCESS:
 			current->mm->context.force_icache_flush = true;
+			cpumask_setall(&current->mm->context.icache_stale_mask);
 			break;
 		case PR_RISCV_SCOPE_PER_THREAD:
 			current->thread.force_icache_flush = true;
diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
index 4abe3de23225..9e82e2a99441 100644
--- a/arch/riscv/mm/context.c
+++ b/arch/riscv/mm/context.c
@@ -306,11 +306,7 @@ static inline void flush_icache_deferred(struct mm_struct *mm, unsigned int cpu,
 		 */
 		smp_mb();
 
-		/*
-		 * If cache will be flushed in switch_to, no need to flush here.
-		 */
-		if (!(task && switch_to_should_flush_icache(task)))
-			local_flush_icache_all();
+		local_flush_icache_all();
 	}
 #endif
 }

-- 
2.45.0


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ