lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 10 Apr 2024 13:18:45 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
 Thomas Gleixner <tglx@...utronix.de>, Borislav Petkov <bp@...en8.de>,
 Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
 "H . Peter Anvin" <hpa@...or.com>
Cc: linux-kernel@...r.kernel.org, "levi . yun" <yeoreum.yun@....com>,
 stable@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 Juri Lelli <juri.lelli@...hat.com>,
 Dietmar Eggemann <dietmar.eggemann@....com>, Ben Segall
 <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
 Daniel Bristot de Oliveira <bristot@...hat.com>,
 Valentin Schneider <vschneid@...hat.com>,
 Catalin Marinas <catalin.marinas@....com>,
 Mark Rutland <mark.rutland@....com>, Will Deacon <will@...nel.org>,
 Aaron Lu <aaron.lu@...el.com>
Subject: Re: [PATCH] sched: Add missing memory barrier in switch_mm_cid

This fix has received an Acked-by from ARM64 maintainer Catalin Marinas. [1]

I'm CCing x86 maintainers whom are not also scheduler maintainers
as well so they can give their input.

This is still waiting for feedback from scheduler maintainers.

[1] https://lore.kernel.org/lkml/ZhUVpwwqKxWKgU0Q@arm.com/

On 2024-03-08 10:07, Mathieu Desnoyers wrote:
> Many architectures' switch_mm() (e.g. arm64) do not have an smp_mb()
> which the core scheduler code has depended upon since commit:
> 
>      commit 223baf9d17f25 ("sched: Fix performance regression introduced by mm_cid")
> 
> If switch_mm() doesn't call smp_mb(), sched_mm_cid_remote_clear() can
> unset the actively used cid when it fails to observe active task after it
> sets lazy_put.
> 
> There *is* a memory barrier between storing to rq->curr and _return to
> userspace_ (as required by membarrier), but the rseq mm_cid has stricter
> requirements: the barrier needs to be issued between store to rq->curr
> and switch_mm_cid(), which happens earlier than:
> 
> - spin_unlock(),
> - switch_to().
> 
> So it's fine when the architecture switch_mm happens to have that barrier
> already, but less so when the architecture only provides the full barrier
> in switch_to() or spin_unlock().
> 
> It is a bug in the rseq switch_mm_cid() implementation. All architectures
> that don't have memory barriers in switch_mm(), but rather have the full
> barrier either in finish_lock_switch() or switch_to() have them too late
> for the needs of switch_mm_cid().
> 
> Introduce a new smp_mb__after_switch_mm(), defined as smp_mb() in the
> generic barrier.h header, and use it in switch_mm_cid() for scheduler
> transitions where switch_mm() is expected to provide a memory barrier.
> 
> Architectures can override smp_mb__after_switch_mm() if their
> switch_mm() implementation provides an implicit memory barrier.
> Override it with a no-op on x86 which implicitly provide this memory
> barrier by writing to CR3.
> 
> Link: https://lore.kernel.org/lkml/20240305145335.2696125-1-yeoreum.yun@arm.com/
> Reported-by: levi.yun <yeoreum.yun@....com>
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
> Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
> Cc: <stable@...r.kernel.org> # 6.4.x
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Steven Rostedt <rostedt@...dmis.org>
> Cc: Vincent Guittot <vincent.guittot@...aro.org>
> Cc: Juri Lelli <juri.lelli@...hat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@....com>
> Cc: Ben Segall <bsegall@...gle.com>
> Cc: Mel Gorman <mgorman@...e.de>
> Cc: Daniel Bristot de Oliveira <bristot@...hat.com>
> Cc: Valentin Schneider <vschneid@...hat.com>
> Cc: levi.yun <yeoreum.yun@....com>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
> Cc: Catalin Marinas <catalin.marinas@....com>
> Cc: Mark Rutland <mark.rutland@....com>
> Cc: Will Deacon <will@...nel.org>
> Cc: Aaron Lu <aaron.lu@...el.com>
> ---
>   arch/x86/include/asm/barrier.h |  3 +++
>   include/asm-generic/barrier.h  |  8 ++++++++
>   kernel/sched/sched.h           | 20 ++++++++++++++------
>   3 files changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> index 35389b2af88e..0d5e54201eb2 100644
> --- a/arch/x86/include/asm/barrier.h
> +++ b/arch/x86/include/asm/barrier.h
> @@ -79,6 +79,9 @@ do {									\
>   #define __smp_mb__before_atomic()	do { } while (0)
>   #define __smp_mb__after_atomic()	do { } while (0)
>   
> +/* Writing to CR3 provides a full memory barrier in switch_mm(). */
> +#define smp_mb__after_switch_mm()	do { } while (0)
> +
>   #include <asm-generic/barrier.h>
>   
>   /*
> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
> index 961f4d88f9ef..5a6c94d7a598 100644
> --- a/include/asm-generic/barrier.h
> +++ b/include/asm-generic/barrier.h
> @@ -296,5 +296,13 @@ do {									\
>   #define io_stop_wc() do { } while (0)
>   #endif
>   
> +/*
> + * Architectures that guarantee an implicit smp_mb() in switch_mm()
> + * can override smp_mb__after_switch_mm.
> + */
> +#ifndef smp_mb__after_switch_mm
> +#define smp_mb__after_switch_mm()	smp_mb()
> +#endif
> +
>   #endif /* !__ASSEMBLY__ */
>   #endif /* __ASM_GENERIC_BARRIER_H */
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 2e5a95486a42..044d842c696c 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -79,6 +79,8 @@
>   # include <asm/paravirt_api_clock.h>
>   #endif
>   
> +#include <asm/barrier.h>
> +
>   #include "cpupri.h"
>   #include "cpudeadline.h"
>   
> @@ -3481,13 +3483,19 @@ static inline void switch_mm_cid(struct rq *rq,
>   		 * between rq->curr store and load of {prev,next}->mm->pcpu_cid[cpu].
>   		 * Provide it here.
>   		 */
> -		if (!prev->mm)                          // from kernel
> +		if (!prev->mm) {                        // from kernel
>   			smp_mb();
> -		/*
> -		 * user -> user transition guarantees a memory barrier through
> -		 * switch_mm() when current->mm changes. If current->mm is
> -		 * unchanged, no barrier is needed.
> -		 */
> +		} else {				// from user
> +			/*
> +			 * user -> user transition relies on an implicit
> +			 * memory barrier in switch_mm() when
> +			 * current->mm changes. If the architecture
> +			 * switch_mm() does not have an implicit memory
> +			 * barrier, it is emitted here.  If current->mm
> +			 * is unchanged, no barrier is needed.
> +			 */
> +			smp_mb__after_switch_mm();
> +		}
>   	}
>   	if (prev->mm_cid_active) {
>   		mm_cid_snapshot_time(rq, prev->mm);

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ