Message-ID: <mhng-7775ab3c-847e-48f0-adaa-590285b8bbca@palmerdabbelt-glaptop>
Date: Tue, 25 May 2021 22:30:12 -0700 (PDT)
From: Palmer Dabbelt <palmer@...belt.com>
To: jszhang3@...l.ustc.edu.cn
CC: Paul Walmsley <paul.walmsley@...ive.com>, aou@...s.berkeley.edu,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] riscv: Optimize switch_mm by passing "cpu" to flush_icache_deferred()
On Tue, 11 May 2021 10:42:31 PDT (-0700), jszhang3@...l.ustc.edu.cn wrote:
> From: Jisheng Zhang <jszhang@...nel.org>
>
> Pass the cpu directly to flush_icache_deferred() rather than calling
> smp_processor_id() again.
>
> Here are some performance numbers:
>
> With a run of hackbench 30 times on a single core riscv64 Qemu instance
> with 1GB memory:
>
> without this patch: mean 36.934
> with this patch: mean 36.104 (improved by 2.24%)
I don't really put any stock in QEMU performance numbers for this sort
of thing, but for something like this, where we're just skipping a
redundant smp_processor_id() call, I don't really see a need for
performance numbers at all: we can just consider it an obvious
optimization.
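
For context on why the second lookup isn't free: on a
CONFIG_DEBUG_PREEMPT kernel, smp_processor_id() goes through the debug
wrapper.  Roughly, from include/linux/smp.h in kernels of this era
(quoted from memory, so treat this as a sketch rather than verbatim):

  #ifdef CONFIG_DEBUG_PREEMPT
    extern unsigned int debug_smp_processor_id(void);
  # define smp_processor_id() debug_smp_processor_id()
  #else
  # define smp_processor_id() __smp_processor_id()
  #endif

debug_smp_processor_id() runs preemption-safety checks on every call,
so hoisting the duplicate lookup out of switch_mm() saves real work on
debug builds and an extra per-cpu load on normal ones.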
> Signed-off-by: Jisheng Zhang <jszhang@...nel.org>
> ---
> arch/riscv/mm/context.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/arch/riscv/mm/context.c b/arch/riscv/mm/context.c
> index 68aa312fc352..6d445f2888ec 100644
> --- a/arch/riscv/mm/context.c
> +++ b/arch/riscv/mm/context.c
> @@ -281,10 +281,9 @@ static inline void set_mm(struct mm_struct *mm, unsigned int cpu)
> * actually performs that local instruction cache flush, which implicitly only
> * refers to the current hart.
> */
> -static inline void flush_icache_deferred(struct mm_struct *mm)
> +static inline void flush_icache_deferred(struct mm_struct *mm, unsigned int cpu)
> {
> #ifdef CONFIG_SMP
> - unsigned int cpu = smp_processor_id();
> cpumask_t *mask = &mm->context.icache_stale_mask;
>
> if (cpumask_test_cpu(cpu, mask)) {
This only performs a local icache flush, which means it will break if
any caller passes in anything other than the current CPU.  The big
comment at the top alludes to that behavior, but I went ahead and added
a line to make it more explicit.
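
For completeness, the body that follows in context.c reads roughly like
this (abridged from the kernel tree around this patch; it is not part
of the quoted hunk above):

  if (cpumask_test_cpu(cpu, mask)) {
          cpumask_clear_cpu(cpu, mask);
          /*
           * Ensure the remote hart's writes are visible to this
           * hart.  This pairs with a barrier in flush_icache_mm.
           */
          smp_mb();
          local_flush_icache_all();
  }

Since the flush is local_flush_icache_all(), it only ever affects the
executing hart, which is why "cpu" has to be the local CPU number.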
> @@ -320,5 +319,5 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>
> set_mm(next, cpu);
>
> - flush_icache_deferred(next);
> + flush_icache_deferred(next, cpu);
> }
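
For reference, the resulting caller reads roughly like this, an
abridged sketch of switch_mm() in arch/riscv/mm/context.c (not
verbatim; only the lines touched by the diff are exact):

  void switch_mm(struct mm_struct *prev, struct mm_struct *next,
                 struct task_struct *task)
  {
          unsigned int cpu;

          if (unlikely(prev == next))
                  return;

          /* The CPU number is computed exactly once here... */
          cpu = smp_processor_id();

          cpumask_clear_cpu(cpu, mm_cpumask(prev));
          cpumask_set_cpu(cpu, mm_cpumask(next));

          set_mm(next, cpu);

          /* ...and reused here, so it is always the local hart. */
          flush_icache_deferred(next, cpu);
  }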
Thanks, this is on for-next.