Message-ID: <87jzpldl1l.fsf@mail.lhotse>
Date: Mon, 11 Dec 2023 22:40:38 +1100
From: Michael Ellerman <mpe@...erman.id.au>
To: Luming Yu <luming.yu@...ngroup.cn>, linuxppc-dev@...ts.ozlabs.org,
linux-kernel@...r.kernel.org, npiggin@...il.com,
christophe.leroy@...roup.eu
Cc: luming.yu@...il.com, ke.zhao@...ngroup.cn, dawei.li@...ngroup.cn,
shenghui.qu@...ngroup.cn, Luming Yu <luming.yu@...ngroup.cn>
Subject: Re: [PATCH 1/2] powerpc/locking: implement this_cpu_cmpxchg local API
Hi Luming Yu,
Luming Yu <luming.yu@...ngroup.cn> writes:
> ppc appears to have already supported cmpxchg-local atomic semantics
> that is defined by the kernel convention of the feature.
> Add this_cpu_cmpxchg ppc local for the performance benefit of arch
> specific implementation than the asm-generic C version of the locking API.
Implementing this has been suggested before but it wasn't clear that it
was actually going to perform better than the generic version.
On 64-bit we have interrupt soft masking, so disabling interrupts is
relatively cheap. So the generic this_cpu_cmpxchg using irq disable just
becomes a few loads & stores, no atomic ops required.
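For reference, the generic fallback boils down to roughly the following
(a sketch, not the exact asm-generic code):

	#define this_cpu_cmpxchg_sketch(pcp, oval, nval)		\
	({								\
		typeof(pcp) __old = (oval), __new = (nval), __ret;	\
		unsigned long __flags;					\
		raw_local_irq_save(__flags);				\
		/* plain load/compare/store on the local slot */	\
		__ret = raw_cpu_read(pcp);				\
		if (__ret == __old)					\
			raw_cpu_write(pcp, __new);			\
		raw_local_irq_restore(__flags);				\
		__ret;							\
	})

With soft masking the irq save/restore is essentially just updates to the
soft-mask state in the paca, so the whole sequence is ordinary loads and
stores.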
In contrast, implementing it using __cmpxchg_local() will use
ldarx/stdcx etc. which will be more expensive.
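That is, something along these lines (a sketch of the usual larx/stcx
loop, not the exact cmpxchg.h asm):

	static inline unsigned long
	__cmpxchg_u64_local_sketch(volatile unsigned long *p,
				   unsigned long old, unsigned long new)
	{
		unsigned long prev;

		__asm__ __volatile__(
	"1:	ldarx	%0,0,%2\n"	/* load-reserve current value */
	"	cmpd	0,%0,%3\n"	/* compare against expected */
	"	bne-	2f\n"
	"	stdcx.	%4,0,%2\n"	/* store-conditional new value */
	"	bne-	1b\n"		/* reservation lost, retry */
	"2:"
		: "=&r" (prev), "+m" (*p)
		: "r" (p), "r" (old), "r" (new)
		: "cc", "memory");

		return prev;
	}

The loop can retry, and the ldarx/stdcx. pair is noticeably heavier than
plain loads & stores even without the barriers the non-local variants add.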
Have you done any performance measurements?
It probably is a bit fishy that we don't mask PMU interrupts vs
this_cpu_cmpxchg() etc., but I don't think it's actually a bug given the
few places using this_cpu_cmpxchg(). Though I could be wrong about that.
> diff --git a/arch/powerpc/include/asm/percpu.h b/arch/powerpc/include/asm/percpu.h
> index 8e5b7d0b851c..ceab5df6e7ab 100644
> --- a/arch/powerpc/include/asm/percpu.h
> +++ b/arch/powerpc/include/asm/percpu.h
> @@ -18,5 +18,22 @@
> #include <asm-generic/percpu.h>
>
> #include <asm/paca.h>
> +#include <asm/cmpxchg.h>
> +#ifdef this_cpu_cmpxchg_1
> +#undef this_cpu_cmpxchg_1
If we need to undef then I think something has gone wrong with the
header inclusion order. The model should be that the arch defines what
it has, and the generic code provides fallbacks for anything the arch
didn't define.
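In other words, roughly this shape (hypothetical sketch, helper name
invented):

	/* arch/powerpc/include/asm/percpu.h */
	#include <asm/cmpxchg.h>

	/* define the arch ops first ... */
	#define this_cpu_cmpxchg_8(pcp, oval, nval) \
		ppc_this_cpu_cmpxchg_8(pcp, oval, nval)	/* hypothetical arch helper */

	/* ... then pull in the generic header, which only fills the gaps */
	#include <asm-generic/percpu.h>

	/* include/asm-generic/percpu.h (roughly) */
	#ifndef this_cpu_cmpxchg_8
	#define this_cpu_cmpxchg_8(pcp, oval, nval) \
		this_cpu_generic_cmpxchg(pcp, oval, nval)
	#endif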
> +#define this_cpu_cmpxchg_1(pcp, oval, nval) __cmpxchg_local(raw_cpu_ptr(&(pcp)), oval, nval, 1)
I think this is unsafe vs preemption. The raw_cpu_ptr() can generate the
per-cpu address, but then the task can be preempted and moved to a
different CPU before the ldarx/stdcx do the cmpxchg.
The arm64 implementation had the same bug until they fixed it.
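One way to close that race would be something like the following (sketch
only, untested): don't take the per-cpu address until preemption is
disabled, so the task can't migrate between raw_cpu_ptr() and the stdcx.

	#define this_cpu_cmpxchg_8(pcp, oval, nval)			\
	({								\
		typeof(pcp) __ret;					\
		preempt_disable();					\
		__ret = __cmpxchg_local(raw_cpu_ptr(&(pcp)),		\
					(oval), (nval), 8);		\
		preempt_enable();					\
		__ret;							\
	})

Though once you add the preempt disable/enable it's not obviously cheaper
than the generic irq-disable version, which is why the numbers matter.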
> +#endif
> +#ifdef this_cpu_cmpxchg_2
> +#undef this_cpu_cmpxchg_2
> +#define this_cpu_cmpxchg_2(pcp, oval, nval) __cmpxchg_local(raw_cpu_ptr(&(pcp)), oval, nval, 2)
> +#endif
> +#ifdef this_cpu_cmpxchg_4
> +#undef this_cpu_cmpxchg_4
> +#define this_cpu_cmpxchg_4(pcp, oval, nval) __cmpxchg_local(raw_cpu_ptr(&(pcp)), oval, nval, 4)
> +#endif
> +#ifdef this_cpu_cmpxchg_8
> +#undef this_cpu_cmpxchg_8
> +#define this_cpu_cmpxchg_8(pcp, oval, nval) __cmpxchg_local(raw_cpu_ptr(&(pcp)), oval, nval, 8)
> +#endif
>
> #endif /* _ASM_POWERPC_PERCPU_H_ */
cheers