Message-ID: <87jzpldl1l.fsf@mail.lhotse>
Date: Mon, 11 Dec 2023 22:40:38 +1100
From: Michael Ellerman <mpe@...erman.id.au>
To: Luming Yu <luming.yu@...ngroup.cn>, linuxppc-dev@...ts.ozlabs.org,
linux-kernel@...r.kernel.org, npiggin@...il.com,
christophe.leroy@...roup.eu
Cc: luming.yu@...il.com, ke.zhao@...ngroup.cn, dawei.li@...ngroup.cn,
shenghui.qu@...ngroup.cn, Luming Yu <luming.yu@...ngroup.cn>
Subject: Re: [PATCH 1/2] powerpc/locking: implement this_cpu_cmpxchg local API
Hi Luming Yu,
Luming Yu <luming.yu@...ngroup.cn> writes:
> ppc appears to have already supported cmpxchg-local atomic semantics
> that is defined by the kernel convention of the feature.
> Add this_cpu_cmpxchg ppc local for the performance benefit of arch
> specific implementation than the asm-generic C version of the locking API.
Implementing this has been suggested before but it wasn't clear that it
was actually going to perform better than the generic version.
On 64-bit we have interrupt soft masking, so disabling interrupts is
relatively cheap. So the generic this_cpu_cmpxchg using irq disable just
becomes a few loads & stores, no atomic ops required.
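For reference, the generic fallback boils down to roughly the following
(a sketch, not the exact asm-generic code):

	#define this_cpu_cmpxchg_sketch(pcp, oval, nval)		\
	({								\
		typeof(pcp) __old = (oval), __new = (nval), __ret;	\
		unsigned long __flags;					\
		raw_local_irq_save(__flags);				\
		/* plain load/compare/store on the local slot */	\
		__ret = raw_cpu_read(pcp);				\
		if (__ret == __old)					\
			raw_cpu_write(pcp, __new);			\
		raw_local_irq_restore(__flags);				\
		__ret;							\
	})

With soft masking the irq save/restore is essentially just updates to the
soft-mask state in the paca, so the whole sequence is ordinary loads and
stores.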
In contrast, implementing it using __cmpxchg_local() will use
ldarx/stdcx etc. which will be more expensive.
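That is, something along these lines (a sketch of the usual larx/stcx
loop, not the exact cmpxchg.h asm):

	static inline unsigned long
	__cmpxchg_u64_local_sketch(volatile unsigned long *p,
				   unsigned long old, unsigned long new)
	{
		unsigned long prev;

		__asm__ __volatile__(
	"1:	ldarx	%0,0,%2\n"	/* load-reserve current value */
	"	cmpd	0,%0,%3\n"	/* compare against expected */
	"	bne-	2f\n"
	"	stdcx.	%4,0,%2\n"	/* store-conditional new value */
	"	bne-	1b\n"		/* reservation lost, retry */
	"2:"
		: "=&r" (prev), "+m" (*p)
		: "r" (p), "r" (old), "r" (new)
		: "cc", "memory");

		return prev;
	}

The loop can retry, and the ldarx/stdcx. pair is noticeably heavier than
plain loads & stores even without the barriers the non-local variants add.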
Have you done any performance measurements?
It probably is a bit fishy that we don't mask PMU interrupts vs
this_cpu_cmpxchg() etc., but I don't think it's actually a bug given the
few places using this_cpu_cmpxchg(). Though I could be wrong about that.
> diff --git a/arch/powerpc/include/asm/percpu.h b/arch/powerpc/include/asm/percpu.h
> index 8e5b7d0b851c..ceab5df6e7ab 100644
> --- a/arch/powerpc/include/asm/percpu.h
> +++ b/arch/powerpc/include/asm/percpu.h
> @@ -18,5 +18,22 @@
> #include <asm-generic/percpu.h>
>
> #include <asm/paca.h>
> +#include <asm/cmpxchg.h>
> +#ifdef this_cpu_cmpxchg_1
> +#undef this_cpu_cmpxchg_1
If we need to undef then I think something has gone wrong with the
header inclusion order. The model should be that the arch defines what
it has, and the generic code provides fallbacks for anything the arch
didn't define.
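In other words, roughly this shape (hypothetical sketch, helper name
invented):

	/* arch/powerpc/include/asm/percpu.h */
	#include <asm/cmpxchg.h>

	/* define the arch ops first ... */
	#define this_cpu_cmpxchg_8(pcp, oval, nval) \
		ppc_this_cpu_cmpxchg_8(pcp, oval, nval)	/* hypothetical arch helper */

	/* ... then pull in the generic header, which only fills the gaps */
	#include <asm-generic/percpu.h>

	/* include/asm-generic/percpu.h (roughly) */
	#ifndef this_cpu_cmpxchg_8
	#define this_cpu_cmpxchg_8(pcp, oval, nval) \
		this_cpu_generic_cmpxchg(pcp, oval, nval)
	#endif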
> +#define this_cpu_cmpxchg_1(pcp, oval, nval) __cmpxchg_local(raw_cpu_ptr(&(pcp)), oval, nval, 1)
I think this is unsafe vs preemption. The raw_cpu_ptr() can generate the
per-cpu address, but then the task can be preempted and moved to a
different CPU before the ldarx/stdcx do the cmpxchg.
The arm64 implementation had the same bug until they fixed it.
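One way to close that race would be something like the following (sketch
only, untested): don't take the per-cpu address until preemption is
disabled, so the task can't migrate between raw_cpu_ptr() and the stdcx.

	#define this_cpu_cmpxchg_8(pcp, oval, nval)			\
	({								\
		typeof(pcp) __ret;					\
		preempt_disable();					\
		__ret = __cmpxchg_local(raw_cpu_ptr(&(pcp)),		\
					(oval), (nval), 8);		\
		preempt_enable();					\
		__ret;							\
	})

Though once you add the preempt disable/enable it's not obviously cheaper
than the generic irq-disable version, which is why the numbers matter.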
> +#endif
> +#ifdef this_cpu_cmpxchg_2
> +#undef this_cpu_cmpxchg_2
> +#define this_cpu_cmpxchg_2(pcp, oval, nval) __cmpxchg_local(raw_cpu_ptr(&(pcp)), oval, nval, 2)
> +#endif
> +#ifdef this_cpu_cmpxchg_4
> +#undef this_cpu_cmpxchg_4
> +#define this_cpu_cmpxchg_4(pcp, oval, nval) __cmpxchg_local(raw_cpu_ptr(&(pcp)), oval, nval, 4)
> +#endif
> +#ifdef this_cpu_cmpxchg_8
> +#undef this_cpu_cmpxchg_8
> +#define this_cpu_cmpxchg_8(pcp, oval, nval) __cmpxchg_local(raw_cpu_ptr(&(pcp)), oval, nval, 8)
> +#endif
>
> #endif /* _ASM_POWERPC_PERCPU_H_ */
cheers