lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZjJwos7KpvzhoK_f@FVFF77S0Q05N.cambridge.arm.com>
Date: Wed, 1 May 2024 17:41:06 +0100
From: Mark Rutland <mark.rutland@....com>
To: Puranjay Mohan <puranjay@...nel.org>
Cc: Catalin Marinas <catalin.marinas@....com>,
	Will Deacon <will@...nel.org>, Sumit Garg <sumit.garg@...aro.org>,
	Stephen Boyd <swboyd@...omium.org>,
	Douglas Anderson <dianders@...omium.org>,
	"Peter Zijlstra (Intel)" <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
	bpf@...r.kernel.org, puranjay12@...il.com,
	Ard Biesheuvel <ardb@...nel.org>
Subject: Re: [PATCH] arm64: implement raw_smp_processor_id() using thread_info

Hi Puranjay,

On Wed, May 01, 2024 at 03:42:36PM +0000, Puranjay Mohan wrote:
> ARM64 defines THREAD_INFO_IN_TASK which means the cpu id can be found
> from current_thread_info()->cpu.

Nice!

This is something that we'd wanted to do, but there were some historical
reasons that prevented that. I think it'd be worth describing that in the
commit message, e.g.

| Historically, arm64 implemented raw_smp_processor_id() as a read of
| current_thread_info()->cpu. This changed when arm64 moved thread_info into
| task struct, as at the time CONFIG_THREAD_INFO_IN_TASK made core code use
| thread_struct::cpu for the cpu number, and due to header dependencies
| prevented using this in raw_smp_processor_id(). As a workaround, we moved to
| using a percpu variable in commit:
|
|   57c82954e77fa12c ("arm64: make cpu number a percpu variable")
|
| Since then, thread_info::cpu was reintroduced, and core code was made to use
| this in commits:
|
|   001430c1910df65a ("arm64: add CPU field to struct thread_info")
|   bcf9033e5449bdca ("sched: move CPU field back into thread_info if THREAD_INFO_IN_TASK=y")
|
| Consequently it is possible to use current_thread_info()->cpu again.

> Implement raw_smp_processor_id() using the above. This decreases the
> number of emitted instructions like in the following example:
> 
> Dump of assembler code for function bpf_get_smp_processor_id:
>    0xffff8000802cd608 <+0>:     nop
>    0xffff8000802cd60c <+4>:     nop
>    0xffff8000802cd610 <+8>:     adrp    x0, 0xffff800082138000
>    0xffff8000802cd614 <+12>:    mrs     x1, tpidr_el1
>    0xffff8000802cd618 <+16>:    add     x0, x0, #0x8
>    0xffff8000802cd61c <+20>:    ldrsw   x0, [x0, x1]
>    0xffff8000802cd620 <+24>:    ret
> 
> After this patch:
> 
> Dump of assembler code for function bpf_get_smp_processor_id:
>    0xffff8000802c9130 <+0>:     nop
>    0xffff8000802c9134 <+4>:     nop
>    0xffff8000802c9138 <+8>:     mrs     x0, sp_el0
>    0xffff8000802c913c <+12>:    ldr     w0, [x0, #24]
>    0xffff8000802c9140 <+16>:    ret
> 
> A microbenchmark[1] was built to measure the performance improvement
> provided by this change. It calls the following function given number of
> times and finds the runtime overhead:
> 
> static noinline int get_cpu_id(void)
> {
> 	return smp_processor_id();
> }
> 
> Run the benchmark like:
>  modprobe smp_processor_id nr_function_calls=1000000000
> 
>       +--------------------------+------------------------+
>       |        | Number of Calls |    Time taken          |
>       +--------+-----------------+------------------------+
>       | Before |   1000000000    |   1602888401ns         |
>       +--------+-----------------+------------------------+
>       | After  |   1000000000    |   1206212658ns         |
>       +--------+-----------------+------------------------+
>       |  Difference (decrease)   |   396675743ns (24.74%) |
>       +---------------------------------------------------+
> 
> This improvement is in this very specific microbenchmark but it proves
> the point.
> 
> The percpu variable cpu_number is left as it is because it is used in
> set_smp_ipi_range()
> 
> [1] https://github.com/puranjaymohan/linux/commit/77d3fdd
> 
> Signed-off-by: Puranjay Mohan <puranjay@...nel.org>
> ---
>  arch/arm64/include/asm/smp.h | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
> index efb13112b408..88fd2ab805ec 100644
> --- a/arch/arm64/include/asm/smp.h
> +++ b/arch/arm64/include/asm/smp.h
> @@ -34,13 +34,9 @@
>  DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number);
>  
>  /*
> - * We don't use this_cpu_read(cpu_number) as that has implicit writes to
> - * preempt_count, and associated (compiler) barriers, that we'd like to avoid
> - * the expense of. If we're preemptible, the value can be stale at use anyway.
> - * And we can't use this_cpu_ptr() either, as that winds up recursing back
> - * here under CONFIG_DEBUG_PREEMPT=y.
> + * This relies on THREAD_INFO_IN_TASK, but arm64 defines that unconditionally.
>   */
> -#define raw_smp_processor_id() (*raw_cpu_ptr(&cpu_number))
> +#define raw_smp_processor_id() (current_thread_info()->cpu)

I think we can (and should) delete the comment entirely.

Mark.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ