linux-kernel - Re: [PATCH] arm64: implement raw_smp_processor_id() using thread

Message-ID: <76b26dc9-af60-ee7f-6be5-dc17937b4a51@gentwo.org>
Date: Wed, 1 May 2024 09:23:54 -0700 (PDT)
From: "Christoph Lameter (Ampere)" <cl@...two.org>
To: Puranjay Mohan <puranjay@...nel.org>
cc: Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>, 
    Sumit Garg <sumit.garg@...aro.org>, Stephen Boyd <swboyd@...omium.org>, 
    Douglas Anderson <dianders@...omium.org>, 
    "Peter Zijlstra (Intel)" <peterz@...radead.org>, 
    Thomas Gleixner <tglx@...utronix.de>, Mark Rutland <mark.rutland@....com>, 
    linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org, 
    bpf@...r.kernel.org, puranjay12@...il.com, 
    Russell King <linux@...linux.org.uk>
Subject: Re: [PATCH] arm64: implement raw_smp_processor_id() using
 thread_info

On Wed, 1 May 2024, Puranjay Mohan wrote:

> Dump of assembler code for function bpf_get_smp_processor_id:
>   0xffff8000802cd608 <+0>:     nop
>   0xffff8000802cd60c <+4>:     nop
>   0xffff8000802cd610 <+8>:     adrp    x0, 0xffff800082138000
>   0xffff8000802cd614 <+12>:    mrs     x1, tpidr_el1
>   0xffff8000802cd618 <+16>:    add     x0, x0, #0x8
>   0xffff8000802cd61c <+20>:    ldrsw   x0, [x0, x1]
>   0xffff8000802cd620 <+24>:    ret

In general, arm64 has inefficient per cpu variable access. On x86 the 
processor id can be read with a single instruction, a load relative to a 
segment register (%gs on x86-64).

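Arm64 calculates the address of a percpu variable on every access, by 
adding the current cpu's per cpu offset (read from tpidr_el1 by the mrs 
in the dump above) to the variable's address. This results in 
inefficiencies because: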

1. The computed address is specific to the current processor. Therefore 
preemption needs to be disabled while the address is calculated and for 
as long as it is in use.

2. Additional registers are used, which can cause the compiler to 
generate less efficient code.

3. Even RMW (read-modify-write) instructions on percpu variables require 
preemption to be disabled because of the address calculation.
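To make point 1 concrete, here is a toy user-space model in C. It is a 
sketch only: all names (toy_percpu_setup(), toy_read_across_migration(), 
etc.) are hypothetical, and this is not how the kernel's percpu macros are 
written. Each cpu's copy of the per cpu section is a slice of an array, 
and a per-cpu offset table stands in for the value arm64 keeps in 
tpidr_el1. If the task migrates between computing the address and loading 
from it, the load reads the old cpu's copy.

```c
#include <assert.h>

/*
 * Toy user-space model of arm64-style per cpu access. Hypothetical names,
 * not kernel APIs. "Memory" is an int array; each CPU owns a contiguous
 * slice of TOY_AREA_WORDS ints, its copy of the per cpu section.
 */
#define TOY_NR_CPUS        4
#define TOY_AREA_WORDS     16
#define TOY_CPU_NUMBER_OFF 2   /* the variable's offset inside the section */

static int toy_memory[TOY_NR_CPUS * TOY_AREA_WORDS];
static long toy_per_cpu_offset[TOY_NR_CPUS]; /* base of each CPU's copy */
static int toy_running_cpu;                  /* CPU the task runs on now */

/* Stands in for "mrs x1, tpidr_el1": the running CPU's per cpu offset. */
static long toy_read_tpidr(void)
{
    return toy_per_cpu_offset[toy_running_cpu];
}

/* Give every CPU its own copy and store the CPU's id in it. */
void toy_percpu_setup(void)
{
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        toy_per_cpu_offset[cpu] = (long)cpu * TOY_AREA_WORDS;
        toy_memory[toy_per_cpu_offset[cpu] + TOY_CPU_NUMBER_OFF] = cpu;
    }
}

/* Stands in for the scheduler moving the task to another CPU. */
void toy_migrate(int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    toy_running_cpu = cpu;
}

/* Step 1: compute this CPU's address of the variable (offset + address). */
int *toy_cpu_number_ptr(void)
{
    return &toy_memory[toy_read_tpidr() + TOY_CPU_NUMBER_OFF];
}

/*
 * Step 2, the load, happens later. If the task migrates between the two
 * steps (what disabling preemption prevents), the load still goes to the
 * old CPU's copy. Returns the value read after moving from 'from' to 'to'.
 */
int toy_read_across_migration(int from, int to)
{
    toy_migrate(from);
    int *p = toy_cpu_number_ptr();  /* address computed on 'from' */
    toy_migrate(to);                /* preempted and migrated in the window */
    return *p;                      /* reads 'from''s copy, not 'to''s */
}
```

After toy_percpu_setup(), toy_read_across_migration(1, 3) returns 1 even 
though the task now runs on cpu 3; keeping preemption disabled across both 
steps is what closes that window.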

Russell King has a patchset for NUMA text replication, and as part of it 
he introduces per cpu kernel page tables.

https://lwn.net/Articles/957023/

If we had per cpu page tables, then we could map a fixed virtual address 
range to the physical per cpu area of each cpu.

With that, no address calculation would be needed for per cpu variable 
access, and workarounds like this one would no longer be necessary.

Retrieving the cpu id would then be a single instruction: a load from a 
fixed virtual address. No disabling of preemption or similar would be 
required.
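A toy user-space model of the per cpu page table idea, as a hedged sketch: 
the names are hypothetical, addresses are word-sized for simplicity, and 
this is not code from Russell King's patchset. Every cpu's page table maps 
the same fixed virtual base to that cpu's own physical area, so reading 
the cpu id is one load from a constant address, with no offset computed 
beforehand that could go stale.

```c
#include <assert.h>

/*
 * Toy model of per-CPU page tables. Hypothetical names, not kernel APIs.
 * Every CPU maps the same fixed virtual base TOY_PERCPU_VBASE to its own
 * physical per cpu area, so a per cpu variable has one constant address.
 * Addresses are word-addressed to keep the model small.
 */
#define TOY_NR_CPUS        4
#define TOY_PAGE_WORDS     16
#define TOY_PERCPU_VBASE   0x1000L  /* fixed virtual base, same on all CPUs */
#define TOY_CPU_NUMBER_OFF 2        /* variable's offset in the per cpu page */

static int toy_phys[TOY_NR_CPUS * TOY_PAGE_WORDS]; /* "physical memory" */
static long toy_pgtable[TOY_NR_CPUS]; /* per CPU: VBASE -> physical index */
static int toy_running_cpu;           /* CPU executing the current load */

/* Build one "page table" per CPU and store each CPU's id in its area. */
void toy_pgtable_setup(void)
{
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        toy_pgtable[cpu] = (long)cpu * TOY_PAGE_WORDS;
        toy_phys[toy_pgtable[cpu] + TOY_CPU_NUMBER_OFF] = cpu;
    }
}

/* Stands in for the scheduler moving the task to another CPU. */
void toy_migrate(int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    toy_running_cpu = cpu;
}

/* The MMU step: translate through the page table of the CPU doing the load. */
static int toy_load(long vaddr)
{
    long page_off = vaddr - TOY_PERCPU_VBASE;
    assert(page_off >= 0 && page_off < TOY_PAGE_WORDS);
    return toy_phys[toy_pgtable[toy_running_cpu] + page_off];
}

/* A single load from a constant address: no offset register involved. */
int toy_raw_smp_processor_id(void)
{
    return toy_load(TOY_PERCPU_VBASE + TOY_CPU_NUMBER_OFF);
}
```

After toy_pgtable_setup(), calling toy_migrate(3) and then 
toy_raw_smp_processor_id() yields 3: translation happens at load time, 
through the page table of whichever cpu executes the load. As with any 
cpu id read without preemption disabled, the value may of course be stale 
by the time the caller uses it.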