linux-kernel - Re: [RFC v2 00/35] optimize cost of inter-process communication

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20250530154250.15caab4e3991de779aabe02c@linux-foundation.org>
Date: Fri, 30 May 2025 15:42:50 -0700
From: Andrew Morton <akpm@...ux-foundation.org>
To: Bo Li <libo.gcs85@...edance.com>
Cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
 dave.hansen@...ux.intel.com, x86@...nel.org, luto@...nel.org,
 kees@...nel.org, david@...hat.com, juri.lelli@...hat.com,
 vincent.guittot@...aro.org, peterz@...radead.org, dietmar.eggemann@....com,
 hpa@...or.com, acme@...nel.org, namhyung@...nel.org, mark.rutland@....com,
 alexander.shishkin@...ux.intel.com, jolsa@...nel.org, irogers@...gle.com,
 adrian.hunter@...el.com, kan.liang@...ux.intel.com,
 viro@...iv.linux.org.uk, brauner@...nel.org, jack@...e.cz,
 lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com, vbabka@...e.cz,
 rppt@...nel.org, surenb@...gle.com, mhocko@...e.com, rostedt@...dmis.org,
 bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com, jannh@...gle.com,
 pfalcato@...e.de, riel@...riel.com, harry.yoo@...cle.com,
 linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
 linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
 duanxiongchun@...edance.com, yinhongbo@...edance.com,
 dengliang.1214@...edance.com, xieyongji@...edance.com,
 chaiwen.cc@...edance.com, songmuchun@...edance.com, yuanzhu@...edance.com,
 chengguozhu@...edance.com, sunjiadong.lff@...edance.com
Subject: Re: [RFC v2 00/35] optimize cost of inter-process communication

On Fri, 30 May 2025 17:27:28 +0800 Bo Li <libo.gcs85@...edance.com> wrote:

> During testing, the client transmitted 1 million 32-byte messages, and we
> computed the per-message average latency. The results are as follows:
> 
> *****************
> Without RPAL: Message length: 32 bytes, Total TSC cycles: 19616222534,
>  Message count: 1000000, Average latency: 19616 cycles
> With RPAL: Message length: 32 bytes, Total TSC cycles: 1703459326,
>  Message count: 1000000, Average latency: 1703 cycles
> *****************
> 
> These results confirm that RPAL delivers substantial latency improvements
> over the current epoll implementation—achieving a 17,913-cycle reduction
> (an ~91.3% improvement) for 32-byte messages.

Noted ;)

Quick question:

>  arch/x86/Kbuild                               |    2 +
>  arch/x86/Kconfig                              |    2 +
>  arch/x86/entry/entry_64.S                     |  160 ++
>  arch/x86/events/amd/core.c                    |   14 +
>  arch/x86/include/asm/pgtable.h                |   25 +
>  arch/x86/include/asm/pgtable_types.h          |   11 +
>  arch/x86/include/asm/tlbflush.h               |   10 +
>  arch/x86/kernel/asm-offsets.c                 |    3 +
>  arch/x86/kernel/cpu/common.c                  |    8 +-
>  arch/x86/kernel/fpu/core.c                    |    8 +-
>  arch/x86/kernel/nmi.c                         |   20 +
>  arch/x86/kernel/process.c                     |   25 +-
>  arch/x86/kernel/process_64.c                  |  118 +
>  arch/x86/mm/fault.c                           |  271 ++
>  arch/x86/mm/mmap.c                            |   10 +
>  arch/x86/mm/tlb.c                             |  172 ++
>  arch/x86/rpal/Kconfig                         |   21 +
>  arch/x86/rpal/Makefile                        |    6 +
>  arch/x86/rpal/core.c                          |  477 ++++
>  arch/x86/rpal/internal.h                      |   69 +
>  arch/x86/rpal/mm.c                            |  426 +++
>  arch/x86/rpal/pku.c                           |  196 ++
>  arch/x86/rpal/proc.c                          |  279 ++
>  arch/x86/rpal/service.c                       |  776 ++++++
>  arch/x86/rpal/thread.c                        |  313 +++

The changes are very x86-heavy.  Is that a necessary thing?  Would
another architecture need to implement a similar amount to enable RPAL?
IOW, how much of the above could be made arch-neutral?