Message-ID: <081775ba-276f-4bbd-a18a-175cf1f217e9@redhat.com>
Date: Fri, 30 May 2025 11:56:51 +0200
From: David Hildenbrand <david@...hat.com>
To: Bo Li <libo.gcs85@...edance.com>, tglx@...utronix.de, mingo@...hat.com,
 bp@...en8.de, dave.hansen@...ux.intel.com, x86@...nel.org, luto@...nel.org,
 kees@...nel.org, akpm@...ux-foundation.org, juri.lelli@...hat.com,
 vincent.guittot@...aro.org, peterz@...radead.org
Cc: dietmar.eggemann@....com, hpa@...or.com, acme@...nel.org,
 namhyung@...nel.org, mark.rutland@....com,
 alexander.shishkin@...ux.intel.com, jolsa@...nel.org, irogers@...gle.com,
 adrian.hunter@...el.com, kan.liang@...ux.intel.com, viro@...iv.linux.org.uk,
 brauner@...nel.org, jack@...e.cz, lorenzo.stoakes@...cle.com,
 Liam.Howlett@...cle.com, vbabka@...e.cz, rppt@...nel.org, surenb@...gle.com,
 mhocko@...e.com, rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
 vschneid@...hat.com, jannh@...gle.com, pfalcato@...e.de, riel@...riel.com,
 harry.yoo@...cle.com, linux-kernel@...r.kernel.org,
 linux-perf-users@...r.kernel.org, linux-fsdevel@...r.kernel.org,
 linux-mm@...ck.org, duanxiongchun@...edance.com, yinhongbo@...edance.com,
 dengliang.1214@...edance.com, xieyongji@...edance.com,
 chaiwen.cc@...edance.com, songmuchun@...edance.com, yuanzhu@...edance.com,
 chengguozhu@...edance.com, sunjiadong.lff@...edance.com
Subject: Re: [RFC v2 00/35] optimize cost of inter-process communication


> 
> ## Address space sharing
> 
> For address space sharing, RPAL partitions the entire userspace virtual
> address space and allocates non-overlapping memory ranges to each process.
> On x86_64, RPAL uses the memory range covered by one full PUD (Page
> Upper Directory) table, i.e. a single top-level (PGD) entry, which
> spans 512GB under 4-level paging. This restricts each process's
> virtual address space to 512GB on x86_64, which is sufficient for most
> applications in our scenario. The rationale is straightforward:
> address space sharing can be achieved simply by copying the PGD entry
> that points to the shared PUD table from one process's page table into
> another's. One process can then dereference an ordinary data pointer
> to access the other's memory.
> 
> 
>   |------------| <- 0
>   |------------| <- 512 GB
>   |  Process A |
>   |------------| <- 2*512 GB
>   |------------| <- n*512 GB
>   |  Process B |
>   |------------| <- (n+1)*512 GB
>   |------------| <- STACK_TOP
>   |  Kernel    |
>   |------------|

Oh my.

It reminds me a bit of mshare -- just that mshare tries to do it in a 
less hacky way...

> 
> ## RPAL call
> 
> We refer to the lightweight userspace context switching mechanism as RPAL
> call. It enables the caller (or sender) thread of one process to directly
> switch to the callee (or receiver) thread of another process.
> 
> When Process A’s caller thread initiates an RPAL call to Process B’s
> callee thread, the CPU saves the caller’s context and loads the callee’s
> context. This enables direct userspace control flow transfer from the
> caller to the callee. After the callee finishes data processing, the CPU
> saves Process B’s callee context and switches back to Process A’s caller
> context, completing a full IPC cycle.
> 
> 
>   |------------|                |---------------------|
>   |  Process A |                |  Process B          |
>   | |-------|  |                | |-------|           |
>   | | caller| --- RPAL call --> | | callee|    handle |
>   | | thread| <------------------ | thread| -> event  |
>   | |-------|  |                | |-------|           |
>   |------------|                |---------------------|
> 
> # Security and compatibility with kernel subsystems
> 
> ## Memory protection between processes
> 
> Since processes using RPAL share the address space, unintended
> cross-process memory access may occur and corrupt the data of another
> process. To mitigate this, we leverage Memory Protection Keys (MPK) on x86
> architectures.
> 
> MPK assigns 4 bits in each page table entry to a "protection key", which
> is paired with a userspace register (PKRU). The PKRU register defines
> access permissions for memory regions protected by specific keys (for
> detailed implementation, refer to the kernel documentation "Memory
> Protection Keys"). With MPK, even though the address space is shared
> among processes, cross-process access is restricted: a process can only
> access the memory protected by a key if its PKRU register is configured
> with the corresponding permission. This ensures that processes cannot
> access each other’s memory unless an explicit PKRU configuration is set.
> 
> ## Page fault handling and TLB flushing
> 
> Due to the shared address space architecture, both page fault handling and
> TLB flushing require careful consideration. For instance, when Process A
> accesses Process B’s memory, a page fault may occur in Process A's
> context, but the faulting address belongs to Process B. In this case, we
> must pass Process B's mm_struct to the page fault handler.

In an mshare region, all faults would be rerouted to the mshare MM 
either way.

> 
> TLB flushing is more complex. Because the address space is shared, the
> memory being flushed may have been accessed not only by other threads
> of the current process but also by any other process sharing the
> address space. Therefore, the set of CPUs targeted by a TLB flush must
> be the union of the mm_cpumasks of all processes that share the
> address space.

Oh my.

It all reminds me of mshare, just the context switch handling is 
different (and significantly ... more problematic).

Maybe something could be built on top of mshare, but I'm afraid the real 
magic is the address space sharing combined with the context switching 
... which sounds like a big can of worms.

So in the current form, I understand all the NACKs.

-- 
Cheers,

David / dhildenb

