Message-ID: <44082771-a35b-4e8d-b08a-bd8cd340c9f2@redhat.com>
Date: Thu, 2 Oct 2025 09:45:37 +0200
From: David Hildenbrand <david@...hat.com>
To: Brendan Jackman <jackmanb@...gle.com>, peterz@...radead.org,
 bp@...en8.de, dave.hansen@...ux.intel.com, mingo@...hat.com,
 tglx@...utronix.de
Cc: akpm@...ux-foundation.org, derkling@...gle.com, junaids@...gle.com,
 linux-kernel@...r.kernel.org, linux-mm@...ck.org, reijiw@...gle.com,
 rientjes@...gle.com, rppt@...nel.org, vbabka@...e.cz, x86@...nel.org,
 yosry.ahmed@...ux.dev, Patrick Roy <roypat@...zon.co.uk>,
 Zi Yan <ziy@...dia.com>
Subject: Re: [Discuss] First steps for ASI (ASI is fast again)

> I won't re-hash the details of the problem here (see [1]) but in short: file
> pages aren't mapped into the physmap as seen from ASI's restricted address space.
> This causes a major overhead when e.g. read()ing files. The solution we've
> always envisaged (and which I very hastily tried to describe at LSF/MM/BPF this
> year) was to simply stop read() etc from touching the physmap.
> 
> This is achieved in this prototype by a mechanism that I've called the "ephmap".
> The ephmap is a special region of the kernel address space that is local to the
> mm (much like the "proclocal" idea from 2019 [2]). Users of the ephmap API can
> allocate a subregion of this, and provide pages that get mapped into their
> subregion. These subregions are CPU-local. This means that it's cheap to tear
> these mappings down, so they can be removed immediately after use (eph =
> "ephemeral"), eliminating the need for complex/costly tracking data structures.
> 
> (You might notice the ephmap is extremely similar to kmap_local_page() - see the
> commit that introduces it ("x86: mm: Introduce the ephmap") for discussion).
> 
> The ephmap can then be used for accessing file pages. It's also a generic
> mechanism for accessing sensitive data, for example it could be used for
> zeroing sensitive pages, or if necessary for copy-on-write of user pages.
> 

At some point we discussed how to make secretmem pages movable so we
end up having fewer unmovable pages in the system.

Secretmem pages have their directmap entries removed once allocated, and
restored once freed (truncated from the page cache).

In order to migrate them we would have to temporarily map them, and we 
obviously don't want to temporarily map them into the directmap.

Maybe the ephmap could be used for that use case, too.

Another, similar use case would be guest_memfd, with an approach similar
to the one secretmem took: removing the direct map. While guest_memfd does
not support page migration yet, there are some prototypes that allow
migrating pages for non-CoCo (IOW: ordinary) VMs.

Maybe the ephmap could be used here, too.


I guess an interesting question would be: which MM to use when we are
migrating a page from some random context: memory offlining, page
compaction, memory-failure, alloc_contig_pages, ...

[...]

> 
> Despite my title these numbers are kinda disappointing to be honest, it's not
> where I wanted to be by now,

"ASI is faster again" :)

> but it's still an order-of-magnitude better than
> where we were for native FIO a few months ago. I believe almost all of this
> remaining slowdown is due to unnecessary ASI exits, the key areas being:
> 
> - On every context_switch(). Google's internal implementation has fixed this (we
>    only really need it when switching mms).
> 
> - Whenever zeroing sensitive pages from the allocator. This could potentially be
>    solved with the ephmap but requires a bit of care to avoid opening CPU attack
>    windows.
> 
> - In copy-on-write for user pages. The ephmap could also help here but the
>    current implementation doesn't support it (it only allows one allocation at a
>    time per context).
> 

But only the first point would actually be relevant for the FIO 
benchmark I assume, right?

So how confident are you that this is really going to be solvable? Or, to
ask from another angle: long-term, how much slowdown do you expect and
target?

-- 
Cheers

David / dhildenb

