Message-ID: <ceb85a8b-d6e8-830f-eddb-69ae1531e10e@redhat.com>
Date:   Tue, 22 Nov 2022 15:05:24 +0100
From:   David Hildenbrand <david@...hat.com>
To:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>
Cc:     "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>
Subject: 32bit architectures and __HAVE_ARCH_PTE_SWP_EXCLUSIVE

Hi all,

Spoiler: is there a real use case for > 16 GiB of swap in a single file 
on 32bit architectures?

I'm currently looking into implementing __HAVE_ARCH_PTE_SWP_EXCLUSIVE 
support for all remaining architectures. So far, I only implemented it 
for the most relevant enterprise architectures.

With __HAVE_ARCH_PTE_SWP_EXCLUSIVE, when we unmap a page for swapout and 
replace the present PTE by a swap PTE, we remember whether the anonymous 
page that was mapped was exclusive (PageAnonExclusive(), i.e., not 
COW-shared). When refaulting that page, whereby we replace the swap PTE 
by a present PTE, we can reuse that information to map that page 
writable and avoid unnecessary page copies due to COW, even if there are 
still unexpected references on the page.
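
To make the idea concrete, here is a toy model of a 32bit swap PTE that 
carries such a marker. Everything in it is an assumption made for 
illustration: the bit positions, the 5-bit swap type, and all function 
names are hypothetical and do not match any real architecture or kernel 
helper. The refault policy is a deliberately rough simplification of the 
behavior described above.

```python
# Toy 32-bit swap PTE. The layout below is hypothetical, not a real one.
PTE_BITS = 32
PRESENT_BIT = 1 << 0      # always clear in a swap PTE
EXCLUSIVE_BIT = 1 << 1    # assumed position of the exclusive marker
TYPE_SHIFT = 2
TYPE_BITS = 5             # assumed width of the swap type (which swap file)
OFFSET_SHIFT = TYPE_SHIFT + TYPE_BITS
OFFSET_BITS = PTE_BITS - OFFSET_SHIFT  # bits left for the swap offset


def make_swap_pte(swp_type, offset, exclusive):
    """Encode swap type, swap offset and the exclusive marker."""
    if not 0 <= swp_type < (1 << TYPE_BITS):
        raise ValueError("swap type does not fit")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("swap offset does not fit")
    pte = (swp_type << TYPE_SHIFT) | (offset << OFFSET_SHIFT)
    if exclusive:
        pte |= EXCLUSIVE_BIT
    return pte


def swap_pte_type(pte):
    return (pte >> TYPE_SHIFT) & ((1 << TYPE_BITS) - 1)


def swap_pte_offset(pte):
    return pte >> OFFSET_SHIFT


def swap_pte_exclusive(pte):
    return bool(pte & EXCLUSIVE_BIT)


def can_map_writable_on_refault(pte, extra_references):
    """Rough model: with the marker set, the page can be mapped writable
    even with unexpected references; without it, extra references force
    a COW copy."""
    if swap_pte_exclusive(pte):
        return True
    return extra_references == 0
```

In this model the marker costs exactly one bit that could otherwise hold 
part of the swap offset, which is the trade-off on 32bit.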

While this would usually be a pure optimization, O_DIRECT currently 
still (wrongly) uses FOLL_GET instead of FOLL_PIN and can trigger memory 
corruption in corner cases. So for that case, it is also a temporary fix 
until O_DIRECT properly uses FOLL_PIN. More details can be found in [1].

Ideally, I'd just implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE for all 
architectures. However, __HAVE_ARCH_PTE_SWP_EXCLUSIVE requires an 
additional bit in the swap PTE. While mostly unproblematic on 64bit, for 
32bit this implies that we'll have to "steal" one bit from the swap 
offset on most architectures, reducing the maximum swap size per file.

Assuming we previously supported 32 GiB per swap file (e.g., hexagon, 
csky), this number would get reduced to 16 GiB. The kernel would 
automatically truncate the oversized swap area and the system would 
continue working by using less space of that swapfile, but ... well, is 
there a but?
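
The 32 GiB -> 16 GiB arithmetic can be sketched as follows. This assumes 
a 4 KiB page size, under which 32 GiB per swap file corresponds to 23 
swap offset bits; the actual page size and bit counts depend on the 
architecture and configuration, and the function names are made up for 
this example.

```python
PAGE_SIZE = 4096   # assumption: 4 KiB pages
GIB = 1 << 30


def max_swap_bytes(offset_bits, page_size=PAGE_SIZE):
    """Largest swap file addressable with the given number of offset bits."""
    return (1 << offset_bits) * page_size


def usable_swap_bytes(swapfile_bytes, offset_bits, page_size=PAGE_SIZE):
    """An oversized swap area is truncated to what the offset can address."""
    return min(swapfile_bytes, max_swap_bytes(offset_bits, page_size))


before = max_swap_bytes(23)   # 2^23 pages * 4 KiB = 32 GiB
after = max_swap_bytes(22)    # one offset bit stolen: 16 GiB
print(before // GIB, after // GIB)
print(usable_swap_bytes(32 * GIB, 22) // GIB)
```

Each stolen offset bit halves the per-file limit; an existing 32 GiB 
swap file would keep working, with only its first 16 GiB in use.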

Usually (well, there is PAE on x86 ...), a 32bit system can address 4 
GiB of memory. Maximum swap size recommendations seem to be around 2-3 
times the memory size (2x without hibernation, 3x with hibernation). So 
it sounds like there is barely a use case for more swap space. Of course, 
one can use multiple swap files.
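
As a quick sanity check of that reasoning, the snippet below compares 
the rule-of-thumb swap sizes against a 16 GiB per-file limit. The 4 GiB 
memory size and the 2x/3x factors come from the paragraph above; the 
16 GiB limit is the reduced value from the hexagon/csky example, and the 
function names are hypothetical.

```python
GIB = 1 << 30
RAM_BYTES = 4 * GIB            # addressable memory without PAE
PER_FILE_LIMIT = 16 * GIB      # reduced per-file limit from the example


def recommended_swap_bytes(ram_bytes, hibernation):
    """Rule of thumb: 2x RAM without hibernation, 3x with hibernation."""
    return ram_bytes * (3 if hibernation else 2)


def swap_files_needed(total_bytes, per_file_limit=PER_FILE_LIMIT):
    """Number of swap files needed to provide total_bytes of swap."""
    return -(-total_bytes // per_file_limit)  # ceiling division


for hibernation in (False, True):
    need = recommended_swap_bytes(RAM_BYTES, hibernation)
    print(need // GIB, "GiB ->", swap_files_needed(need), "swap file(s)")
```

Under these assumptions both recommendations (8 GiB and 12 GiB) fit into 
a single 16 GiB swap file.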

So, is anybody aware of excessive swap space requirements on 32bit?

Note that I thought about storing the exclusive marker in the swap_map 
instead of in the swap PTE, but quickly decided to discard that idea 
because it results in significantly more complexity and the swap code is 
already horrible enough.

[1] https://lkml.kernel.org/r/20220329164329.208407-1-david@redhat.com

-- 
Thanks,

David / dhildenb