Message-ID: <6e5d24de6a6661f83442741f6be8daf691a05a20.camel@intel.com>
Date: Thu, 18 Sep 2025 17:31:26 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "yang@...amperecomputing.com" <yang@...amperecomputing.com>,
"kevin.brodsky@....com" <kevin.brodsky@....com>,
"linux-hardening@...r.kernel.org" <linux-hardening@...r.kernel.org>
CC: "maz@...nel.org" <maz@...nel.org>, "luto@...nel.org" <luto@...nel.org>,
"willy@...radead.org" <willy@...radead.org>, "mbland@...orola.com"
<mbland@...orola.com>, "david@...hat.com" <david@...hat.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"rppt@...nel.org" <rppt@...nel.org>, "joey.gouly@....com"
<joey.gouly@....com>, "akpm@...ux-foundation.org"
<akpm@...ux-foundation.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "catalin.marinas@....com"
<catalin.marinas@....com>, "Weiny, Ira" <ira.weiny@...el.com>,
"vbabka@...e.cz" <vbabka@...e.cz>, "pierre.langlois@....com"
<pierre.langlois@....com>, "jeffxu@...omium.org" <jeffxu@...omium.org>,
"linus.walleij@...aro.org" <linus.walleij@...aro.org>,
"lorenzo.stoakes@...cle.com" <lorenzo.stoakes@...cle.com>, "kees@...nel.org"
<kees@...nel.org>, "ryan.roberts@....com" <ryan.roberts@....com>,
"tglx@...utronix.de" <tglx@...utronix.de>, "jannh@...gle.com"
<jannh@...gle.com>, "peterz@...radead.org" <peterz@...radead.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>, "will@...nel.org" <will@...nel.org>,
"qperret@...gle.com" <qperret@...gle.com>, "linux-mm@...ck.org"
<linux-mm@...ck.org>, "broonie@...nel.org" <broonie@...nel.org>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH v5 00/18] pkeys-based page table hardening
On Thu, 2025-09-18 at 16:15 +0200, Kevin Brodsky wrote:
> This is where I have to apologise to Rick for not having studied his
> series more thoroughly, as patch 17 [2] covers this issue very well in
> the commit message.
>
> It seems fair to say there is no ideal or simple solution, though.
> Rick's patch reserves enough (PTE-mapped) memory for fully splitting the
> linear map, which is relatively simple but not very pleasant. Chatting
> with Ryan Roberts, we figured another approach, improving on solution 1
> mentioned in [2]. It would rely on allocating all PTPs from a special
> pool (without using set_memory_pkey() in pagetable_*_ctor), along those
> lines:
Oh, I didn't realize ARM now splits the direct map at runtime. IIRC it used to
just map at 4k if there were any permissions configured.
>
> 1. 2 pages are reserved at all times (with the appropriate pkey)
> 2. Try to allocate a 2M block. If needed, use a reserved page as PMD to
> split a PUD. If successful, set its pkey - the entire block can now be
> used for PTPs. Replenish the reserve from the block if needed.
> 3. If no block is available, make an order-2 allocation (4 pages). If
> needed, use 1-2 reserved pages to split PUD/PMD. Set the pkey of the 4
> pages, take 1-2 pages to replenish the reserve if needed.
Oh, good idea!
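To make the three steps concrete, here is a rough userspace model of the
reserve accounting. It is only a sketch under stated assumptions: every name in
it (ptp_pool, sim_state, pool_refill, ...) is hypothetical and not taken from
the series, the allocator and the linear map are reduced to flags and counters,
and nothing here touches real page tables or pkeys.

```c
#include <assert.h>
#include <stdbool.h>

#define RESERVE_TARGET   2    /* step 1: pages always held back for splitting */
#define PAGES_PER_BLOCK  512  /* one 2M block of 4K pages */
#define PAGES_PER_ORDER2 4    /* one order-2 allocation */

/* Hypothetical pool of pages that already carry the page-table pkey. */
struct ptp_pool {
	int reserve; /* kept aside to split PUD/PMD entries of the linear map */
	int free;    /* available for use as page table pages (PTPs) */
};

/* Hypothetical stand-in for the allocator and linear-map state. */
struct sim_state {
	bool block_available;   /* can a 2M block be allocated? */
	bool order2_available;  /* can an order-2 allocation succeed? */
	int  split_cost_block;  /* reserved pages needed to split a PUD: 0 or 1 */
	int  split_cost_order2; /* reserved pages needed to split PUD/PMD: 0..2 */
};

static void pool_init(struct ptp_pool *p)
{
	p->reserve = RESERVE_TARGET;
	p->free = 0;
}

/* Newly protected pages top up the reserve first; the rest become PTPs. */
static void pool_add(struct ptp_pool *p, int pages)
{
	int refill = RESERVE_TARGET - p->reserve;

	if (refill > pages)
		refill = pages;
	p->reserve += refill;
	p->free += pages - refill;
}

/* Steps 2 and 3: prefer a 2M block, fall back to an order-2 allocation. */
static bool pool_refill(struct ptp_pool *p, const struct sim_state *s)
{
	int cost, pages;

	if (s->block_available) {
		cost = s->split_cost_block;
		pages = PAGES_PER_BLOCK;
	} else if (s->order2_available) {
		cost = s->split_cost_order2;
		pages = PAGES_PER_ORDER2;
	} else {
		return false; /* nothing to allocate; pool is left untouched */
	}

	/* The reserve must always cover the split; this is the key invariant. */
	assert(cost >= 0 && cost <= p->reserve);
	p->reserve -= cost;
	pool_add(p, pages);
	return true;
}

/* Hand out one PTP, refilling the pool when it runs dry. */
static bool pool_alloc_ptp(struct ptp_pool *p, const struct sim_state *s)
{
	if (p->free == 0 && !pool_refill(p, s))
		return false;
	p->free--;
	return true;
}
```

In this model the worst case for the fallback path is an order-2 allocation
that needs both reserved pages for splitting: two of the four new pages go
straight back into the reserve and only two are left for PTPs. The model
ignores concurrency entirely, which is the open question raised below.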
>
> This ensures that we never run out of PTPs for splitting. We may get
> into an OOM situation more easily due to the order-2 requirement, but
> the risk remains low compared to requiring a 2M block. A bigger concern
> is concurrency - do we need a per-CPU cache? Reserving a 2M block per
> CPU could be very much overkill.
>
> No matter which solution is used, this clearly increases the complexity
> of kpkeys_hardened_pgtables. Mike Rapoport has posted a number of RFCs
> [3][4] that aim at addressing this problem more generally, but no
> consensus seems to have emerged and I'm not sure they would completely
> solve this specific problem either.
>
> For now, my plan is to stick to solution 3 from [2], i.e. force the
> linear map to be PTE-mapped. This is easily done on arm64 as it is the
> default, and is required for rodata=full, unless [1] is applied and the
> system supports BBML2_NOABORT. See [1] for the potential performance
> improvements we'd be missing out on (~5% ballpark).
>
I continue to be surprised that allocation-time pkey conversion is not a
performance disaster, even with the direct map pre-split.
> I'm not quite sure
> what the picture looks like on x86 - it may well be more significant as
> Rick suggested.
I think having more efficient direct map permissions is a solvable problem, but
each usage is just a little too small to justify the infrastructure for a good
solution. And each simple solution is a little too much overhead to justify the
usage. So there is a long tail of blocked usages:
- pkeys usages (page tables and secret protection)
- kernel shadow stacks
- More efficient executable code allocations (BPF, kprobe trampolines, etc)
Although the BPF folks started doing their own thing for this. But I don't think
there are any fundamentally unsolvable problems for a generic solution. It's a
question of a leading killer usage to justify the infrastructure. Maybe it will
be kernel shadow stack.