Message-ID:
<SEZPR03MB67865557061A2494B1A1243CB44F2@SEZPR03MB6786.apcprd03.prod.outlook.com>
Date: Tue, 13 Feb 2024 19:15:07 +0000
From: Maxwell Bland <mbland@...orola.com>
To: Mark Rutland <mark.rutland@....com>
CC: "linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"catalin.marinas@....com" <catalin.marinas@....com>,
"will@...nel.org"
<will@...nel.org>,
"dennis@...nel.org" <dennis@...nel.org>,
"tj@...nel.org"
<tj@...nel.org>, "cl@...ux.com" <cl@...ux.com>,
"akpm@...ux-foundation.org"
<akpm@...ux-foundation.org>,
"shikemeng@...weicloud.com"
<shikemeng@...weicloud.com>,
"david@...hat.com" <david@...hat.com>,
"rppt@...nel.org" <rppt@...nel.org>,
"anshuman.khandual@....com"
<anshuman.khandual@....com>,
"willy@...radead.org" <willy@...radead.org>,
"ryan.roberts@....com" <ryan.roberts@....com>,
"rick.p.edgecombe@...el.com"
<rick.p.edgecombe@...el.com>,
"pcc@...gle.com" <pcc@...gle.com>,
"rmk+kernel@...linux.org.uk" <rmk+kernel@...linux.org.uk>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"gshan@...hat.com"
<gshan@...hat.com>,
"gregkh@...uxfoundation.org"
<gregkh@...uxfoundation.org>,
"Jonathan.Cameron@...wei.com"
<Jonathan.Cameron@...wei.com>,
"james.morse@....com" <james.morse@....com>,
Andrew Wheeler <awheeler@...orola.com>
Subject: Re: [PATCH] arm64: allow post-init vmalloc PXNTable
> From: Mark Rutland <mark.rutland@....com>
> On Tue, Feb 13, 2024 at 10:05:45AM -0600, Maxwell Bland wrote:
> > VMALLOC_START ffff800080000000
> > VMALLOC_END   fffffbfff0000000
> > _text         ffffb6c0c1400000
> > _end          ffffb6c0c3e40000
> >
> > Setting VMALLOC_END to _text in init would resolve this issue with the
> > caveat of a sizeable reduction in the size of available vmalloc memory due
> > to requirements on aslr randomness. However, there are circumstances where
> > this trade-off is necessary: in particular, hypervisor-level security
> > monitors where 1) the microarchitecture contains race conditions on PTE
> > level updates or 2) a per-PTE update verifier comes at a significant hit to
> > performance.
>
> Which "hypervisor-level security monitors" are you referring to?

Right now there are around 4 or 5 different attempts (from what I know: Moto,
Samsung, MediaTek, and Qualcomm) at making page tables immutable and reducing
the kernel threat surface on ARM to just dynamically allocated structs, e.g.
file_operations. These revive some of the ideas of:

https://wenboshen.org/publications/papers/tz-rkp-ccs14.pdf

which can no longer be enforced for a number of reasons. As it relates to this
patch in particular: the performance hit of per-PTE update verification is
huge.

My goal is ultimately to prevent modern exploits like:

https://github.com/chompie1337/s8_2019_2215_poc

which modify dynamically allocated pointers. But trying to protect against
these exploits is disingenuous without first being able to enforce PXN on
non-code pages. There is a reason we do this in mm initialization, and we need
to enforce, or support the enforcement of, PXNTable dynamically too.

> We don't support any of those upstream AFAIK.

As is hopefully apparent from the above, although this patch will help
downstream systems, I do not see it as a support issue so much as a legitimate
security feature. There is still the matter of deciding which subsystem should
be responsible. The generic vmalloc interface should provide a strong
distinction between code and data allocations, but enforcing this would become
the responsibility of each microarchitecture regardless.
>
> How much VA space are you potentially throwing away?
>
This is rough, I admit. )-: On the order of 70,000 GB for the layout quoted
above (everything between _text and VMALLOC_END), likely more in practice,
since it restricts vmalloc to the region before _text. You may be thinking,
"that is ridiculous, c'mon Maxwell", and you would be right, but I was OK with
this trade-off for Moto systems, and I thought the approach kept the patch
changes small and simple.
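
For reference, the figure above can be reproduced from the addresses quoted
earlier in this message. This is only a rough sketch: it assumes the layout is
exactly as printed (the position of _text will differ from boot to boot under
KASLR), and the function and variable names are illustrative, not kernel
symbols.

```python
# Addresses as quoted in this thread; _text moves per boot under KASLR,
# so these are one sample layout, not fixed constants.
VMALLOC_START = 0xFFFF800080000000
VMALLOC_END = 0xFFFFFBFFF0000000
TEXT = 0xFFFFB6C0C1400000

GIB = 1 << 30


def vmalloc_split(vmalloc_start, vmalloc_end, text):
    """Return (kept, lost) in bytes if VMALLOC_END is clamped to _text."""
    if not vmalloc_start <= text <= vmalloc_end:
        raise ValueError("_text must lie inside the vmalloc range")
    kept = text - vmalloc_start   # still usable: [VMALLOC_START, _text)
    lost = vmalloc_end - text     # given up:     [_text, VMALLOC_END)
    return kept, lost


kept, lost = vmalloc_split(VMALLOC_START, VMALLOC_END, TEXT)
total = VMALLOC_END - VMALLOC_START

print(f"total vmalloc VA: {total // GIB} GiB")
print(f"kept below _text: {kept // GIB} GiB")
print(f"lost above _text: {lost // GIB} GiB ({100 * lost / total:.1f}% of range)")
```

The lower _text lands within the range, the more VA space is given up, which is
why the loss is likely larger in practice than for this one sample.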

I had a hard time thinking of a better way to do this while avoiding
duplication of vmalloc code into arm64 land. Would it be OK, though, to add an
additional field to the generic vmalloc interface? I may need to reach out for
help here: the solution may come more readily to those with more experience.

> How does this work with other allocations of executable memory? e.g. modules,
> BPF?

It should work.

- arch/arm64/kernel/module.c uses __vmalloc_node_range with module_alloc_base
  and module_alloc_end, bypassing the generic vmalloc_node region, and these
  variables are decided based on a random offset between _text and _end.
- kernel/bpf/core.c uses bpf_jit_alloc_exec to create executable code regions,
  which is a wrapper for module_alloc. In the interpreted BPF case, we do not
  need to worry, since the pages storing interpreted code are NX and can be
  marked PXNTable regardless.

> I'm not keen on this as-is.

That's OK, so long as we agree enforcing PXNTable dynamically would be a good
thing. I look forward to your thoughts on the above, and I will go back and
iterate.

Working with IT to fix the email formatting now, so I will hopefully be able to
post a fetchable and runnable version of my initial patch shortly.