Message-ID: <6167c4ce-fef0-4af4-a6a1-9fe7b2eb023d@os.amperecomputing.com>
Date: Wed, 10 Jul 2024 11:43:18 -0700
From: Yang Shi <yang@...amperecomputing.com>
To: Catalin Marinas <catalin.marinas@....com>
Cc: "Christoph Lameter (Ampere)" <cl@...two.org>, will@...nel.org,
anshuman.khandual@....com, david@...hat.com, scott@...amperecomputing.com,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [v5 PATCH] arm64: mm: force write fault for atomic RMW
instructions
On 7/10/24 2:22 AM, Catalin Marinas wrote:
> On Tue, Jul 09, 2024 at 03:29:58PM -0700, Yang Shi wrote:
>> On 7/9/24 11:35 AM, Catalin Marinas wrote:
>>> On Tue, Jul 09, 2024 at 10:56:55AM -0700, Yang Shi wrote:
>>>> On 7/4/24 3:03 AM, Catalin Marinas wrote:
>>>> I tested exec-only on QEMU tcg, but I don't have hardware that
>>>> supports EPAN. I don't think a performance benchmark on QEMU tcg makes
>>>> sense since it is quite slow; such a small overhead is unlikely to be
>>>> measurable there.
>>> Yeah, benchmarking under qemu is pointless. I think you can remove some
>>> of the ARM64_HAS_EPAN checks (or replace them with ARM64_HAS_PAN) just
>>> for testing. For security reasons, we removed this behaviour in commit
>>> 24cecc377463 ("arm64: Revert support for execute-only user mappings"),
>>> but it's good enough for testing. This should give you PROT_EXEC-only
>>> mappings on your hardware.
>> Thanks for the suggestion. IIUC, I can still emulate exec-only even
>> though the hardware doesn't support EPAN? So that means reading an
>> exec-only area in the kernel can still trigger a fault, right?
> Yes, it's been supported since ARMv8.0. We limited it to EPAN only since
> setting a PROT_EXEC mapping still allowed the kernel to access the
> memory even if PSTATE.PAN was set.
>
>> And 24cecc377463 ("arm64: Revert support for execute-only user mappings")
>> can't be reverted cleanly by git revert, so I did it manually as below.
> Yeah, I wasn't expecting that to work.
>
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index 6a8b71917e3b..0bdedd415e56 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -573,8 +573,8 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
>> /* Write implies read */
>> vm_flags |= VM_WRITE;
>> /* If EPAN is absent then exec implies read */
>> - if (!alternative_has_cap_unlikely(ARM64_HAS_EPAN))
>> - vm_flags |= VM_EXEC;
>> + //if (!alternative_has_cap_unlikely(ARM64_HAS_EPAN))
>> + // vm_flags |= VM_EXEC;
>> }
>>
>> if (is_ttbr0_addr(addr) && is_el1_permission_fault(addr, esr, regs)) {
>> diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
>> index 642bdf908b22..d30265d424e4 100644
>> --- a/arch/arm64/mm/mmap.c
>> +++ b/arch/arm64/mm/mmap.c
>> @@ -19,7 +19,7 @@ static pgprot_t protection_map[16] __ro_after_init = {
>> [VM_WRITE] = PAGE_READONLY,
>> [VM_WRITE | VM_READ] = PAGE_READONLY,
>> /* PAGE_EXECONLY if Enhanced PAN */
>> - [VM_EXEC] = PAGE_READONLY_EXEC,
>> + [VM_EXEC] = PAGE_EXECONLY,
>> [VM_EXEC | VM_READ] = PAGE_READONLY_EXEC,
>> [VM_EXEC | VM_WRITE] = PAGE_READONLY_EXEC,
>> [VM_EXEC | VM_WRITE | VM_READ] = PAGE_READONLY_EXEC,
> In theory you'd need to change the VM_SHARED | VM_EXEC entry as well.
> Otherwise it looks fine.
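Right. For completeness, the shared entry would presumably need the same
treatment, mirroring the hunk above (untested sketch):

```diff
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@
-	[VM_SHARED | VM_EXEC]			= PAGE_READONLY_EXEC,
+	[VM_SHARED | VM_EXEC]			= PAGE_EXECONLY,
```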
Thanks. I just ran the same benchmark: the modified page_fault1_thread
(triggering read faults) for 100 iterations with 160 threads on 160
cores. This should be the worst contention case, and I collected the max
data (worst latency). It shows the patch may incur ~30% overhead for the
exec-only case. The overhead should come purely from the permission
fault.
      N           Min           Max        Median           Avg        Stddev
x   100        163840        219083        184471        183262     12593.229
+   100        211198        285947        233608     238819.98     15253.967
Difference at 95.0% confidence
	55558 +/- 3877
	30.3161% +/- 2.11555%
This is a very extreme benchmark; I don't think any real-life workload
will spend that much time (sys vs user) in page faults, particularly
read faults.
With my atomic fault benchmark (populate 1G of memory with atomic
instructions, then manipulate the values stored in that memory over 100
iterations, so the user time is much longer than the sys time), I saw
around 13% overhead on sys time due to the permission fault, but no
noticeable change in user or real time.
So the permission fault does incur noticeable overhead for read faults
on exec-only mappings, but it may not be that bad for real-life
workloads.
>