Message-ID: <e2e29986-59fe-447a-81a1-fd06308fe5dd@arm.com>
Date: Wed, 21 Aug 2024 17:02:40 +0530
From: Dev Jain <dev.jain@....com>
To: Catalin Marinas <catalin.marinas@....com>,
Yang Shi <yang@...amperecomputing.com>
Cc: "Christoph Lameter (Ampere)" <cl@...two.org>, will@...nel.org,
anshuman.khandual@....com, david@...hat.com, scott@...amperecomputing.com,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [v5 PATCH] arm64: mm: force write fault for atomic RMW
instructions
On 8/21/24 15:48, Catalin Marinas wrote:
> On Thu, Jul 11, 2024 at 11:17:51AM -0700, Yang Shi wrote:
>> On 7/11/24 10:43 AM, Catalin Marinas wrote:
>>> On Wed, Jul 10, 2024 at 11:43:18AM -0700, Yang Shi wrote:
>>>> On 7/10/24 2:22 AM, Catalin Marinas wrote:
>>>>>> diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
>>>>>> index 642bdf908b22..d30265d424e4 100644
>>>>>> --- a/arch/arm64/mm/mmap.c
>>>>>> +++ b/arch/arm64/mm/mmap.c
>>>>>> @@ -19,7 +19,7 @@ static pgprot_t protection_map[16] __ro_after_init = {
>>>>>>         [VM_WRITE]                                      = PAGE_READONLY,
>>>>>>         [VM_WRITE | VM_READ]                            = PAGE_READONLY,
>>>>>>         /* PAGE_EXECONLY if Enhanced PAN */
>>>>>> -       [VM_EXEC]                                       = PAGE_READONLY_EXEC,
>>>>>> +       [VM_EXEC]                                       = PAGE_EXECONLY,
>>>>>>         [VM_EXEC | VM_READ]                             = PAGE_READONLY_EXEC,
>>>>>>         [VM_EXEC | VM_WRITE]                            = PAGE_READONLY_EXEC,
>>>>>>         [VM_EXEC | VM_WRITE | VM_READ]                  = PAGE_READONLY_EXEC,
>>>>> In theory you'd need to change the VM_SHARED | VM_EXEC entry as well.
>>>>> Otherwise it looks fine.
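(For reference, the analogous change Catalin refers to would look roughly like
the sketch below. The context lines are taken from the mainline protection_map
in arch/arm64/mm/mmap.c and are only illustrative; this hunk is not part of the
posted patch.)

        /* PAGE_EXECONLY if Enhanced PAN */
-       [VM_SHARED | VM_EXEC]                           = PAGE_READONLY_EXEC,
+       [VM_SHARED | VM_EXEC]                           = PAGE_EXECONLY,
        [VM_SHARED | VM_EXEC | VM_READ]                 = PAGE_READONLY_EXEC,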
>>>> Thanks. I just ran the same benchmark: the modified page_fault1_thread
>>>> (triggering read faults) for 100 iterations with 160 threads on 160 cores.
>>>> This should be the worst contention case, and I collected the max data
>>>> (worst latency). It shows the patch may incur ~30% overhead for the
>>>> exec-only case. The overhead should come purely from the permission fault.
>>>>
>>>>     N        Min        Max     Median        Avg     Stddev
>>>> x 100     163840     219083     184471     183262  12593.229
>>>> + 100     211198     285947     233608  238819.98  15253.967
>>>> Difference at 95.0% confidence
>>>>     55558 +/- 3877
>>>>     30.3161% +/- 2.11555%
>>>>
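(The ~30% figure follows from the averages in the table above: 238819.98 -
183262 is roughly 55558, and 55558 / 183262 is roughly 30.3% of the baseline.)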
>>>> This is a very extreme benchmark; I don't think any real-life workload will
>>>> spend that much time (sys vs user) in page faults, particularly read faults.
>>>>
>>>> With my atomic fault benchmark (populate 1G of memory with an atomic
>>>> instruction, then manipulate the values stored in the memory over 100
>>>> iterations so the user time is much longer than the sys time), I saw around
>>>> 13% overhead on sys time due to the permission fault, but no noticeable
>>>> change in user and real time.
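(A minimal sketch of that kind of atomic-fault microbenchmark is given below,
assuming a 4K base page size; the actual benchmark Yang used is not shown in
this thread, and the sizes, stride and update loop here are illustrative
assumptions only.)

#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define MEM_SIZE   (1UL << 30)   /* 1G of anonymous memory */
#define PAGE_SIZE  4096UL        /* assumed base page size */
#define ITERATIONS 100

int main(void)
{
        uint64_t *mem;
        size_t nwords = MEM_SIZE / sizeof(*mem);
        size_t i;
        int it;

        mem = mmap(NULL, MEM_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        /*
         * Populate: the first touch of every page is an atomic RMW, the
         * kind of access the patch under discussion wants to treat as a
         * write fault instead of a read fault followed by a write fault.
         */
        for (i = 0; i < nwords; i += PAGE_SIZE / sizeof(*mem))
                __atomic_fetch_add(&mem[i], 1, __ATOMIC_RELAXED);

        /*
         * Keep manipulating the stored values so that user time dominates
         * sys time, as described above.
         */
        for (it = 0; it < ITERATIONS; it++)
                for (i = 0; i < nwords; i++)
                        mem[i] = mem[i] * 3 + 1;

        munmap(mem, MEM_SIZE);
        return 0;
}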
>>> Thanks for running these tests.
>>>
>>>> So the permission fault does incur noticeable overhead for read faults on
>>>> exec-only, but it may not be that bad for real-life workloads.
>>> So you are saying three faults are not that bad for real-life workloads but
>>> the two-fault behaviour we currently have is problematic for OpenJDK. For
>>> the OpenJDK case, I don't think the faulting is the main issue affecting
>>> run-time performance but, over a longer run (or with more iterations in
>>> this benchmark after the faulting in), you'd notice the breaking up of
>>> the initial THP pages.
>> I meant the extra permission fault for exec-only should be OK since the
>> current implementation can't force a write fault for exec-only anyway. It does
>> incur noticeable overhead for read faults, but I'm not aware of any real-life
>> workloads that are sensitive to read faults. The benchmark is a very extreme
>> worst case.
> I agree microbenchmarks like this are not realistic but I wouldn't be
> surprised if we hear of some large-file mmap() reads or very sparse
> array reads being affected.
>
> Sorry, I forgot the details from the previous discussions. Are there any
> benchmarks that show OpenJDK performing badly _and_ what the improvement
> is with this patch? We should do similar comparison with the THP kernel
> fix. If the THP fix gets us close to the same level of improvement, I
> don't think we should bother with this patch.
Just to note, I am working on replacing the huge zero page with a THP;
I will post that out (after some testing) soon.