Message-ID: <c1ba9ba3-b0d6-4c6c-d628-614751d737c2@gentwo.org>
Date: Wed, 8 May 2024 10:15:28 -0700 (PDT)
From: "Christoph Lameter (Ampere)" <cl@...two.org>
To: Anshuman Khandual <anshuman.khandual@....com>
cc: Yang Shi <yang@...amperecomputing.com>, catalin.marinas@....com,
will@...nel.org, scott@...amperecomputing.com,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] arm64: mm: force write fault for atomic RMW instructions
On Wed, 8 May 2024, Anshuman Khandual wrote:
>> The atomic RMW instructions, for example, ldadd, actually does load +
>> add + store in one instruction, it may trigger two page faults, the
>> first fault is a read fault, the second fault is a write fault.
>
> It may or it will definitely create two consecutive page faults. What
> if the second write fault never came about. In that case an writable
> page table entry would be created unnecessarily (or even wrongfully),
> thus breaking the CoW.
An atomic RMW will always perform a write? If there is a read fault
then a write fault will follow.
>> Some applications use atomic RMW instructions to populate memory, for
>> example, openjdk uses atomic-add-0 to do pretouch (populate heap memory
>
> But why cannot normal store operation is sufficient for pre-touching
> the heap memory, why read-modify-write (RMW) is required instead ?
Sure, a regular write operation is sufficient, but you would have to modify
existing applications to get that done. x86 does not do a read fault on
atomics, so we have an issue there.
> If the memory address has some valid data, it must have already reached there
> via a previous write access, which would have caused initial CoW transition ?
> If the memory address has no valid data to begin with, why even use RMW ?
Because the application can reasonably assume that all uninitialized data
is zero and therefore it is not necessary to have a prior write access.
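For illustration, here is a minimal userspace sketch of the pretouch pattern being discussed: each page of a fresh anonymous mapping is touched with an atomic add of 0, which leaves the zero-filled contents unchanged. The helper names `pretouch_atomic_add0` and `pretouch_demo` are hypothetical and not taken from openjdk. The sketch assumes Linux and the GCC/Clang `__atomic` builtins. Which instruction the compiler actually emits for the add is not guaranteed.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: touch one word in every page of [base, base + len)
 * with an atomic add of 0. The value in memory is left unchanged.
 * Assumption: the compiler lowers this to an atomic RMW instruction
 * (for example ldadd on arm64 with LSE); that is typical but not guaranteed.
 */
static void pretouch_atomic_add0(void *base, size_t len, size_t page_size)
{
    for (size_t off = 0; off < len; off += page_size) {
        uint32_t *p = (uint32_t *)((char *)base + off);
        (void)__atomic_fetch_add(p, 0, __ATOMIC_RELAXED);
    }
}

/*
 * Hypothetical demo: map npages of anonymous memory, pretouch it, and
 * check that every touched word still reads as zero.
 * Returns 0 on success, -1 on any failure.
 */
static int pretouch_demo(size_t npages)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0 || npages == 0)
        return -1;

    size_t page_size = (size_t)ps;
    size_t len = npages * page_size;

    void *heap = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (heap == MAP_FAILED)
        return -1;

    pretouch_atomic_add0(heap, len, page_size);

    int rc = 0;
    for (size_t off = 0; off < len; off += page_size) {
        /* Anonymous private mappings start out zero-filled. */
        if (*(volatile uint32_t *)((char *)heap + off) != 0)
            rc = -1;
    }

    munmap(heap, len);
    return rc;
}
```

Calling `pretouch_demo(4)` from a test program should return 0: the application never stored to the pages beforehand, yet the atomic add of 0 is well defined because the mapping starts out zeroed.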
>> Some other architectures also have code inspection in page fault path,
>> for example, SPARC and x86.
>
> Okay, I was about to ask, but is not calling get_user() for all data
> read page faults increase the cost for a hot code path in general for
> some potential savings for a very specific use case. Not sure if that
> is worth the trade-off.
The instruction is cache hot since it must be present in the CPU cache for
the fault. So the overhead is minimal.
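To make the cost of the check concrete, here is a rough sketch of what inspecting the faulting instruction could look like once the 32-bit instruction word has been fetched. It is not the code from the patch. The helper names are hypothetical, and the mask/value pair is based on my reading of the A64 encoding for the LDADD family only; a complete check would also have to cover the other atomic RMW instructions, which this sketch does not attempt.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed A64 encoding of the LDADD family (LDADD, LDADDA, LDADDL,
 * LDADDAL and the byte/halfword forms): the size field (bits 31:30),
 * the A and R bits, and the register fields are masked out.
 */
#define LDADD_MASK  0x3F20FC00u
#define LDADD_VALUE 0x38200000u

/* Hypothetical helper: true if insn encodes an LDADD-family instruction. */
static bool insn_is_ldadd(uint32_t insn)
{
    return (insn & LDADD_MASK) == LDADD_VALUE;
}

/*
 * Hypothetical policy helper: treat a read fault as a write fault when
 * the faulting instruction is known to store as well.
 */
static bool should_force_write_fault(bool is_read_fault, uint32_t insn)
{
    return is_read_fault && insn_is_ldadd(insn);
}
```

Under these assumptions, `0xB8200041` (`ldadd w0, w1, [x2]`) is classified as an atomic RMW, while `0xB9400041` (`ldr w1, [x2]`) is not. Once the instruction word is available, the check itself is a single mask and compare.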