linux-kernel - Re: [PATCH] arm64: mm: force write fault for atomic RMW instructions

Message-ID: <a8571d06-67db-450a-a3e4-d5bc9350a9ab@redhat.com>
Date: Tue, 14 May 2024 17:57:58 +0200
From: David Hildenbrand <david@...hat.com>
To: Catalin Marinas <catalin.marinas@....com>,
 Yang Shi <yang@...amperecomputing.com>
Cc: will@...nel.org, scott@...amperecomputing.com, cl@...two.org,
 linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] arm64: mm: force write fault for atomic RMW instructions

On 14.05.24 12:39, Catalin Marinas wrote:
> On Fri, May 10, 2024 at 10:13:02AM -0700, Yang Shi wrote:
>> On 5/10/24 5:11 AM, Catalin Marinas wrote:
>>> On Tue, May 07, 2024 at 03:35:58PM -0700, Yang Shi wrote:
>>>> The atomic RMW instructions, for example, ldadd, actually does load +
>>>> add + store in one instruction, it may trigger two page faults, the
>>>> first fault is a read fault, the second fault is a write fault.
>>>>
>>>> Some applications use atomic RMW instructions to populate memory, for
>>>> example, openjdk uses atomic-add-0 to do pretouch (populate heap memory
>>>> at launch time) between v18 and v22.
>>> I'd also argue that this should be optimised in openjdk. Is an LDADD
>>> more efficient on your hardware than a plain STR? I hope it only does
>>> one operation per page rather than per long. There's also MAP_POPULATE
>>> that openjdk can use to pre-fault the pages with no additional fault.
>>> This would be even more efficient than any store or atomic operation.
>>
>> It is not about whether atomic is more efficient than plain store on our
>> hardware or not. It is arch-independent solution used by openjdk.
> 
> It may be arch independent but it's not a great choice. If you run this
> on pre-LSE atomics hardware (ARMv8.0), this operation would involve
> LDXR+STXR and there's no way for the kernel to "upgrade" it to a write
> operation on the first LDXR fault.
> 
> It would be good to understand why openjdk is doing this instead of a
> plain write. Is it because it may be racing with some other threads
> already using the heap? That would be a valid pattern.

Maybe openjdk should be switching to MADV_POPULATE_WRITE. QEMU did that 
for the preallocate/populate use case.
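
For illustration only, a minimal sketch of what prefaulting with MADV_POPULATE_WRITE could look like in userspace C. The helper name prefault_writable() is hypothetical and is not taken from openjdk or QEMU. The sketch assumes Linux 5.14 or later for the madvise() advice value, and a freshly mapped anonymous region that no other thread is using yet, because the fallback path overwrites one byte per page with zero.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Older libc headers may lack the constant; 23 is the Linux uapi value. */
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

/*
 * Hypothetical helper: prefault [addr, addr + len) as writable.
 * addr must be page-aligned. Returns 0 on success, -1 with errno set.
 */
int prefault_writable(void *addr, size_t len)
{
    /* One call; the kernel populates the range as if written to. */
    if (madvise(addr, len, MADV_POPULATE_WRITE) == 0)
        return 0;

    /* EINVAL can mean the kernel does not know this advice value. */
    if (errno != EINVAL)
        return -1;

    /*
     * Fallback: one plain store per page, so each page takes a single
     * write fault. ASSUMPTION: the range is fresh, zero-filled memory
     * not yet shared with other threads; this store would clobber data
     * otherwise.
     */
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    volatile unsigned char *p = addr;
    for (size_t off = 0; off < len; off += (size_t)page)
        p[off] = 0;
    return 0;
}
```

A caller would mmap() the heap region and then call prefault_writable(buf, len) once, before handing the memory to other threads. The fallback's plain store is not safe if the range may already be in use concurrently, which is the racing case raised earlier in the thread.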

-- 
Cheers,

David / dhildenb