Message-ID: <1689cd26-514a-4d72-a1bd-b67357aab3e0@os.amperecomputing.com>
Date: Tue, 2 Jul 2024 15:21:41 -0700
From: Yang Shi <yang@...amperecomputing.com>
To: Catalin Marinas <catalin.marinas@....com>
Cc: "Christoph Lameter (Ampere)" <cl@...two.org>, will@...nel.org,
 anshuman.khandual@....com, david@...hat.com, scott@...amperecomputing.com,
 linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [v5 PATCH] arm64: mm: force write fault for atomic RMW
 instructions



On 7/1/24 12:43 PM, Catalin Marinas wrote:
> On Fri, Jun 28, 2024 at 11:20:43AM -0700, Yang Shi wrote:
>> On 6/28/24 10:24 AM, Catalin Marinas wrote:
>>> This patch does feel a bit like working around a non-optimal user choice
>>> in kernel space. Who knows, madvise() may even be quicker if you do a
>>> single call for a larger VA vs touching each page.
>> IMHO, I don't think so. I view this patch as solving, or working around, an
>> ISA inefficiency in the kernel. Two faults are not necessary if we know we
>> are definitely going to write the memory very soon, right?
> I agree the Arm architecture behaviour is not ideal here and any
> timelines for fixing it in hardware, if they do happen, are far into the
> future. Purely from a kernel perspective, what I want, though, is to make
> sure that longer term (a) we don't create additional maintenance burden
> and (b) we don't keep dead code around.
>
> Point (a) could be mitigated if the architecture is changed so that any
> new atomic instructions added to this range would also come with
> additional syndrome information so that we don't have to update the
> decoding patterns.
>
> Point (b), however, depends on the OpenJDK and the kernel versions in
> distros. Nick Gasson kindly provided some information on the OpenJDK
> changes. The atomic_add(0) change happened in early 2022, about 5-6
> months after MADV_POPULATE_WRITE support was added to the kernel. What's
> interesting is that Ampere already contributed MADV_POPULATE_WRITE support
> to OpenJDK a few months ago:
>
> https://github.com/openjdk/jdk/commit/a65a89522d2f24b1767e1c74f6689a22ea32ca6a
>
> The OpenJDK commit lacks explanation but what I gathered from the diff
> is that this option is the preferred one in the presence of THP (which
> most/all distros enable by default). If we merge your proposed kernel
> patch, it will take time before it makes its way into distros. I'm
> hoping that by that time, distros will have already picked up a new OpenJDK
> version that doesn't need the atomic_add(0) pattern. If that's
> the case, we end up with some dead code in the kernel that's almost
> never exercised.
>
> I don't follow OpenJDK development but I heard that updates are dragging
> quite a lot. I can't tell whether people have picked up the
> atomic_add(0) feature and whether, by the time a kernel patch would make
> it into distros, they'd also move to the MADV_POPULATE_WRITE pattern.

As Christoph said, there may be similar uses of atomics in other 
applications, so IMHO I don't worry too much about the dead code problem. 
OpenJDK is just the use case that we know about; there may be unknown 
unknowns. And distros typically backport patches from the mainline kernel 
to their kernels, so there should be combos like old kernel + backported 
patch + old OpenJDK.

AFAICT, users do expect behavior similar to x86 (one fault instead of 
two). In fact, we noticed this problem because of a customer report.
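
For anyone following along, here is a minimal userspace sketch of the two 
prefault approaches being compared, i.e. the atomic_add(0) touch pattern vs 
madvise(MADV_POPULATE_WRITE). It is an illustration only, not code from the 
patch or from OpenJDK, and the helper names are made up:

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #ifndef MADV_POPULATE_WRITE
    #define MADV_POPULATE_WRITE 23          /* Linux >= 5.14 */
    #endif

    /*
     * Touch pattern: an atomic RMW adding 0 to the first word of each
     * page. On arm64 today, the first access to a clean anonymous page is
     * reported as a read fault, so the page is first mapped read-only and
     * the store half of the RMW takes a second, write fault.
     */
    static void prefault_by_touch(char *base, size_t len, long pagesz)
    {
            for (size_t off = 0; off < len; off += (size_t)pagesz)
                    __atomic_fetch_add((uint32_t *)(base + off), 0,
                                       __ATOMIC_RELAXED);
    }

    /*
     * madvise pattern: one syscall for the whole range, and the kernel
     * populates the pages writable up front.
     */
    static int prefault_by_madvise(char *base, size_t len)
    {
            return madvise(base, len, MADV_POPULATE_WRITE);
    }

    int main(void)
    {
            long pagesz = sysconf(_SC_PAGESIZE);
            size_t len = 64UL << 20;        /* 64MB anonymous mapping */
            char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            if (base == MAP_FAILED)
                    return 1;

            prefault_by_touch(base, len / 2, pagesz);
            if (prefault_by_madvise(base + len / 2, len / 2))
                    perror("madvise(MADV_POPULATE_WRITE)");

            munmap(base, len);
            return 0;
    }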

>
> There's a point (c) as well on the overhead of reading the faulting
> instruction. I hope that's negligible but I haven't measured it.

I think I showed the benchmark data requested by Anshuman earlier in this 
email discussion.

>

