linux-kernel - Re: [v5 PATCH] arm64: mm: force write fault for atomic RMW instructions

Message-ID: <Zog6eFF1zDl4IRHX@arm.com>
Date: Fri, 5 Jul 2024 19:24:56 +0100
From: Catalin Marinas <catalin.marinas@....com>
To: "Christoph Lameter (Ampere)" <cl@...two.org>
Cc: Yang Shi <yang@...amperecomputing.com>, will@...nel.org,
	anshuman.khandual@....com, david@...hat.com,
	scott@...amperecomputing.com, linux-arm-kernel@...ts.infradead.org,
	linux-kernel@...r.kernel.org
Subject: Re: [v5 PATCH] arm64: mm: force write fault for atomic RMW
 instructions

On Fri, Jul 05, 2024 at 10:05:29AM -0700, Christoph Lameter (Ampere) wrote:
> On Thu, 4 Jul 2024, Catalin Marinas wrote:
> > It could be worked around with a new flavour of get_user() that uses the
> > non-T LDR instruction and the user mapping is readable by the kernel
> > (that's the case with EPAN, prior to PIE and I think we can change this
> > for PIE configurations as well). But it adds to the complexity of this
> > patch when the kernel already offers a MADV_POPULATE_WRITE solution.
> 
> The use of MADV_POPULATE_WRITE here is arch specific and not a general
> solution. It requires specialized knowledge and research before someone can
> figure out that this particular trick is required on Linux ARM64 processors.
> The builders need to detect this special situation in the build process and
> activate this workaround.

Not really, see this OpenJDK commit:

https://github.com/openjdk/jdk/commit/a65a89522d2f24b1767e1c74f6689a22ea32ca6a

There's nothing about arm64 in there and it looks like the code prefers
MADV_POPULATE_WRITE if THPs are enabled (which is the case in all
enterprise distros). I can't tell whether the change was made to work
around the arm64 behaviour; there's no commit log (it was contributed by
Ampere).

There's a separate thread with the mm folk on the THP behaviour for
pmd_none() vs pmd mapping the zero huge page but it is more portable for
OpenJDK to use madvise() than guess the kernel behaviour and touch small
pages or a single large page. Even if one claims that atomic_add(0) is
portable across operating systems, the OpenJDK code was already treating
Linux as a special case in the presence of THP.
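
To illustrate the difference between the two approaches, here is a minimal
userspace sketch. It is not the OpenJDK code: pretouch_region() is a
hypothetical helper name, and the fallback value of MADV_POPULATE_WRITE (23)
is taken from the Linux uapi headers for builds against older libc headers.
It asks the kernel to populate the range writable and only falls back to an
atomic add of 0 per page when the kernel rejects the advice with EINVAL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23	/* Linux 5.14+; assumed when libc headers lack it */
#endif

/*
 * Hypothetical helper: pre-fault [addr, addr + len) as writable.
 * addr must be page aligned. Returns 0 on success, -1 on failure
 * (errno is set by madvise() or sysconf()).
 */
static int pretouch_region(void *addr, size_t len)
{
	long page;
	size_t off;

	assert(addr != NULL);

	/* Preferred path: let the kernel populate the pages for writing. */
	if (madvise(addr, len, MADV_POPULATE_WRITE) == 0)
		return 0;

	/* Anything other than "advice not supported" is a real error. */
	if (errno != EINVAL)
		return -1;

	page = sysconf(_SC_PAGESIZE);
	if (page <= 0)
		return -1;

	/*
	 * Fallback: atomic add of 0 to one byte per page, leaving the
	 * contents unchanged. Whether this takes a write fault directly
	 * depends on the architecture and on how the compiler lowers it.
	 */
	for (off = 0; off < len; off += (size_t)page)
		__atomic_fetch_add((unsigned char *)addr + off, 0,
				   __ATOMIC_RELAXED);

	return 0;
}
```

A caller would mmap() an anonymous MAP_PRIVATE region and pass it to
pretouch_region() before first use. The madvise() path leaves the choice
between small pages and a huge page to the kernel rather than to userspace.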

> It would be much simpler to just merge the patch and be done with it.
> Otherwise this issue will continue to cause uncountably many hours of
> anguish for sysadmins and developers all over the Linux ecosystem trying to
> figure out what in the world is going on with ARM.

People will be happy until one enables execute-only ELF text sections in
a distro and all that opcode parsing will add considerable overhead for
many read faults (those with a writeable vma).

I'd also like to understand (probably have to re-read the older threads)
whether the overhead is caused mostly by the double fault or the actual
breaking of a THP. For the latter, the mm folk are willing to change the
behaviour so that pmd_none() and pmd to the zero huge page are treated
similarly (i.e. allocate a huge page on write fault). If that's good
enough, I'd rather not merge this patch (or some form of it) and wait
for a proper fix in hardware in the future.

Just to be clear, there are still potential issues to address (or
understand the impact of) in this patch with exec-only mappings and
the performance gain _after_ the THP behaviour changed in the mm code.
We can make a call once we have more data but, TBH, my inclination is
towards 'no' given that OpenJDK already supports madvise() and it's not
arm64 specific.

-- 
Catalin