linux-kernel - Re: [PATCH] arm64: mm: force write fault for atomic RMW instructions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <85ecd9c7-8fae-4265-a240-88db54b88602@os.amperecomputing.com>
Date: Fri, 17 May 2024 10:35:13 -0700
From: Yang Shi <yang@...amperecomputing.com>
To: Catalin Marinas <catalin.marinas@....com>
Cc: peterx@...hat.com, will@...nel.org, scott@...amperecomputing.com,
 cl@...two.org, linux-arm-kernel@...ts.infradead.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH] arm64: mm: force write fault for atomic RMW instructions



On 5/17/24 10:25 AM, Catalin Marinas wrote:
> On Fri, May 17, 2024 at 09:30:23AM -0700, Yang Shi wrote:
>> On 5/14/24 3:39 AM, Catalin Marinas wrote:
>>> It would be good to understand why openjdk is doing this instead of a
>>> plain write. Is it because it may be racing with some other threads
>>> already using the heap? That would be a valid pattern.
>> Yes, you are right. I think I quoted the JVM justification in earlier email,
>> anyway they said "permit use of memory concurrently with pretouch".
> Ah, sorry, I missed that. This seems like a valid reason.

I should have articulated this in the commit log. Will add this in v2.

>
>>> A point Will raised was on potential ABI changes introduced by this
>>> patch. The ESR_EL1 reported to user remains the same as per the hardware
>>> spec (read-only), so from a SIGSEGV we may have some slight behaviour
>>> changes:
>>>
>>> 1. PTE invalid:
>>>
>>>      a) vma is VM_READ && !VM_WRITE permission - SIGSEGV reported with
>>>         ESR_EL1.WnR == 0 in sigcontext with your patch. Without this
>>>         patch, the PTE is mapped as PTE_RDONLY first and a subsequent
>>>         fault will report SIGSEGV with ESR_EL1.WnR == 1.
>> I think I can do something like the below conceptually:
>>
>> if is_el0_atomic_instr && !is_write_abort
>>      force_write = true
>>
>> if VM_READ && !VM_WRITE && force_write == true
> Nit: write implies read, so you only need to check !write.
>
>>      vm_flags = VM_READ
>>      mm_flags ~= FAULT_FLAG_WRITE
>>
>> Then we just fallback to read fault. The following write fault will trigger
>> SIGSEGV with consistent ABI.
> I think this should work. So instead of reporting the write fault
> directly in case of a read-only vma, we let the core code handle the
> read fault and first and we retry the atomic instruction.

Yes, just undo the force write when vma flags don't allow it.

>
>>>      b) vma is !VM_READ && !VM_WRITE permission - SIGSEGV reported with
>>>         ESR_EL1.WnR == 0, so no change from current behaviour, unless we
>>>         fix the patch for (1.a) to fake the WnR bit which would change the
>>>         current expectations.
>>>
>>> 2. PTE valid with PTE_RDONLY - we get a normal writeable fault in
>>>      hardware, no need to fix ESR_EL1 up.
>>>
>>> The patch would have to address (1) above but faking the ESR_EL1.WnR bit
>>> based on the vma flags looks a bit fragile.
>> I think we don't need to fake the ESR_EL1.WnR bit with the fallback.
> I agree, with your approach above we don't need to fake WnR.
>
>>> Similarly, we have userfaultfd that reports the fault to user. I think
>>> in scenario (1) the kernel will report UFFD_PAGEFAULT_FLAG_WRITE with
>>> your patch but no UFFD_PAGEFAULT_FLAG_WP. Without this patch, there are
>>> indeed two faults, with the second having both UFFD_PAGEFAULT_FLAG_WP
>>> and UFFD_PAGEFAULT_FLAG_WRITE set.
>> I don't quite get what the problem is. IIUC, uffd just needs a signal from
>> kernel to tell this area will be written. It seems not break the semantic.
>> Added Peter Xu in this loop, who is the uffd developer. He may shed some
>> light.
> Not really familiar with uffd but just looking at the code, if a handler
> is registered for both MODE_MISSING and MODE_WP, currently the atomic
> instruction signals a user fault without UFFD_PAGEFAULT_FLAG_WRITE (the
> do_anonymous_page() path). If the page is mapped by the uffd handler as
> the zero page, a restart of the instruction would signal
> UFFD_PAGEFAULT_FLAG_WRITE and UFFD_PAGEFAULT_FLAG_WP (the do_wp_page()
> path).
>
> With your patch, we get the equivalent of UFFD_PAGEFAULT_FLAG_WRITE on
> the first attempt, just like having a STR instruction instead of
> separate LDR + STR (as the atomics behave from a fault perspective).
>
> However, I don't think that's a problem, the uffd handler should cope
> with an STR anyway, so it's not some unexpected combination of flags.

Yes, this is what I thought.

>