linux-kernel - Re: [RFC PATCH v1 3/4] mm/memory: Use ptep_get_lockless_norecency() for orig

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <f6996c4f-da60-4267-bcf1-09e4fd40c91b@redhat.com>
Date: Tue, 26 Mar 2024 18:38:01 +0100
From: David Hildenbrand <david@...hat.com>
To: Ryan Roberts <ryan.roberts@....com>, Mark Rutland <mark.rutland@....com>,
 Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
 Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
 Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
 Adrian Hunter <adrian.hunter@...el.com>,
 Andrew Morton <akpm@...ux-foundation.org>,
 Muchun Song <muchun.song@...ux.dev>
Cc: linux-arm-kernel@...ts.infradead.org, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v1 3/4] mm/memory: Use ptep_get_lockless_norecency()
 for orig_pte

On 26.03.24 18:27, Ryan Roberts wrote:
> On 26/03/2024 17:02, David Hildenbrand wrote:
>> On 15.02.24 13:17, Ryan Roberts wrote:
>>> Let's convert handle_pte_fault()'s use of ptep_get_lockless() to
>>> ptep_get_lockless_norecency() to save orig_pte.
>>>
>>> There are a number of places that follow this model:
>>>
>>>       orig_pte = ptep_get_lockless(ptep)
>>>       ...
>>>       <lock>
>>>       if (!pte_same(orig_pte, ptep_get(ptep)))
>>>               // RACE!
>>>       ...
>>>       <unlock>
>>>
>>> So we need to be careful to convert all of those to use
>>> pte_same_norecency() so that the access and dirty bits are excluded from
>>> the comparison.
>>>
>>> Additionally there are a couple of places that genuinely rely on the
>>> access and dirty bits of orig_pte, but with some careful refactoring, we
>>> can use ptep_get() once we are holding the lock to achieve equivalent
>>> logic.
>>
>> We really should document that changed behavior somewhere where it can be easily
>> found: that orig_pte might have incomplete/stale accessed/dirty information.
> 
> I could add it to the orig_pte definition in the `struct vm_fault`?
> 
>>
>>
>>> @@ -5343,7 +5356,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>>>                             vmf->address, &vmf->ptl);
>>>            if (unlikely(!vmf->pte))
>>>                return 0;
>>> -        vmf->orig_pte = ptep_get_lockless(vmf->pte);
>>> +        vmf->orig_pte = ptep_get_lockless_norecency(vmf->pte);
>>>            vmf->flags |= FAULT_FLAG_ORIG_PTE_VALID;
>>>
>>>            if (pte_none(vmf->orig_pte)) {
>>> @@ -5363,7 +5376,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>>>
>>>        spin_lock(vmf->ptl);
>>>        entry = vmf->orig_pte;
>>> -    if (unlikely(!pte_same(ptep_get(vmf->pte), entry))) {
>>> +    if (unlikely(!pte_same_norecency(ptep_get(vmf->pte), entry))) {
>>>            update_mmu_tlb(vmf->vma, vmf->address, vmf->pte);
>>>            goto unlock;
>>
>> I was wondering about the following:
>>
>> Assume the PTE is not dirty.
>>
>> Thread 1 does
> 
> Sorry not sure what threads have to do with this? How is the vmf shared between
> threads? What have I misunderstood...

Assume we have a HW that does not have HW-managed access/dirty bits. One 
that ends up using generic ptep_set_access_flags(). Access/dirty bits 
are always updated under PT lock.

Then, imagine two threads. One is the the fault path here. another 
thread performs some other magic that sets the PTE dirty under PTL.

> 
>>
>> vmf->orig_pte = ptep_get_lockless_norecency(vmf->pte)
>> /* not dirty */
>>
>> /* Now, thread 2 ends up setting the PTE dirty under PT lock. */
>>
>> spin_lock(vmf->ptl);
>> entry = vmf->orig_pte;
>> if (unlikely(!pte_same(ptep_get(vmf->pte), entry))) {
>>      ...
>> }
>> ...
>> entry = pte_mkyoung(entry);
> 
> Do you mean pte_mkdirty() here? You're talking about dirty everywhere else.

No, that is just thread 1 seeing "oh, nothing to do" and then goes ahead 
and unconditionally does that in handle_pte_fault().

> 
>> if (ptep_set_access_flags(vmf->vma, ...)
>> ...
>> pte_unmap_unlock(vmf->pte, vmf->ptl);
>>
>>
>> Generic ptep_set_access_flags() will do another pte_same() check and realize
>> "hey, there was a change!" let's update the PTE!
>>
>> set_pte_at(vma->vm_mm, address, ptep, entry);
> 
> This is called from the generic ptep_set_access_flags() in your example, right?
> 

Yes.

>>
>> would overwrite the dirty bit set by thread 2.
> 
> I'm not really sure what you are getting at... Is your concern that there is a
> race where the page could become dirty in the meantime and it now gets lost? I
> think that's why arm64 overrides ptep_set_access_flags(); since the hw can
> update access/dirty we have to deal with the races.

My concern is that your patch can in subtle ways lead to use losing PTE 
dirty bits on architectures that don't have the HW-managed dirty bit. 
They do exist ;)

Arm64 should be fine in that regard.

-- 
Cheers,

David / dhildenb