lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 16 May 2023 14:30:57 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Catalin Marinas <catalin.marinas@....com>
Cc:     Peter Collingbourne <pcc@...gle.com>,
        Qun-wei Lin (林群崴) 
        <Qun-wei.Lin@...iatek.com>, linux-arm-kernel@...ts.infradead.org,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        "surenb@...gle.com" <surenb@...gle.com>,
        Chinwen Chang (張錦文) 
        <chinwen.chang@...iatek.com>,
        "kasan-dev@...glegroups.com" <kasan-dev@...glegroups.com>,
        Kuan-Ying Lee (李冠穎) 
        <Kuan-Ying.Lee@...iatek.com>,
        Casper Li (李中榮) <casper.li@...iatek.com>,
        "gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
        vincenzo.frascino@....com,
        Alexandru Elisei <alexandru.elisei@....com>, will@...nel.org,
        eugenis@...gle.com, Steven Price <steven.price@....com>,
        stable@...r.kernel.org
Subject: Re: [PATCH 1/3] mm: Move arch_do_swap_page() call to before
 swap_free()

On 15.05.23 19:34, Catalin Marinas wrote:
> On Sat, May 13, 2023 at 05:29:53AM +0200, David Hildenbrand wrote:
>> On 13.05.23 01:57, Peter Collingbourne wrote:
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 01a23ad48a04..83268d287ff1 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -3914,19 +3914,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>>    		}
>>>    	}
>>> -	/*
>>> -	 * Remove the swap entry and conditionally try to free up the swapcache.
>>> -	 * We're already holding a reference on the page but haven't mapped it
>>> -	 * yet.
>>> -	 */
>>> -	swap_free(entry);
>>> -	if (should_try_to_free_swap(folio, vma, vmf->flags))
>>> -		folio_free_swap(folio);
>>> -
>>> -	inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
>>> -	dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
>>>    	pte = mk_pte(page, vma->vm_page_prot);
>>> -
>>>    	/*
>>>    	 * Same logic as in do_wp_page(); however, optimize for pages that are
>>>    	 * certainly not shared either because we just allocated them without
>>> @@ -3946,8 +3934,21 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>>    		pte = pte_mksoft_dirty(pte);
>>>    	if (pte_swp_uffd_wp(vmf->orig_pte))
>>>    		pte = pte_mkuffd_wp(pte);
>>> +	arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte);
>>>    	vmf->orig_pte = pte;
>>> +	/*
>>> +	 * Remove the swap entry and conditionally try to free up the swapcache.
>>> +	 * We're already holding a reference on the page but haven't mapped it
>>> +	 * yet.
>>> +	 */
>>> +	swap_free(entry);
>>> +	if (should_try_to_free_swap(folio, vma, vmf->flags))
>>> +		folio_free_swap(folio);
>>> +
>>> +	inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
>>> +	dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
>>> +
>>>    	/* ksm created a completely new copy */
>>>    	if (unlikely(folio != swapcache && swapcache)) {
>>>    		page_add_new_anon_rmap(page, vma, vmf->address);
>>> @@ -3959,7 +3960,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>>    	VM_BUG_ON(!folio_test_anon(folio) ||
>>>    			(pte_write(pte) && !PageAnonExclusive(page)));
>>>    	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
>>> -	arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte);
>>>    	folio_unlock(folio);
>>>    	if (folio != swapcache && swapcache) {
>>
>>
>> You are moving the folio_free_swap() call after the folio_ref_count(folio)
>> == 1 check, which means that such (previously) swapped pages that are
>> exclusive cannot be detected as exclusive.
>>
>> There must be a better way to handle MTE here.
>>
>> Where are the tags stored, how is the location identified, and when are they
>> effectively restored right now?
> 
> I haven't gone through Peter's patches yet but a pretty good description
> of the problem is here:
> https://lore.kernel.org/all/5050805753ac469e8d727c797c2218a9d780d434.camel@mediatek.com/.
> I couldn't reproduce it with my swap setup but both Qun-wei and Peter
> triggered it.
> 
> When a tagged page is swapped out, the arm64 code stores the metadata
> (tags) in a local xarray indexed by the swap pte. When restoring from
> swap, the arm64 set_pte_at() checks this xarray using the old swap pte
> and spills the tags onto the new page. Apparently something changed in
> the kernel recently that causes swap_range_free() to be called before
> set_pte_at(). The arm64 arch_swap_invalidate_page() frees the metadata
> from the xarray and the subsequent set_pte_at() won't find it.
> 
> If we have the page, the metadata can be restored before set_pte_at()
> and I guess that's what Peter is trying to do (again, I haven't looked
> at the details yet; leaving it for tomorrow).

Thanks for the details! I was missing that we also have a hook in 
swap_range_free().

> 
> Is there any other way of handling this? E.g. not release the metadata
> in arch_swap_invalidate_page() but later in set_pte_at() once it was
> restored. But then we may leak this metadata if there's no set_pte_at()
> (the process mapping the swap entry died).

That was my immediate thought: do we really have to hook into 
swap_range_free() at all? And I also wondered why we have to do this 
from set_pte_at() and not do this explicitly (maybe that's the other 
arch_* callback on the swapin path).

I'll have a look at v2, maybe it can be fixed easily without having to 
shuffle around too much of the swapin code (which can easily break again 
because the dependencies are not obvious at all and even undocumented in 
the code).

-- 
Thanks,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ