lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 13 Apr 2022 11:30:19 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Miaohe Lin <linmiaohe@...wei.com>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>
Subject: Re: [PATCH v2 1/8] mm/swap: remember PG_anon_exclusive via a swp pte
 bit

On 13.04.22 10:58, Miaohe Lin wrote:
> On 2022/3/30 0:43, David Hildenbrand wrote:
>> Currently, we clear PG_anon_exclusive in try_to_unmap() and forget about
> ...
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 14618f446139..9060cc7f2123 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -792,6 +792,11 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>>  						&src_mm->mmlist);
>>  			spin_unlock(&mmlist_lock);
>>  		}
>> +		/* Mark the swap entry as shared. */
>> +		if (pte_swp_exclusive(*src_pte)) {
>> +			pte = pte_swp_clear_exclusive(*src_pte);
>> +			set_pte_at(src_mm, addr, src_pte, pte);
>> +		}
>>  		rss[MM_SWAPENTS]++;
>>  	} else if (is_migration_entry(entry)) {
>>  		page = pfn_swap_entry_to_page(entry);
>> @@ -3559,6 +3564,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>  	struct page *page = NULL, *swapcache;
>>  	struct swap_info_struct *si = NULL;
>>  	rmap_t rmap_flags = RMAP_NONE;
>> +	bool exclusive = false;
>>  	swp_entry_t entry;
>>  	pte_t pte;
>>  	int locked;
>> @@ -3724,6 +3730,46 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>  	BUG_ON(!PageAnon(page) && PageMappedToDisk(page));
>>  	BUG_ON(PageAnon(page) && PageAnonExclusive(page));
>>  
>> +	/*
>> +	 * Check under PT lock (to protect against concurrent fork() sharing
>> +	 * the swap entry concurrently) for certainly exclusive pages.
>> +	 */
>> +	if (!PageKsm(page)) {
>> +		/*
>> +		 * Note that pte_swp_exclusive() == false for architectures
>> +		 * without __HAVE_ARCH_PTE_SWP_EXCLUSIVE.
>> +		 */
>> +		exclusive = pte_swp_exclusive(vmf->orig_pte);
>> +		if (page != swapcache) {
>> +			/*
>> +			 * We have a fresh page that is not exposed to the
>> +			 * swapcache -> certainly exclusive.
>> +			 */
>> +			exclusive = true;
>> +		} else if (exclusive && PageWriteback(page) &&
>> +			   !(swp_swap_info(entry)->flags & SWP_STABLE_WRITES)) {
> 
> Really sorry for late respond and a newbie question. IIUC, if SWP_STABLE_WRITES is set,
> it means concurrent page modifications while under writeback is not supported. For these
> problematic swap backends, exclusive marker is dropped. So the above if statement is to
> filter out these problematic swap backends which have SWP_STABLE_WRITES set. If so, the
> above check should be && (swp_swap_info(entry)->flags & SWP_STABLE_WRITES)), i.e. no "!".
> Or am I miss something?

Oh, thanks for your careful eyes!

Indeed, SWP_STABLE_WRITES indicates that the backend *requires* stable
writes, meaning, we must not modify the page while writeback is active.

So if and only if that is set, we must drop the exclusive marker.

This essentially corresponds to previous reuse_swap_page() logic:

bool reuse_swap_page(struct page *page)
{
...
	if (!PageWriteback(page)) {
		...
	} else {
		...
		if (p->flags & SWP_STABLE_WRITES) {
			spin_unlock(&p->lock);
			return false;
		}
...
}

Fortunately, this only affects such backends. For backends without
SWP_STABLE_WRITES, the current code is simply sub-optimal.


So yes, this has to be

} else if (exclusive && PageWriteback(page) &&
	   (swp_swap_info(entry)->flags & SWP_STABLE_WRITES)) {


Let me try finding a way to test this, the tests I was running so far
were apparently not using a backend with SWP_STABLE_WRITES.

-- 
Thanks,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ