Date:   Thu, 14 Apr 2022 10:40:18 +0800
From:   Miaohe Lin <linmiaohe@...wei.com>
To:     David Hildenbrand <david@...hat.com>
CC:     linux-kernel <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>, Minchan Kim <minchan@...nel.org>
Subject: Re: [PATCH v2 1/8] mm/swap: remember PG_anon_exclusive via a swp pte
 bit

On 2022/4/13 20:31, David Hildenbrand wrote:
> On 13.04.22 11:38, Miaohe Lin wrote:
>> On 2022/4/13 17:30, David Hildenbrand wrote:
>>> On 13.04.22 10:58, Miaohe Lin wrote:
>>>> On 2022/3/30 0:43, David Hildenbrand wrote:
>>>>> Currently, we clear PG_anon_exclusive in try_to_unmap() and forget about
>>>> ...
>>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>>> index 14618f446139..9060cc7f2123 100644
>>>>> --- a/mm/memory.c
>>>>> +++ b/mm/memory.c
>>>>> @@ -792,6 +792,11 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>>>>>  						&src_mm->mmlist);
>>>>>  			spin_unlock(&mmlist_lock);
>>>>>  		}
>>>>> +		/* Mark the swap entry as shared. */
>>>>> +		if (pte_swp_exclusive(*src_pte)) {
>>>>> +			pte = pte_swp_clear_exclusive(*src_pte);
>>>>> +			set_pte_at(src_mm, addr, src_pte, pte);
>>>>> +		}
>>>>>  		rss[MM_SWAPENTS]++;
>>>>>  	} else if (is_migration_entry(entry)) {
>>>>>  		page = pfn_swap_entry_to_page(entry);
>>>>> @@ -3559,6 +3564,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>>>>  	struct page *page = NULL, *swapcache;
>>>>>  	struct swap_info_struct *si = NULL;
>>>>>  	rmap_t rmap_flags = RMAP_NONE;
>>>>> +	bool exclusive = false;
>>>>>  	swp_entry_t entry;
>>>>>  	pte_t pte;
>>>>>  	int locked;
>>>>> @@ -3724,6 +3730,46 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>>>>  	BUG_ON(!PageAnon(page) && PageMappedToDisk(page));
>>>>>  	BUG_ON(PageAnon(page) && PageAnonExclusive(page));
>>>>>  
>>>>> +	/*
>>>>> +	 * Check under PT lock (to protect against concurrent fork() sharing
>>>>> +	 * the swap entry concurrently) for certainly exclusive pages.
>>>>> +	 */
>>>>> +	if (!PageKsm(page)) {
>>>>> +		/*
>>>>> +		 * Note that pte_swp_exclusive() == false for architectures
>>>>> +		 * without __HAVE_ARCH_PTE_SWP_EXCLUSIVE.
>>>>> +		 */
>>>>> +		exclusive = pte_swp_exclusive(vmf->orig_pte);
>>>>> +		if (page != swapcache) {
>>>>> +			/*
>>>>> +			 * We have a fresh page that is not exposed to the
>>>>> +			 * swapcache -> certainly exclusive.
>>>>> +			 */
>>>>> +			exclusive = true;
>>>>> +		} else if (exclusive && PageWriteback(page) &&
>>>>> +			   !(swp_swap_info(entry)->flags & SWP_STABLE_WRITES)) {
>>>>
>>>> Really sorry for the late response and a newbie question. IIUC, if SWP_STABLE_WRITES is set,
>>>> it means concurrent page modifications while under writeback are not supported. For these
>>>> problematic swap backends, the exclusive marker is dropped. So the above if statement is meant to
>>>> filter out those problematic swap backends, which have SWP_STABLE_WRITES set. If so, the
>>>> above check should be && (swp_swap_info(entry)->flags & SWP_STABLE_WRITES)), i.e. no "!".
>>>> Or am I missing something?
>>>
>>> Oh, thanks for your careful eyes!
>>>
>>> Indeed, SWP_STABLE_WRITES indicates that the backend *requires* stable
>>> writes, meaning, we must not modify the page while writeback is active.
>>>
>>> So if and only if that is set, we must drop the exclusive marker.
>>>
>>> This essentially corresponds to previous reuse_swap_page() logic:
>>>
>>> bool reuse_swap_page(struct page *page)
>>> {
>>> ...
>>> 	if (!PageWriteback(page)) {
>>> 		...
>>> 	} else {
>>> 		...
>>> 		if (p->flags & SWP_STABLE_WRITES) {
>>> 			spin_unlock(&p->lock);
>>> 			return false;
>>> 		}
>>> ...
>>> }
>>>
>>> Fortunately, this only affects such backends. For backends without
>>> SWP_STABLE_WRITES, the current code is simply sub-optimal.
>>>
>>>
>>> So yes, this has to be
>>>
>>> } else if (exclusive && PageWriteback(page) &&
>>> 	   (swp_swap_info(entry)->flags & SWP_STABLE_WRITES)) {
>>>
>>
>> I am glad that my question helps. :)
>>
>>>
>>> Let me try finding a way to test this, the tests I was running so far
>>> were apparently not using a backend with SWP_STABLE_WRITES.
>>>
>>
>> That will be really helpful. Many thanks for your hard work!
>>
> 
> FWIW, I tried with zram, which sets SWP_STABLE_WRITES ... but, it seems
> to always do a synchronous writeback, so it cannot really trigger this
> code path.

That's a pity. We really need asynchronous writeback to trigger this code path.
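
To spell out the corrected decision discussed above, here is a minimal
userspace sketch, not the kernel code itself. The struct, the sketch_* names
and the flag value are hypothetical stand-ins for illustration; only the
branch logic follows the quoted hunk with the "!" removed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's SWP_STABLE_WRITES bit (value is illustrative). */
#define SKETCH_SWP_STABLE_WRITES (1u << 0)

/* Hypothetical snapshot of the state do_swap_page() looks at under the PT lock. */
struct sketch_swapin {
	bool is_ksm;           /* models PageKsm(page) */
	bool fresh_page;       /* models page != swapcache */
	bool pte_exclusive;    /* models pte_swp_exclusive(vmf->orig_pte) */
	bool under_writeback;  /* models PageWriteback(page) */
	unsigned int si_flags; /* models swp_swap_info(entry)->flags */
};

/* Returns whether the swapped-in page may be treated as exclusive. */
bool sketch_swapin_exclusive(const struct sketch_swapin *s)
{
	bool exclusive;

	if (s->is_ksm)
		return false;

	exclusive = s->pte_exclusive;
	if (s->fresh_page) {
		/* Fresh page not exposed to the swapcache: certainly exclusive. */
		exclusive = true;
	} else if (exclusive && s->under_writeback &&
		   (s->si_flags & SKETCH_SWP_STABLE_WRITES)) {
		/*
		 * The backend requires stable writes, so the page must not be
		 * modified while writeback is active: drop the marker.
		 */
		exclusive = false;
	}
	return exclusive;
}

/* Small self-check covering the two writeback cases. */
void sketch_self_check(void)
{
	struct sketch_swapin s = {
		.pte_exclusive = true,
		.under_writeback = true,
		.si_flags = SKETCH_SWP_STABLE_WRITES,
	};

	/* Stable-writes backend with writeback in flight: marker dropped. */
	assert(!sketch_swapin_exclusive(&s));

	/* Backend without stable writes: marker kept. */
	s.si_flags = 0;
	assert(sketch_swapin_exclusive(&s));
}
```

With the original "!" in place, the two cases in sketch_self_check() would be
swapped: the marker would be kept for stable-writes backends and needlessly
dropped for all others.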

> 
> commit f05714293a591038304ddae7cb0dd747bb3786cc
> Author: Minchan Kim <minchan@...nel.org>
> Date:   Tue Jan 10 16:58:15 2017 -0800
> 
>     mm: support anonymous stable page
> 
> 
> mentions "During developemnt for zram-swap asynchronous writeback,";
> maybe that can be activated somehow? Putting Minchan on CC.
> 

ZRAM_WRITEBACK might need to be enabled to get asynchronous I/O:

+
+config ZRAM_WRITEBACK
+       bool "Write back incompressible page to backing device"
+       depends on ZRAM
+       default n
+       help
+        With incompressible page, there is no memory saving to keep it
+        in memory. Instead, write it out to backing device.
+        For this feature, admin should set up backing device via
+        /sys/block/zramX/backing_dev.
+
+        See zram.txt for more infomration.

It seems asynchronous I/O exists only for swap-in operations: browsing the source code,
I could only find read_from_bdev_async. But I'm not familiar with the zram code.
Perhaps Minchan could kindly help us answer this question.

Thanks!
