Message-ID: <f9960727-d061-8856-45ce-4e33e8ed1de6@suse.cz>
Date: Fri, 11 Feb 2022 19:49:11 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Hugh Dickins <hughd@...gle.com>,
 Andrew Morton <akpm@...ux-foundation.org>
Cc: Michal Hocko <mhocko@...e.com>,
 "Kirill A. Shutemov" <kirill@...temov.name>,
 Matthew Wilcox <willy@...radead.org>,
 David Hildenbrand <david@...hat.com>,
 Alistair Popple <apopple@...dia.com>,
 Johannes Weiner <hannes@...xchg.org>,
 Rik van Riel <riel@...riel.com>,
 Suren Baghdasaryan <surenb@...gle.com>,
 Yu Zhao <yuzhao@...gle.com>, Greg Thelen <gthelen@...gle.com>,
 Shakeel Butt <shakeelb@...gle.com>,
 linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 11/13] mm/munlock: page migration needs mlock pagevec drained
On 2/6/22 22:49, Hugh Dickins wrote:
> Page migration of a VM_LOCKED page tends to fail, because when the old
> page is unmapped, it is put on the mlock pagevec with raised refcount,
> which then fails the freeze.
>
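For context, the "freeze" here is the refcount freeze in the migration
core's mapping-move step; a minimal sketch of the failing check (names and
bookkeeping simplified from mm/migrate.c):

	/*
	 * The migration core counts the references it expects to hold
	 * (the mapping's, its own, ...) and tries to freeze the page's
	 * refcount at exactly that value.  A page still parked on the
	 * per-cpu mlock pagevec carries one extra get_page() reference,
	 * so the freeze - and with it this migration attempt - fails.
	 */
	if (!page_ref_freeze(page, expected_count))
		return -EAGAIN;
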
> At first I thought this would be fixed by a local mlock_page_drain() at
> the upper rmap_walk() level - which would have nicely batched all the
> munlocks of that page; but tests show that the task can too easily move
> to another cpu, leaving pagevec residue behind which fails the migration.
>
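(To see why a drain at the rmap_walk() level is not enough: the mlock
pagevec is per-cpu, and a drain only flushes the pagevec of the cpu it is
told about - roughly, from this series' mm/mlock.c, where mlock_pagevec()
is the internal flush helper:)

	static DEFINE_PER_CPU(struct pagevec, mlock_pvec);

	void mlock_page_drain(int cpu)
	{
		struct pagevec *pvec = &per_cpu(mlock_pvec, cpu);

		/*
		 * Flushes only the given cpu's pending pages: if the
		 * task queued a page and was then migrated to another
		 * cpu, the residue stays on the old cpu's pagevec.
		 */
		if (pagevec_count(pvec))
			mlock_pagevec(pvec);
	}
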
> So make try_to_migrate_one() drain the local pagevec after page_remove_rmap()
> from a VM_LOCKED vma; and do the same in try_to_unmap_one(), whose
> TTU_IGNORE_MLOCK users would want the same treatment; and do the same
> in remove_migration_pte() - not important when successfully inserting
> a new page, but necessary when hoping to retry after failure.
>
> Any new pagevec runs the risk of adding a new way of stranding, and we
> might discover other corners where mlock_page_drain() or lru_add_drain()
> would now help. If the mlock pagevec raises doubts, we can easily add a
> sysctl to tune its length to 1, which reverts to synchronous operation.
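(For concreteness, a purely hypothetical sketch of such a knob -
sysctl_mlock_pvec_len does not exist in this series, and the real
mlock_page() also does the Mlocked accounting, elided here:)

	/* Hypothetical sysctl-tunable cap; defaults to the usual size. */
	int sysctl_mlock_pvec_len = PAGEVEC_SIZE;

	void mlock_page(struct page *page)
	{
		struct pagevec *pvec = &get_cpu_var(mlock_pvec);

		get_page(page);
		/* pagevec_add() returns 0 when the pagevec is full */
		if (!pagevec_add(pvec, page) ||
		    pagevec_count(pvec) >= sysctl_mlock_pvec_len)
			mlock_pagevec(pvec);	/* len == 1: synchronous */
		put_cpu_var(mlock_pvec);
	}
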
Not a fan of adding new sysctls like this, as that just pushes the failure
of kernel devs onto poor admins :)
The old pagevec usage deleted by patch 1 was limited to the naturally larger
munlock_vma_pages_range() operation. The new per-cpu based one is more
general, which obviously has its advantages, but it might also bring new
corner cases.
So if this turns out to be a big problem, I would rather go back to the
limited-scenario pagevec than add a sysctl?
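What I mean by that, roughly (a hypothetical sketch, not real code: the
pagevec lives on the munlock caller's stack, so it is always drained before
returning and cannot leave residue behind on another cpu):

	static void munlock_vma_pages_range(struct vm_area_struct *vma,
					    unsigned long start,
					    unsigned long end)
	{
		struct pagevec pvec;

		pagevec_init(&pvec);
		/*
		 * ... walk the range, gathering mlocked pages into pvec
		 * and flushing with mlock_pagevec(&pvec) whenever full ...
		 */
		mlock_pagevec(&pvec);	/* final drain: no residue left */
	}
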
> Signed-off-by: Hugh Dickins <hughd@...gle.com>
Acked-by: Vlastimil Babka <vbabka@...e.cz>
> ---
>  mm/migrate.c | 2 ++
>  mm/rmap.c    | 4 ++++
>  2 files changed, 6 insertions(+)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index f4bcf1541b62..e7d0b68d5dcb 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -251,6 +251,8 @@ static bool remove_migration_pte(struct page *page, struct vm_area_struct *vma,
>  			page_add_file_rmap(new, vma, false);
>  			set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
>  		}
> +		if (vma->vm_flags & VM_LOCKED)
> +			mlock_page_drain(smp_processor_id());
>  
>  		/* No need to invalidate - it was non-present before */
>  		update_mmu_cache(vma, pvmw.address, pvmw.pte);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 5442a5c97a85..714bfdc72c7b 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1656,6 +1656,8 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  		 * See Documentation/vm/mmu_notifier.rst
>  		 */
>  		page_remove_rmap(subpage, vma, PageHuge(page));
> +		if (vma->vm_flags & VM_LOCKED)
> +			mlock_page_drain(smp_processor_id());
>  		put_page(page);
>  	}
>
> @@ -1930,6 +1932,8 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma,
>  		 * See Documentation/vm/mmu_notifier.rst
>  		 */
>  		page_remove_rmap(subpage, vma, PageHuge(page));
> +		if (vma->vm_flags & VM_LOCKED)
> +			mlock_page_drain(smp_processor_id());
>  		put_page(page);
>  	}
>