Message-ID: <2ec49f65-fe4e-26a0-4059-c18e6dab0af4@suse.cz>
Date:   Fri, 11 Feb 2022 17:45:26 +0100
From:   Vlastimil Babka <vbabka@...e.cz>
To:     Hugh Dickins <hughd@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     Michal Hocko <mhocko@...e.com>,
        "Kirill A. Shutemov" <kirill@...temov.name>,
        Matthew Wilcox <willy@...radead.org>,
        David Hildenbrand <david@...hat.com>,
        Alistair Popple <apopple@...dia.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Rik van Riel <riel@...riel.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Yu Zhao <yuzhao@...gle.com>, Greg Thelen <gthelen@...gle.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 07/13] mm/munlock: mlock_pte_range() when mlocking or
 munlocking

On 2/6/22 22:42, Hugh Dickins wrote:
> Fill in missing pieces: reimplementation of munlock_vma_pages_range(),
> required to lower the mlock_counts when munlocking without munmapping;
> and its complement, implementation of mlock_vma_pages_range(), required
> to raise the mlock_counts on pages already there when a range is mlocked.
> 
> Combine them into just the one function mlock_vma_pages_range(), using
> walk_page_range() to run mlock_pte_range().  This approach fixes the
> "Very slow unlockall()" of unpopulated PROT_NONE areas, reported in
> https://lore.kernel.org/linux-mm/70885d37-62b7-748b-29df-9e94f3291736@gmail.com/
> 
> Munlock clears VM_LOCKED at the start, under exclusive mmap_lock; but if
> a racing truncate or holepunch (depending on i_mmap_rwsem) gets to the
> pte first, it will not try to munlock the page: leaving release_pages()
> to correct it when the last reference to the page is gone - that's okay,
> a page is not evictable anyway while it is held by an extra reference.
> 
> Mlock sets VM_LOCKED at the start, under exclusive mmap_lock; but if
> a racing remove_migration_pte() or try_to_unmap_one() (depending on
> i_mmap_rwsem) gets to the pte first, it will try to mlock the page,
> then mlock_pte_range() mlock it a second time.  This is harder to
> reproduce, but a more serious race because it could leave the page
> unevictable indefinitely though the area is munlocked afterwards.
> Guard against it by setting the (inappropriate) VM_IO flag,
> and modifying mlock_vma_page() to decline such vmas.
> 
> Signed-off-by: Hugh Dickins <hughd@...gle.com>

Acked-by: Vlastimil Babka <vbabka@...e.cz>

> @@ -162,8 +230,7 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
>  	pgoff_t pgoff;
>  	int nr_pages;
>  	int ret = 0;
> -	int lock = !!(newflags & VM_LOCKED);
> -	vm_flags_t old_flags = vma->vm_flags;
> +	vm_flags_t oldflags = vma->vm_flags;
>  
>  	if (newflags == vma->vm_flags || (vma->vm_flags & VM_SPECIAL) ||

Nit: can use oldflags instead of vma->vm_flags above?
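
A rough sketch of what the nit amounts to, written as a self-contained userspace model rather than the actual mm/mlock.c code. The flag values, struct vma_model and the helper name skip_fixup() are hypothetical, for illustration only; the remainder of the condition, which is cut off in the quoted hunk, is deliberately left out.

```c
#include <stdbool.h>

typedef unsigned long vm_flags_t;

/* Hypothetical flag values for illustration; not the kernel's definitions. */
#define VM_LOCKED  0x1UL
#define VM_SPECIAL 0x2UL

/* Minimal stand-in for struct vm_area_struct (assumption). */
struct vma_model {
	vm_flags_t vm_flags;
};

/*
 * Models only the visible part of the early-exit test in mlock_fixup():
 * the flags are read once into oldflags, and the condition then uses
 * that local instead of dereferencing vma->vm_flags again.
 */
static bool skip_fixup(const struct vma_model *vma, vm_flags_t newflags)
{
	vm_flags_t oldflags = vma->vm_flags;

	return newflags == oldflags || (oldflags & VM_SPECIAL) != 0;
}
```

The behaviour is unchanged either way, since vm_flags is not modified between the initialisation and the test; the local just reads a little more consistently.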