lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 13 Apr 2022 18:28:10 +0200
From:   Vlastimil Babka <vbabka@...e.cz>
To:     David Hildenbrand <david@...hat.com>, linux-kernel@...r.kernel.org
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Hugh Dickins <hughd@...gle.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        David Rientjes <rientjes@...gle.com>,
        Shakeel Butt <shakeelb@...gle.com>,
        John Hubbard <jhubbard@...dia.com>,
        Jason Gunthorpe <jgg@...dia.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Yang Shi <shy828301@...il.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Matthew Wilcox <willy@...radead.org>,
        Jann Horn <jannh@...gle.com>, Michal Hocko <mhocko@...nel.org>,
        Nadav Amit <namit@...are.com>, Rik van Riel <riel@...riel.com>,
        Roman Gushchin <guro@...com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Peter Xu <peterx@...hat.com>,
        Donald Dutile <ddutile@...hat.com>,
        Christoph Hellwig <hch@....de>,
        Oleg Nesterov <oleg@...hat.com>, Jan Kara <jack@...e.cz>,
        Liang Zhang <zhangliang5@...wei.com>,
        Pedro Gomes <pedrodemargomes@...il.com>,
        Oded Gabbay <oded.gabbay@...il.com>, linux-mm@...ck.org
Subject: Re: [PATCH v3 12/16] mm: remember exclusively mapped anonymous pages
 with PG_anon_exclusive

On 3/29/22 18:04, David Hildenbrand wrote:
> Let's mark exclusively mapped anonymous pages with PG_anon_exclusive as
> exclusive, and use that information to make GUP pins reliable and stay
> consistent with the page mapped into the page table even if the
> page table entry gets write-protected.
> 
> With that information at hand, we can extend our COW logic to always
> reuse anonymous pages that are exclusive. For anonymous pages that
> might be shared, the existing logic applies.
> 
> As already documented, PG_anon_exclusive is usually only expressive in
> combination with a page table entry. Especially PTE vs. PMD-mapped
> anonymous pages require more thought, some examples: due to mremap() we
> can easily have a single compound page PTE-mapped into multiple page tables
> exclusively in a single process -- multiple page table locks apply.
> Further, due to MADV_WIPEONFORK we might not necessarily write-protect
> all PTEs, and only some subpages might be pinned. Long story short: once
> PTE-mapped, we have to track information about exclusivity per sub-page,
> but until then, we can just track it for the compound page in the head
> page and not having to update a whole bunch of subpages all of the time
> for a simple PMD mapping of a THP.
> 
> For simplicity, this commit mostly talks about "anonymous pages", while
> it's for THP actually "the part of an anonymous folio referenced via
> a page table entry".
> 
> To not spill PG_anon_exclusive code all over the mm code-base, we let
> the anon rmap code to handle all PG_anon_exclusive logic it can easily
> handle.
> 
> If a writable, present page table entry points at an anonymous (sub)page,
> that (sub)page must be PG_anon_exclusive. If GUP wants to take a reliably
> pin (FOLL_PIN) on an anonymous page references via a present
> page table entry, it must only pin if PG_anon_exclusive is set for the
> mapped (sub)page.
> 
> This commit doesn't adjust GUP, so this is only implicitly handled for
> FOLL_WRITE, follow-up commits will teach GUP to also respect it for
> FOLL_PIN without !FOLL_WRITE, to make all GUP pins of anonymous pages

	   without FOLL_WRITE ?

> fully reliable.

<snip>

> @@ -202,11 +203,26 @@ static inline int is_writable_migration_entry(swp_entry_t entry)
>  	return unlikely(swp_type(entry) == SWP_MIGRATION_WRITE);
>  }
>  
> +static inline int is_readable_migration_entry(swp_entry_t entry)
> +{
> +	return unlikely(swp_type(entry) == SWP_MIGRATION_READ);
> +}
> +
> +static inline int is_readable_exclusive_migration_entry(swp_entry_t entry)
> +{
> +	return unlikely(swp_type(entry) == SWP_MIGRATION_READ_EXCLUSIVE);
> +}

This one seems to be missing a !CONFIG_MIGRATION counterpart. Although the
only caller __split_huge_pmd_locked() probably indirectly only exists with
CONFIG_MIGRATION so it's not an immediate issue.  (THP selects COMPACTION
selects MIGRATION)

<snip>

> @@ -3035,10 +3083,19 @@ void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>  
>  	flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
>  	pmdval = pmdp_invalidate(vma, address, pvmw->pmd);
> +
> +	anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
> +	if (anon_exclusive && page_try_share_anon_rmap(page)) {
> +		set_pmd_at(mm, address, pvmw->pmd, pmdval);
> +		return;

I am admittedly not too familiar with this code, but looks like this means
we fail to migrate the THP, right? But we don't seem to be telling the
caller, which is try_to_migrate_one(), so it will continue and not terminate
the walk and return false?

> +	}
> +
>  	if (pmd_dirty(pmdval))
>  		set_page_dirty(page);
>  	if (pmd_write(pmdval))
>  		entry = make_writable_migration_entry(page_to_pfn(page));
> +	else if (anon_exclusive)
> +		entry = make_readable_exclusive_migration_entry(page_to_pfn(page));
>  	else
>  		entry = make_readable_migration_entry(page_to_pfn(page));
>  	pmdswp = swp_entry_to_pmd(entry);

<snip>

> @@ -1918,6 +1955,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>  				page_vma_mapped_walk_done(&pvmw);
>  				break;
>  			}
> +			VM_BUG_ON_PAGE(pte_write(pteval) && folio_test_anon(folio) &&
> +				       !anon_exclusive, subpage);
> +			if (anon_exclusive &&
> +			    page_try_share_anon_rmap(subpage)) {
> +				set_pte_at(mm, address, pvmw.pte, pteval);
> +				ret = false;
> +				page_vma_mapped_walk_done(&pvmw);
> +				break;
> +			}

Yeah for the PTE version it seems to do what I'd expect.

>  			/*
>  			 * Store the pfn of the page in a special migration

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ