Message-ID: <20230307215420.GA59222@monkey>
Date:   Tue, 7 Mar 2023 13:54:20 -0800
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     James Houghton <jthoughton@...gle.com>
Cc:     Hugh Dickins <hughd@...gle.com>,
        Muchun Song <songmuchun@...edance.com>,
        Peter Xu <peterx@...hat.com>,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        David Hildenbrand <david@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Jiaqi Yan <jiaqiyan@...gle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] mm: rmap: make hugetlb pages participate in
 _nr_pages_mapped

On 03/06/23 23:00, James Houghton wrote:
> For compound mappings (compound=true), _nr_pages_mapped will now be
> incremented by COMPOUND_MAPPED when the first compound mapping is
> created.

This sentence makes it sound like incrementing by COMPOUND_MAPPED for
compound pages is introduced by this patch.  Rather, it is only new for
hugetlb mappings (which are, for now, always compound).  Perhaps change
that to read: For hugetlb mappings ...

> For small mappings, _nr_pages_mapped is incremented by 1 when the
> particular small page is mapped for the first time. This is incompatible
> with HPageVmemmapOptimize()ed folios, as most of the tail page structs
> will be mapped read-only.
> 
> Currently HugeTLB always passes compound=true, but in the future,
> HugeTLB pages may be mapped with small mappings.
> 
> To implement this change:
>  1. Replace most of HugeTLB's calls to page_dup_file_rmap() with
>     page_add_file_rmap(). The call in copy_hugetlb_page_range() is kept.
>  2. Update page_add_file_rmap() and page_remove_rmap() to support
>     HugeTLB folios.
>  3. Update hugepage_add_anon_rmap() and hugepage_add_new_anon_rmap() to
>     also increment _nr_pages_mapped properly.
> 
> With these changes, folio_large_is_mapped() no longer needs to check
> _entire_mapcount.
> 
> HugeTLB doesn't use LRU or mlock, so page_add_file_rmap() and
> page_remove_rmap() exclude those pieces. It is also important that
> the folio_test_pmd_mappable() check is removed (or changed), as it's
> possible to have a HugeTLB page whose order is not >= HPAGE_PMD_ORDER,
> like arm64's CONT_PTE_SIZE HugeTLB pages.
> 
> This patch limits HugeTLB pages to 16G in size. That limit can be
> increased if COMPOUND_MAPPED is raised.
> 
> Signed-off-by: James Houghton <jthoughton@...gle.com>
> 

Thanks!

This is a step in the direction of having hugetlb use the same mapcount
scheme as elsewhere.  As you mention, with this in place future mapcount
changes should mostly 'just work' for hugetlb.

Because of this,
Acked-by: Mike Kravetz <mike.kravetz@...cle.com>

I have a few nits below, and I'm sure others will chime in later.

> diff --git a/mm/rmap.c b/mm/rmap.c
> index ba901c416785..4a975429b91a 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1316,19 +1316,21 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>  	int nr = 0, nr_pmdmapped = 0;
>  	bool first;
>  
> -	VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
> +	VM_BUG_ON_PAGE(compound && !PageTransHuge(page)
> +				&& !folio_test_hugetlb(folio), page);
>  
>  	/* Is page being mapped by PTE? Is this its first map to be added? */
>  	if (likely(!compound)) {
> +		if (unlikely(folio_test_hugetlb(folio)))
> +			VM_BUG_ON_PAGE(HPageVmemmapOptimized(&folio->page),
> +				       page);
>  		first = atomic_inc_and_test(&page->_mapcount);
>  		nr = first;
>  		if (first && folio_test_large(folio)) {
>  			nr = atomic_inc_return_relaxed(mapped);
>  			nr = (nr < COMPOUND_MAPPED);
>  		}
> -	} else if (folio_test_pmd_mappable(folio)) {
> -		/* That test is redundant: it's for safety or to optimize out */

I 'think' removing this check is OK.  It would seem that the caller
knows if the folio is mappable.  If we want a similar test, we might be
able to use something like:

	arch_hugetlb_valid_size(folio_size(folio))

> -
> +	} else {
>  		first = atomic_inc_and_test(&folio->_entire_mapcount);
>  		if (first) {
>  			nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
> @@ -1345,6 +1347,9 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
>  		}
>  	}
>  
> +	if (folio_test_hugetlb(folio))
> +		return;

IMO, a comment saying hugetlb is special and does not participate in LRU
would be appropriate here.

> +
>  	if (nr_pmdmapped)
>  		__lruvec_stat_mod_folio(folio, folio_test_swapbacked(folio) ?
>  			NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr_pmdmapped);
> @@ -1373,24 +1378,18 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
>  
>  	VM_BUG_ON_PAGE(compound && !PageHead(page), page);
>  
> -	/* Hugetlb pages are not counted in NR_*MAPPED */
> -	if (unlikely(folio_test_hugetlb(folio))) {
> -		/* hugetlb pages are always mapped with pmds */
> -		atomic_dec(&folio->_entire_mapcount);
> -		return;
> -	}
> -
>  	/* Is page being unmapped by PTE? Is this its last map to be removed? */
>  	if (likely(!compound)) {
> +		if (unlikely(folio_test_hugetlb(folio)))
> +			VM_BUG_ON_PAGE(HPageVmemmapOptimized(&folio->page),
> +				       page);
>  		last = atomic_add_negative(-1, &page->_mapcount);
>  		nr = last;
>  		if (last && folio_test_large(folio)) {
>  			nr = atomic_dec_return_relaxed(mapped);
>  			nr = (nr < COMPOUND_MAPPED);
>  		}
> -	} else if (folio_test_pmd_mappable(folio)) {
> -		/* That test is redundant: it's for safety or to optimize out */
> -
> +	} else {
>  		last = atomic_add_negative(-1, &folio->_entire_mapcount);
>  		if (last) {
>  			nr = atomic_sub_return_relaxed(COMPOUND_MAPPED, mapped);
> @@ -1407,6 +1406,9 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
>  		}
>  	}
>  
> +	if (folio_test_hugetlb(folio))
> +		return;

Same as above in page_add_file_rmap.

> +
>  	if (nr_pmdmapped) {
>  		if (folio_test_anon(folio))
>  			idx = NR_ANON_THPS;
> @@ -2541,9 +2543,11 @@ void hugepage_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
>  	first = atomic_inc_and_test(&folio->_entire_mapcount);
>  	VM_BUG_ON_PAGE(!first && (flags & RMAP_EXCLUSIVE), page);
>  	VM_BUG_ON_PAGE(!first && PageAnonExclusive(page), page);
> -	if (first)
> +	if (first) {
> +		atomic_add(COMPOUND_MAPPED, &folio->_nr_pages_mapped);
>  		__page_set_anon_rmap(folio, page, vma, address,
>  				     !!(flags & RMAP_EXCLUSIVE));
> +	}
>  }
>  
>  void hugepage_add_new_anon_rmap(struct folio *folio,
> @@ -2552,6 +2556,7 @@ void hugepage_add_new_anon_rmap(struct folio *folio,
>  	BUG_ON(address < vma->vm_start || address >= vma->vm_end);
>  	/* increment count (starts at -1) */
>  	atomic_set(&folio->_entire_mapcount, 0);
> +	atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED);
>  	folio_clear_hugetlb_restore_reserve(folio);
>  	__page_set_anon_rmap(folio, &folio->page, vma, address, 1);
>  }

Should we look at perhaps modifying page_add_anon_rmap and
folio_add_new_anon_rmap as well?
-- 
Mike Kravetz
