Date:   Tue, 4 Apr 2023 17:21:31 -0400
From:   Peter Xu <peterx@...hat.com>
To:     David Stevens <stevensd@...omium.org>
Cc:     linux-mm@...ck.org, Hugh Dickins <hughd@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        "Kirill A . Shutemov" <kirill@...temov.name>,
        Yang Shi <shy828301@...il.com>,
        David Hildenbrand <david@...hat.com>,
        Jiaqi Yan <jiaqiyan@...gle.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 4/4] mm/khugepaged: maintain page cache uptodate flag

On Tue, Apr 04, 2023 at 09:01:17PM +0900, David Stevens wrote:
> From: David Stevens <stevensd@...omium.org>
> 
> Make sure that collapse_file doesn't interfere with checking the
> uptodate flag in the page cache by only inserting hpage into the page
> cache after it has been updated and marked uptodate. This is achieved by
> simply not replacing present pages with hpage when iterating over the
> target range.
> 
> The present pages are already locked, so replacing them with the locked
> hpage before the collapse is finalized is unnecessary. However, it is
> necessary to stop freezing the present pages after validating them,
> since leaving long-term frozen pages in the page cache can lead to
> deadlocks. Simply checking the reference count is sufficient to ensure
> that there are no long-term references hanging around that the
> collapse would break. Similar to hpage, there is no reason that the
> present pages actually need to be frozen in addition to being locked.
> 
> This fixes a race where folio_seek_hole_data would mistake hpage for
> an fallocated but unwritten page. This race is visible to userspace via
> data temporarily disappearing from SEEK_DATA/SEEK_HOLE. This also fixes
> a similar race where pages could temporarily disappear from mincore.
> 
> Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
> Signed-off-by: David Stevens <stevensd@...omium.org>
> ---
>  mm/khugepaged.c | 79 ++++++++++++++++++-------------------------------
>  1 file changed, 29 insertions(+), 50 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 7679551e9540..a19aa140fd52 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1855,17 +1855,18 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
>   *
>   * Basic scheme is simple, details are more complex:
>   *  - allocate and lock a new huge page;
> - *  - scan page cache replacing old pages with the new one
> + *  - scan page cache, locking old pages
>   *    + swap/gup in pages if necessary;
> - *    + keep old pages around in case rollback is required;
> + *  - copy data to new page
> + *  - handle shmem holes
> + *    + re-validate that holes weren't filled by someone else
> + *    + check for userfaultfd

PS: some of these changes may belong in the previous patch, but there is no
need to repost just for that; I mention it only in case there is a new
version anyway.

>   *  - finalize updates to the page cache;
>   *  - if replacing succeeds:
> - *    + copy data over;
> - *    + free old pages;
>   *    + unlock huge page;
> + *    + free old pages;
>   *  - if replacing failed;
> - *    + put all pages back and unfreeze them;
> - *    + restore gaps in the page cache;
> + *    + unlock old pages
>   *    + unlock and free huge page;
>   */
>  static int collapse_file(struct mm_struct *mm, unsigned long addr,
> @@ -1913,12 +1914,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>  		}
>  	} while (1);
>  
> -	/*
> -	 * At this point the hpage is locked and not up-to-date.
> -	 * It's safe to insert it into the page cache, because nobody would
> -	 * be able to map it or use it in another way until we unlock it.
> -	 */
> -
>  	xas_set(&xas, start);
>  	for (index = start; index < end; index++) {
>  		page = xas_next(&xas);
> @@ -2076,12 +2071,16 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>  		VM_BUG_ON_PAGE(page != xas_load(&xas), page);
>  
>  		/*
> -		 * The page is expected to have page_count() == 3:
> +		 * We control three references to the page:
>  		 *  - we hold a pin on it;
>  		 *  - one reference from page cache;
>  		 *  - one from isolate_lru_page;
> +		 * If those are the only references, then any new usage of the
> +		 * page will have to fetch it from the page cache. That requires
> +		 * locking the page to handle truncate, so any new usage will be
> +		 * blocked until we unlock page after collapse/during rollback.
>  		 */
> -		if (!page_ref_freeze(page, 3)) {
> +		if (page_count(page) != 3) {
>  			result = SCAN_PAGE_COUNT;
>  			xas_unlock_irq(&xas);
>  			putback_lru_page(page);

Personally I don't see anything wrong with this change as a way to resolve
the deadlock.  E.g. a fast-gup race right before unmapping the pgtables
seems fine, since we'll just bail out with >3 refcounts (or fast-gup bails
out by checking for pte changes).  Either way looks fine here.
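
To make the difference between freezing and merely checking the count
concrete, here is a minimal userspace sketch.  It is not kernel code: the
toy_* names and struct toy_page are hypothetical stand-ins, and C11 atomics
only approximate what page_ref_freeze(), page_ref_unfreeze() and a
speculative get_page_unless_zero()-style reference do.  It assumes the
expected count of 3 from the comment in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the reference count is modelled. */
struct toy_page {
	atomic_int refcount;
};

/*
 * Model of freezing: atomically replace `expected` with 0.  While the
 * count is 0, speculative getters (see below) cannot take a reference.
 */
static bool toy_ref_freeze(struct toy_page *p, int expected)
{
	int want = expected;

	return atomic_compare_exchange_strong(&p->refcount, &want, 0);
}

/* Model of unfreezing: restore the count on a frozen page. */
static void toy_ref_unfreeze(struct toy_page *p, int count)
{
	assert(atomic_load(&p->refcount) == 0);
	atomic_store(&p->refcount, count);
}

/*
 * Model of a speculative reference (as fast-gup or a page cache lookup
 * would take): increment the count unless it is currently 0.
 */
static bool toy_get_unless_zero(struct toy_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure `old` is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Model of the patched check: compare the count, leave it untouched. */
static bool toy_count_is(struct toy_page *p, int expected)
{
	return atomic_load(&p->refcount) == expected;
}
```

With a frozen page, toy_get_unless_zero() fails until the count is restored,
which is the long-term state the patch wants to avoid.  With the plain
check, a racing speculative reference still succeeds, and a later
toy_count_is(p, 3) would simply see 4 and bail out.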

So far it looks good to me, but that may not mean much given my history of
overlooking things.  It would still be good to hear from Hugh and others.

-- 
Peter Xu
