lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 25 Nov 2019 12:36:11 +0300
From:   "Kirill A. Shutemov" <kirill@...temov.name>
To:     Yang Shi <yang.shi@...ux.alibaba.com>
Cc:     hughd@...gle.com, kirill.shutemov@...ux.intel.com,
        aarcange@...hat.com, akpm@...ux-foundation.org, linux-mm@...ck.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] mm: shmem: allow split THP when truncating THP
 partially

On Sat, Nov 23, 2019 at 09:05:32AM +0800, Yang Shi wrote:
> Currently when truncating shmem file, if the range is partial of THP
> (start or end is in the middle of THP), the pages actually will just get
> cleared rather than being freed unless the range cover the whole THP.
> Even though all the subpages are truncated (randomly or sequentially),
> the THP may still be kept in page cache.  This might be fine for some
> usecases which prefer preserving THP.
> 
> But, when doing balloon inflation in QEMU, QEMU actually does hole punch
> or MADV_DONTNEED in base page size granulairty if hugetlbfs is not used.
> So, when using shmem THP as memory backend QEMU inflation actually doesn't
> work as expected since it doesn't free memory.  But, the inflation
> usecase really needs get the memory freed.  Anonymous THP will not get
> freed right away too but it will be freed eventually when all subpages are
> unmapped, but shmem THP would still stay in page cache.
> 
> To protect the usecases which may prefer preserving THP, introduce a
> new fallocate mode: FALLOC_FL_SPLIT_HPAGE, which means spltting THP is
> preferred behavior if truncating partial THP.  This mode just makes
> sense to tmpfs for the time being.

We need to clarify interaction with khugepaged. This implementation
doesn't do anything to prevent khugepaged from collapsing the range back
to THP just after the split.

> @@ -976,8 +1022,31 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>  			}
>  			unlock_page(page);
>  		}
> +rescan_split:
>  		pagevec_remove_exceptionals(&pvec);
>  		pagevec_release(&pvec);
> +
> +		if (split && PageTransCompound(page)) {
> +			/* The THP may get freed under us */
> +			if (!get_page_unless_zero(compound_head(page)))
> +				goto rescan_out;
> +
> +			lock_page(page);
> +
> +			/*
> +			 * The extra pins from page cache lookup have been
> +			 * released by pagevec_release().
> +			 */
> +			if (!split_huge_page(page)) {
> +				unlock_page(page);
> +				put_page(page);
> +				/* Re-look up page cache from current index */
> +				goto again;
> +			}
> +			unlock_page(page);
> +			put_page(page);
> +		}
> +rescan_out:
>  		index++;
>  	}

Doing get_page_unless_zero() just after you've dropped the pin for the
page looks very suboptimal.

-- 
 Kirill A. Shutemov

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ