linux-kernel - Re: [v2 PATCH] mm: shmem: allow split THP when truncating THP partially

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c0f134f1-caf8-0baa-5f0e-87f2c530c631@linux.alibaba.com>
Date:   Wed, 4 Dec 2019 16:50:49 -0800
From:   Yang Shi <yang.shi@...ux.alibaba.com>
To:     Hugh Dickins <hughd@...gle.com>
Cc:     kirill.shutemov@...ux.intel.com, aarcange@...hat.com,
        akpm@...ux-foundation.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [v2 PATCH] mm: shmem: allow split THP when truncating THP
 partially



On 12/4/19 4:15 PM, Hugh Dickins wrote:
> On Wed, 4 Dec 2019, Yang Shi wrote:
>
>> Currently when truncating shmem file, if the range is partial of THP
>> (start or end is in the middle of THP), the pages actually will just get
>> cleared rather than being freed unless the range cover the whole THP.
>> Even though all the subpages are truncated (randomly or sequentially),
>> the THP may still be kept in page cache.  This might be fine for some
>> usecases which prefer preserving THP.
>>
>> But, when doing balloon inflation in QEMU, QEMU actually does hole punch
>> or MADV_DONTNEED in base page size granulairty if hugetlbfs is not used.
>> So, when using shmem THP as memory backend QEMU inflation actually doesn't
>> work as expected since it doesn't free memory.  But, the inflation
>> usecase really needs get the memory freed.  Anonymous THP will not get
>> freed right away too but it will be freed eventually when all subpages are
>> unmapped, but shmem THP would still stay in page cache.
>>
>> Split THP right away when doing partial hole punch, and if split fails
>> just clear the page so that read to the hole punched area would return
>> zero.
>>
>> Cc: Hugh Dickins <hughd@...gle.com>
>> Cc: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
>> Cc: Andrea Arcangeli <aarcange@...hat.com>
>> Signed-off-by: Yang Shi <yang.shi@...ux.alibaba.com>
>> ---
>> v2: * Adopted the comment from Kirill.
>>      * Dropped fallocate mode flag, THP split is the default behavior.
>>      * Blended Huge's implementation with my v1 patch. TBH I'm not very keen to
>>        Hugh's find_get_entries() hack (basically neutral), but without that hack
> Thanks for giving it a try.  I'm not neutral about my find_get_entries()
> hack: it surely had to go (without it, I'd have just pushed my own patch).

We are on the same page :-）

> I've not noticed anything wrong with your patch, and it's in the right
> direction, but I'm still not thrilled with it.  I also remember that I
> got the looping wrong in my first internal attempt (fixed in what I sent),
> and need to be very sure of the try-again-versus-move-on-to-next conditions
> before agreeing to anything.  No rush, I'll come back to this in days or
> month ahead: I'll try to find a less gotoey blend of yours and mine.

Yes, those goto look a little bit convoluted so I added a lot comments 
to improve the readability. Thanks for your time.

>
> Hugh
>
>>        we have to rely on pagevec_release() to release extra pins and play with
>>        goto. This version does in this way. The patch is bigger than Hugh's due
>>        to extra comments to make the flow clear.
>>
>>   mm/shmem.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------------------
>>   1 file changed, 83 insertions(+), 37 deletions(-)
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 220be9f..1ae0c7f 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -806,12 +806,15 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>   	long nr_swaps_freed = 0;
>>   	pgoff_t index;
>>   	int i;
>> +	bool split = false;
>> +	struct page *page = NULL;
>>   
>>   	if (lend == -1)
>>   		end = -1;	/* unsigned, so actually very big */
>>   
>>   	pagevec_init(&pvec);
>>   	index = start;
>> +retry:
>>   	while (index < end) {
>>   		pvec.nr = find_get_entries(mapping, index,
>>   			min(end - index, (pgoff_t)PAGEVEC_SIZE),
>> @@ -819,7 +822,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>   		if (!pvec.nr)
>>   			break;
>>   		for (i = 0; i < pagevec_count(&pvec); i++) {
>> -			struct page *page = pvec.pages[i];
>> +			split = false;
>> +			page = pvec.pages[i];
>>   
>>   			index = indices[i];
>>   			if (index >= end)
>> @@ -838,23 +842,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>   			if (!trylock_page(page))
>>   				continue;
>>   
>> -			if (PageTransTail(page)) {
>> -				/* Middle of THP: zero out the page */
>> -				clear_highpage(page);
>> -				unlock_page(page);
>> -				continue;
>> -			} else if (PageTransHuge(page)) {
>> -				if (index == round_down(end, HPAGE_PMD_NR)) {
>> +			if (PageTransCompound(page) && !unfalloc) {
>> +				if (PageHead(page) &&
>> +				    index != round_down(end, HPAGE_PMD_NR)) {
>>   					/*
>> -					 * Range ends in the middle of THP:
>> -					 * zero out the page
>> +					 * Fall through when punching whole
>> +					 * THP.
>>   					 */
>> -					clear_highpage(page);
>> -					unlock_page(page);
>> -					continue;
>> +					index += HPAGE_PMD_NR - 1;
>> +					i += HPAGE_PMD_NR - 1;
>> +				} else {
>> +					/*
>> +					 * Split THP for any partial hole
>> +					 * punch.
>> +					 */
>> +					get_page(page);
>> +					split = true;
>> +					goto split;
>>   				}
>> -				index += HPAGE_PMD_NR - 1;
>> -				i += HPAGE_PMD_NR - 1;
>>   			}
>>   
>>   			if (!unfalloc || !PageUptodate(page)) {
>> @@ -866,9 +871,29 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>   			}
>>   			unlock_page(page);
>>   		}
>> +split:
>>   		pagevec_remove_exceptionals(&pvec);
>>   		pagevec_release(&pvec);
>>   		cond_resched();
>> +
>> +		if (split) {
>> +			/*
>> +			 * The pagevec_release() released all extra pins
>> +			 * from pagevec lookup.  And we hold an extra pin
>> +			 * and still have the page locked under us.
>> +			 */
>> +			if (!split_huge_page(page)) {
>> +				unlock_page(page);
>> +				put_page(page);
>> +				/* Re-lookup page cache from current index */
>> +				goto retry;
>> +			}
>> +
>> +			/* Fail to split THP, move to next index */
>> +			unlock_page(page);
>> +			put_page(page);
>> +		}
>> +
>>   		index++;
>>   	}
>>   
>> @@ -901,6 +926,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>   		return;
>>   
>>   	index = start;
>> +again:
>>   	while (index < end) {
>>   		cond_resched();
>>   
>> @@ -916,7 +942,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>   			continue;
>>   		}
>>   		for (i = 0; i < pagevec_count(&pvec); i++) {
>> -			struct page *page = pvec.pages[i];
>> +			split = false;
>> +			page = pvec.pages[i];
>>   
>>   			index = indices[i];
>>   			if (index >= end)
>> @@ -936,30 +963,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>   
>>   			lock_page(page);
>>   
>> -			if (PageTransTail(page)) {
>> -				/* Middle of THP: zero out the page */
>> -				clear_highpage(page);
>> -				unlock_page(page);
>> -				/*
>> -				 * Partial thp truncate due 'start' in middle
>> -				 * of THP: don't need to look on these pages
>> -				 * again on !pvec.nr restart.
>> -				 */
>> -				if (index != round_down(end, HPAGE_PMD_NR))
>> -					start++;
>> -				continue;
>> -			} else if (PageTransHuge(page)) {
>> -				if (index == round_down(end, HPAGE_PMD_NR)) {
>> +			if (PageTransCompound(page) && !unfalloc) {
>> +				if (PageHead(page) &&
>> +				    index != round_down(end, HPAGE_PMD_NR)) {
>>   					/*
>> -					 * Range ends in the middle of THP:
>> -					 * zero out the page
>> +					 * Fall through when punching whole
>> +					 * THP.
>>   					 */
>> -					clear_highpage(page);
>> -					unlock_page(page);
>> -					continue;
>> +					index += HPAGE_PMD_NR - 1;
>> +					i += HPAGE_PMD_NR - 1;
>> +				} else {
>> +					/*
>> +					 * Split THP for any partial hole
>> +					 * punch.
>> +					 */
>> +					get_page(page);
>> +					split = true;
>> +					goto rescan_split;
>>   				}
>> -				index += HPAGE_PMD_NR - 1;
>> -				i += HPAGE_PMD_NR - 1;
>>   			}
>>   
>>   			if (!unfalloc || !PageUptodate(page)) {
>> @@ -976,8 +997,33 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>   			}
>>   			unlock_page(page);
>>   		}
>> +rescan_split:
>>   		pagevec_remove_exceptionals(&pvec);
>>   		pagevec_release(&pvec);
>> +
>> +		if (split) {
>> +			/*
>> +			 * The pagevec_release() released all extra pins
>> +			 * from pagevec lookup.  And we hold an extra pin
>> +			 * and still have the page locked under us.
>> +			 */
>> +			if (!split_huge_page(page)) {
>> +				unlock_page(page);
>> +				put_page(page);
>> +				/* Re-lookup page cache from current index */
>> +				goto again;
>> +			}
>> +
>> +			/*
>> +			 * Split fail, clear the page then move to next
>> +			 * index.
>> +			 */
>> +			clear_highpage(page);
>> +
>> +			unlock_page(page);
>> +			put_page(page);
>> +		}
>> +
>>   		index++;
>>   	}
>>   
>> -- 
>> 1.8.3.1
>>
>>