[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c0f134f1-caf8-0baa-5f0e-87f2c530c631@linux.alibaba.com>
Date: Wed, 4 Dec 2019 16:50:49 -0800
From: Yang Shi <yang.shi@...ux.alibaba.com>
To: Hugh Dickins <hughd@...gle.com>
Cc: kirill.shutemov@...ux.intel.com, aarcange@...hat.com,
akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [v2 PATCH] mm: shmem: allow split THP when truncating THP
partially
On 12/4/19 4:15 PM, Hugh Dickins wrote:
> On Wed, 4 Dec 2019, Yang Shi wrote:
>
>> Currently when truncating shmem file, if the range is partial of THP
>> (start or end is in the middle of THP), the pages actually will just get
>> cleared rather than being freed unless the range cover the whole THP.
>> Even though all the subpages are truncated (randomly or sequentially),
>> the THP may still be kept in page cache. This might be fine for some
>> usecases which prefer preserving THP.
>>
>> But, when doing balloon inflation in QEMU, QEMU actually does hole punch
>> or MADV_DONTNEED in base page size granulairty if hugetlbfs is not used.
>> So, when using shmem THP as memory backend QEMU inflation actually doesn't
>> work as expected since it doesn't free memory. But, the inflation
>> usecase really needs get the memory freed. Anonymous THP will not get
>> freed right away too but it will be freed eventually when all subpages are
>> unmapped, but shmem THP would still stay in page cache.
>>
>> Split THP right away when doing partial hole punch, and if split fails
>> just clear the page so that read to the hole punched area would return
>> zero.
>>
>> Cc: Hugh Dickins <hughd@...gle.com>
>> Cc: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
>> Cc: Andrea Arcangeli <aarcange@...hat.com>
>> Signed-off-by: Yang Shi <yang.shi@...ux.alibaba.com>
>> ---
>> v2: * Adopted the comment from Kirill.
>> * Dropped fallocate mode flag, THP split is the default behavior.
>> * Blended Huge's implementation with my v1 patch. TBH I'm not very keen to
>> Hugh's find_get_entries() hack (basically neutral), but without that hack
> Thanks for giving it a try. I'm not neutral about my find_get_entries()
> hack: it surely had to go (without it, I'd have just pushed my own patch).
We are on the same page :-)
> I've not noticed anything wrong with your patch, and it's in the right
> direction, but I'm still not thrilled with it. I also remember that I
> got the looping wrong in my first internal attempt (fixed in what I sent),
> and need to be very sure of the try-again-versus-move-on-to-next conditions
> before agreeing to anything. No rush, I'll come back to this in days or
> month ahead: I'll try to find a less gotoey blend of yours and mine.
Yes, those goto look a little bit convoluted so I added a lot comments
to improve the readability. Thanks for your time.
>
> Hugh
>
>> we have to rely on pagevec_release() to release extra pins and play with
>> goto. This version does in this way. The patch is bigger than Hugh's due
>> to extra comments to make the flow clear.
>>
>> mm/shmem.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------------------
>> 1 file changed, 83 insertions(+), 37 deletions(-)
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 220be9f..1ae0c7f 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -806,12 +806,15 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>> long nr_swaps_freed = 0;
>> pgoff_t index;
>> int i;
>> + bool split = false;
>> + struct page *page = NULL;
>>
>> if (lend == -1)
>> end = -1; /* unsigned, so actually very big */
>>
>> pagevec_init(&pvec);
>> index = start;
>> +retry:
>> while (index < end) {
>> pvec.nr = find_get_entries(mapping, index,
>> min(end - index, (pgoff_t)PAGEVEC_SIZE),
>> @@ -819,7 +822,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>> if (!pvec.nr)
>> break;
>> for (i = 0; i < pagevec_count(&pvec); i++) {
>> - struct page *page = pvec.pages[i];
>> + split = false;
>> + page = pvec.pages[i];
>>
>> index = indices[i];
>> if (index >= end)
>> @@ -838,23 +842,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>> if (!trylock_page(page))
>> continue;
>>
>> - if (PageTransTail(page)) {
>> - /* Middle of THP: zero out the page */
>> - clear_highpage(page);
>> - unlock_page(page);
>> - continue;
>> - } else if (PageTransHuge(page)) {
>> - if (index == round_down(end, HPAGE_PMD_NR)) {
>> + if (PageTransCompound(page) && !unfalloc) {
>> + if (PageHead(page) &&
>> + index != round_down(end, HPAGE_PMD_NR)) {
>> /*
>> - * Range ends in the middle of THP:
>> - * zero out the page
>> + * Fall through when punching whole
>> + * THP.
>> */
>> - clear_highpage(page);
>> - unlock_page(page);
>> - continue;
>> + index += HPAGE_PMD_NR - 1;
>> + i += HPAGE_PMD_NR - 1;
>> + } else {
>> + /*
>> + * Split THP for any partial hole
>> + * punch.
>> + */
>> + get_page(page);
>> + split = true;
>> + goto split;
>> }
>> - index += HPAGE_PMD_NR - 1;
>> - i += HPAGE_PMD_NR - 1;
>> }
>>
>> if (!unfalloc || !PageUptodate(page)) {
>> @@ -866,9 +871,29 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>> }
>> unlock_page(page);
>> }
>> +split:
>> pagevec_remove_exceptionals(&pvec);
>> pagevec_release(&pvec);
>> cond_resched();
>> +
>> + if (split) {
>> + /*
>> + * The pagevec_release() released all extra pins
>> + * from pagevec lookup. And we hold an extra pin
>> + * and still have the page locked under us.
>> + */
>> + if (!split_huge_page(page)) {
>> + unlock_page(page);
>> + put_page(page);
>> + /* Re-lookup page cache from current index */
>> + goto retry;
>> + }
>> +
>> + /* Fail to split THP, move to next index */
>> + unlock_page(page);
>> + put_page(page);
>> + }
>> +
>> index++;
>> }
>>
>> @@ -901,6 +926,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>> return;
>>
>> index = start;
>> +again:
>> while (index < end) {
>> cond_resched();
>>
>> @@ -916,7 +942,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>> continue;
>> }
>> for (i = 0; i < pagevec_count(&pvec); i++) {
>> - struct page *page = pvec.pages[i];
>> + split = false;
>> + page = pvec.pages[i];
>>
>> index = indices[i];
>> if (index >= end)
>> @@ -936,30 +963,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>
>> lock_page(page);
>>
>> - if (PageTransTail(page)) {
>> - /* Middle of THP: zero out the page */
>> - clear_highpage(page);
>> - unlock_page(page);
>> - /*
>> - * Partial thp truncate due 'start' in middle
>> - * of THP: don't need to look on these pages
>> - * again on !pvec.nr restart.
>> - */
>> - if (index != round_down(end, HPAGE_PMD_NR))
>> - start++;
>> - continue;
>> - } else if (PageTransHuge(page)) {
>> - if (index == round_down(end, HPAGE_PMD_NR)) {
>> + if (PageTransCompound(page) && !unfalloc) {
>> + if (PageHead(page) &&
>> + index != round_down(end, HPAGE_PMD_NR)) {
>> /*
>> - * Range ends in the middle of THP:
>> - * zero out the page
>> + * Fall through when punching whole
>> + * THP.
>> */
>> - clear_highpage(page);
>> - unlock_page(page);
>> - continue;
>> + index += HPAGE_PMD_NR - 1;
>> + i += HPAGE_PMD_NR - 1;
>> + } else {
>> + /*
>> + * Split THP for any partial hole
>> + * punch.
>> + */
>> + get_page(page);
>> + split = true;
>> + goto rescan_split;
>> }
>> - index += HPAGE_PMD_NR - 1;
>> - i += HPAGE_PMD_NR - 1;
>> }
>>
>> if (!unfalloc || !PageUptodate(page)) {
>> @@ -976,8 +997,33 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>> }
>> unlock_page(page);
>> }
>> +rescan_split:
>> pagevec_remove_exceptionals(&pvec);
>> pagevec_release(&pvec);
>> +
>> + if (split) {
>> + /*
>> + * The pagevec_release() released all extra pins
>> + * from pagevec lookup. And we hold an extra pin
>> + * and still have the page locked under us.
>> + */
>> + if (!split_huge_page(page)) {
>> + unlock_page(page);
>> + put_page(page);
>> + /* Re-lookup page cache from current index */
>> + goto again;
>> + }
>> +
>> + /*
>> + * Split fail, clear the page then move to next
>> + * index.
>> + */
>> + clear_highpage(page);
>> +
>> + unlock_page(page);
>> + put_page(page);
>> + }
>> +
>> index++;
>> }
>>
>> --
>> 1.8.3.1
>>
>>
Powered by blists - more mailing lists