Message-ID: <33768a7e-837d-3bcd-fb98-19727921d6fd@linux.alibaba.com>
Date: Tue, 4 Feb 2020 15:27:25 -0800
From: Yang Shi <yang.shi@...ux.alibaba.com>
To: Hugh Dickins <hughd@...gle.com>
Cc: kirill.shutemov@...ux.intel.com, aarcange@...hat.com,
akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [v2 PATCH] mm: shmem: allow split THP when truncating THP
partially
On 1/14/20 11:28 AM, Yang Shi wrote:
>
>
> On 12/4/19 4:15 PM, Hugh Dickins wrote:
>> On Wed, 4 Dec 2019, Yang Shi wrote:
>>
>>> Currently, when truncating a shmem file, if the range covers only part
>>> of a THP (start or end falls in the middle of the THP), the pages just
>>> get cleared rather than freed, unless the range covers the whole THP.
>>> Even when all the subpages have been truncated (randomly or
>>> sequentially), the THP may still be kept in the page cache. This might
>>> be fine for some usecases which prefer preserving THP.
>>>
>>> But, when doing balloon inflation in QEMU, QEMU actually does hole
>>> punch or MADV_DONTNEED in base page size granularity if hugetlbfs is
>>> not used. So, when using shmem THP as the memory backend, QEMU balloon
>>> inflation doesn't work as expected since it doesn't free memory. But
>>> the inflation usecase really needs to get the memory freed. Anonymous
>>> THP is not freed right away either, but it is freed eventually once
>>> all its subpages are unmapped; shmem THP, however, would still stay in
>>> the page cache.
>>>
>>> Split the THP right away when doing a partial hole punch, and if the
>>> split fails just clear the page so that reads from the hole-punched
>>> area return zero.
>>>
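
(For reference, here is a minimal userspace sketch of the usecase above,
not part of the patch: it punches one base page out of a shmem THP the
way the balloon would, then checks whether the backing blocks were
actually released. It assumes an x86_64 2MB PMD size, that memfd_create()
gives a shmem-backed file, and that MADV_HUGEPAGE plus a permissive
shmem_enabled setting actually gets the range backed by a huge page.)

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
        const size_t huge = 2UL << 20;        /* assumed PMD-size THP */
        const long psize = sysconf(_SC_PAGESIZE);
        struct stat st;

        int fd = memfd_create("thp-hole", 0); /* shmem-backed file */
        if (fd < 0 || ftruncate(fd, huge)) {
                perror("memfd_create/ftruncate");
                return 1;
        }

        char *p = mmap(NULL, huge, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        madvise(p, huge, MADV_HUGEPAGE);      /* ask for a shmem THP (best effort) */
        memset(p, 0xaa, huge);                /* fault in and dirty the whole range */

        fstat(fd, &st);
        printf("blocks before punch: %lld\n", (long long)st.st_blocks);

        /* Punch one base page inside the huge page, like the balloon does. */
        if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      psize, psize))
                perror("fallocate");

        /* Reads from the hole return zero either way... */
        printf("byte in hole: %#x\n", (unsigned char)p[psize]);
        /* ...but blocks only drop if the THP was split (or was not huge). */
        fstat(fd, &st);
        printf("blocks after punch:  %lld\n", (long long)st.st_blocks);

        munmap(p, huge);
        close(fd);
        return 0;
}
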
>>> Cc: Hugh Dickins <hughd@...gle.com>
>>> Cc: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
>>> Cc: Andrea Arcangeli <aarcange@...hat.com>
>>> Signed-off-by: Yang Shi <yang.shi@...ux.alibaba.com>
>>> ---
>>> v2: * Adopted the comment from Kirill.
>>> * Dropped the fallocate mode flag; THP split is the default behavior.
>>> * Blended Hugh's implementation with my v1 patch. TBH I'm not very
>>> keen on Hugh's find_get_entries() hack (basically neutral), but
>>> without that hack
>> Thanks for giving it a try. I'm not neutral about my find_get_entries()
>> hack: it surely had to go (without it, I'd have just pushed my own
>> patch). I've not noticed anything wrong with your patch, and it's in
>> the right direction, but I'm still not thrilled with it. I also
>> remember that I got the looping wrong in my first internal attempt
>> (fixed in what I sent), and need to be very sure of the
>> try-again-versus-move-on-to-next conditions before agreeing to
>> anything. No rush, I'll come back to this in days or months ahead:
>> I'll try to find a less gotoey blend of yours and mine.
>
> Hi Hugh,
>
> Any update on this one?
>
> Thanks,
> Yang
Hi Hugh,
Ping. Any comment on this? I really hope it can make v5.7.
Thanks,
Yang
>
>>
>> Hugh
>>
>>> we have to rely on pagevec_release() to release extra pins and play
>>> with goto. This version does it this way. The patch is bigger than
>>> Hugh's due to the extra comments that make the flow clear.
>>>
>>> mm/shmem.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------------------
>>> 1 file changed, 83 insertions(+), 37 deletions(-)
>>>
>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>> index 220be9f..1ae0c7f 100644
>>> --- a/mm/shmem.c
>>> +++ b/mm/shmem.c
>>> @@ -806,12 +806,15 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>> long nr_swaps_freed = 0;
>>> pgoff_t index;
>>> int i;
>>> + bool split = false;
>>> + struct page *page = NULL;
>>> if (lend == -1)
>>> end = -1; /* unsigned, so actually very big */
>>> pagevec_init(&pvec);
>>> index = start;
>>> +retry:
>>> while (index < end) {
>>> pvec.nr = find_get_entries(mapping, index,
>>> min(end - index, (pgoff_t)PAGEVEC_SIZE),
>>> @@ -819,7 +822,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>> if (!pvec.nr)
>>> break;
>>> for (i = 0; i < pagevec_count(&pvec); i++) {
>>> - struct page *page = pvec.pages[i];
>>> + split = false;
>>> + page = pvec.pages[i];
>>> index = indices[i];
>>> if (index >= end)
>>> @@ -838,23 +842,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>> if (!trylock_page(page))
>>> continue;
>>> - if (PageTransTail(page)) {
>>> - /* Middle of THP: zero out the page */
>>> - clear_highpage(page);
>>> - unlock_page(page);
>>> - continue;
>>> - } else if (PageTransHuge(page)) {
>>> - if (index == round_down(end, HPAGE_PMD_NR)) {
>>> + if (PageTransCompound(page) && !unfalloc) {
>>> + if (PageHead(page) &&
>>> + index != round_down(end, HPAGE_PMD_NR)) {
>>> /*
>>> - * Range ends in the middle of THP:
>>> - * zero out the page
>>> + * Fall through when punching whole
>>> + * THP.
>>> */
>>> - clear_highpage(page);
>>> - unlock_page(page);
>>> - continue;
>>> + index += HPAGE_PMD_NR - 1;
>>> + i += HPAGE_PMD_NR - 1;
>>> + } else {
>>> + /*
>>> + * Split THP for any partial hole
>>> + * punch.
>>> + */
>>> + get_page(page);
>>> + split = true;
>>> + goto split;
>>> }
>>> - index += HPAGE_PMD_NR - 1;
>>> - i += HPAGE_PMD_NR - 1;
>>> }
>>> if (!unfalloc || !PageUptodate(page)) {
>>> @@ -866,9 +871,29 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>> }
>>> unlock_page(page);
>>> }
>>> +split:
>>> pagevec_remove_exceptionals(&pvec);
>>> pagevec_release(&pvec);
>>> cond_resched();
>>> +
>>> + if (split) {
>>> + /*
>>> + * The pagevec_release() released all extra pins
>>> + * from pagevec lookup. And we hold an extra pin
>>> + * and still have the page locked under us.
>>> + */
>>> + if (!split_huge_page(page)) {
>>> + unlock_page(page);
>>> + put_page(page);
>>> + /* Re-lookup page cache from current index */
>>> + goto retry;
>>> + }
>>> +
>>> + /* Fail to split THP, move to next index */
>>> + unlock_page(page);
>>> + put_page(page);
>>> + }
>>> +
>>> index++;
>>> }
>>> @@ -901,6 +926,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>> return;
>>> index = start;
>>> +again:
>>> while (index < end) {
>>> cond_resched();
>>> @@ -916,7 +942,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>> continue;
>>> }
>>> for (i = 0; i < pagevec_count(&pvec); i++) {
>>> - struct page *page = pvec.pages[i];
>>> + split = false;
>>> + page = pvec.pages[i];
>>> index = indices[i];
>>> if (index >= end)
>>> @@ -936,30 +963,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>> lock_page(page);
>>> - if (PageTransTail(page)) {
>>> - /* Middle of THP: zero out the page */
>>> - clear_highpage(page);
>>> - unlock_page(page);
>>> - /*
>>> - * Partial thp truncate due 'start' in middle
>>> - * of THP: don't need to look on these pages
>>> - * again on !pvec.nr restart.
>>> - */
>>> - if (index != round_down(end, HPAGE_PMD_NR))
>>> - start++;
>>> - continue;
>>> - } else if (PageTransHuge(page)) {
>>> - if (index == round_down(end, HPAGE_PMD_NR)) {
>>> + if (PageTransCompound(page) && !unfalloc) {
>>> + if (PageHead(page) &&
>>> + index != round_down(end, HPAGE_PMD_NR)) {
>>> /*
>>> - * Range ends in the middle of THP:
>>> - * zero out the page
>>> + * Fall through when punching whole
>>> + * THP.
>>> */
>>> - clear_highpage(page);
>>> - unlock_page(page);
>>> - continue;
>>> + index += HPAGE_PMD_NR - 1;
>>> + i += HPAGE_PMD_NR - 1;
>>> + } else {
>>> + /*
>>> + * Split THP for any partial hole
>>> + * punch.
>>> + */
>>> + get_page(page);
>>> + split = true;
>>> + goto rescan_split;
>>> }
>>> - index += HPAGE_PMD_NR - 1;
>>> - i += HPAGE_PMD_NR - 1;
>>> }
>>> if (!unfalloc || !PageUptodate(page)) {
>>> @@ -976,8 +997,33 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>> }
>>> unlock_page(page);
>>> }
>>> +rescan_split:
>>> pagevec_remove_exceptionals(&pvec);
>>> pagevec_release(&pvec);
>>> +
>>> + if (split) {
>>> + /*
>>> + * The pagevec_release() released all extra pins
>>> + * from pagevec lookup. And we hold an extra pin
>>> + * and still have the page locked under us.
>>> + */
>>> + if (!split_huge_page(page)) {
>>> + unlock_page(page);
>>> + put_page(page);
>>> + /* Re-lookup page cache from current index */
>>> + goto again;
>>> + }
>>> +
>>> + /*
>>> + * Split fail, clear the page then move to next
>>> + * index.
>>> + */
>>> + clear_highpage(page);
>>> +
>>> + unlock_page(page);
>>> + put_page(page);
>>> + }
>>> +
>>> index++;
>>> }
>>> --
>>> 1.8.3.1
>>>
>>>
>