Message-ID: <cd21f6a6-32a5-7d31-3bcd-4fc3f6cc0a84@linux.alibaba.com>
Date: Thu, 13 Feb 2020 16:38:01 -0800
From: Yang Shi <yang.shi@...ux.alibaba.com>
To: Hugh Dickins <hughd@...gle.com>, kirill.shutemov@...ux.intel.com
Cc: aarcange@...hat.com, akpm@...ux-foundation.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [v2 PATCH] mm: shmem: allow split THP when truncating THP
partially
Hi Kirill,

Would you please help review this patch? I don't know why Hugh hasn't
responded though I pinged him twice.

Thanks,
Yang
On 2/4/20 3:27 PM, Yang Shi wrote:
>
>
> On 1/14/20 11:28 AM, Yang Shi wrote:
>>
>>
>> On 12/4/19 4:15 PM, Hugh Dickins wrote:
>>> On Wed, 4 Dec 2019, Yang Shi wrote:
>>>
>>>> Currently when truncating a shmem file, if the range covers only
>>>> part of a THP (start or end is in the middle of the THP), the pages
>>>> actually just get cleared rather than freed, unless the range covers
>>>> the whole THP. Even though all the subpages are truncated (randomly
>>>> or sequentially), the THP may still be kept in the page cache. This
>>>> might be fine for some use cases which prefer preserving THP.
>>>>
>>>> But, when doing balloon inflation in QEMU, QEMU actually does hole
>>>> punch or MADV_DONTNEED in base page size granularity if hugetlbfs is
>>>> not used. So, when using shmem THP as the memory backend, QEMU
>>>> inflation actually doesn't work as expected since it doesn't free
>>>> memory. But the inflation use case really needs to get the memory
>>>> freed. Anonymous THP is not freed right away either, but it will be
>>>> freed eventually when all subpages are unmapped, whereas shmem THP
>>>> would still stay in the page cache.
>>>>
>>>> Split the THP right away when doing a partial hole punch, and if the
>>>> split fails just clear the page so that reads of the hole-punched
>>>> area return zeros.
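>>>>
>>>> For reference, a minimal userspace sketch of the base-page-granularity
>>>> discard described above (illustrative only, not QEMU's actual code;
>>>> fd, addr and offset are hypothetical):
>>>>
>>>> #define _GNU_SOURCE
>>>> #include <fcntl.h>      /* fallocate(), FALLOC_FL_* */
>>>> #include <sys/mman.h>   /* madvise(), MADV_DONTNEED */
>>>>
>>>> static void discard_page(int fd, void *addr, off_t offset, long pagesz)
>>>> {
>>>> 	if (fd >= 0)
>>>> 		/* shmem-backed guest memory: punch a base-page-sized hole */
>>>> 		fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
>>>> 			  offset, pagesz);
>>>> 	else
>>>> 		/* anonymous guest memory: the equivalent discard */
>>>> 		madvise(addr, pagesz, MADV_DONTNEED);
>>>> }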
>>>>
>>>> Cc: Hugh Dickins <hughd@...gle.com>
>>>> Cc: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
>>>> Cc: Andrea Arcangeli <aarcange@...hat.com>
>>>> Signed-off-by: Yang Shi <yang.shi@...ux.alibaba.com>
>>>> ---
>>>> v2: * Adopted the comment from Kirill.
>>>> * Dropped the fallocate mode flag; THP split is the default behavior.
>>>> * Blended Hugh's implementation with my v1 patch. TBH I'm not
>>>> very keen on Hugh's find_get_entries() hack (basically neutral),
>>>> but without that hack
>>> Thanks for giving it a try. I'm not neutral about my
>>> find_get_entries() hack: it surely had to go (without it, I'd have
>>> just pushed my own patch). I've not noticed anything wrong with your
>>> patch, and it's in the right direction, but I'm still not thrilled
>>> with it. I also remember that I got the looping wrong in my first
>>> internal attempt (fixed in what I sent), and need to be very sure of
>>> the try-again-versus-move-on-to-next conditions before agreeing to
>>> anything. No rush, I'll come back to this in the days or months
>>> ahead: I'll try to find a less gotoey blend of yours and mine.
>>
>> Hi Hugh,
>>
>> Any update on this one?
>>
>> Thanks,
>> Yang
>
> Hi Hugh,
>
> Ping. Any comment on this? I really hope it can make v5.7.
>
> Thanks,
> Yang
>
>>
>>>
>>> Hugh
>>>
>>>> we have to rely on pagevec_release() to release extra pins and
>>>> play with goto. This version does it that way. The patch is
>>>> bigger than Hugh's due to extra comments to make the flow clear.
>>>>
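>>>> As a minimal sketch of the pin/lock protocol the hunks below rely
>>>> on (distilled from the patch itself, not additional code):
>>>>
>>>> get_page(page);               /* our pin survives pagevec_release() */
>>>> /* ... leave the scan loop with the page still locked ... */
>>>> pagevec_remove_exceptionals(&pvec);
>>>> pagevec_release(&pvec);       /* drop lookup pins that would fail the split */
>>>> if (!split_huge_page(page)) { /* returns 0 on success */
>>>> 	unlock_page(page);
>>>> 	put_page(page);
>>>> 	/* re-lookup the page cache from the current index */
>>>> } else {
>>>> 	unlock_page(page);
>>>> 	put_page(page);
>>>> 	/* split failed: move on to the next index */
>>>> }
>>>>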
>>>> mm/shmem.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------------------
>>>> 1 file changed, 83 insertions(+), 37 deletions(-)
>>>>
>>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>>> index 220be9f..1ae0c7f 100644
>>>> --- a/mm/shmem.c
>>>> +++ b/mm/shmem.c
>>>> @@ -806,12 +806,15 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>> long nr_swaps_freed = 0;
>>>> pgoff_t index;
>>>> int i;
>>>> + bool split = false;
>>>> + struct page *page = NULL;
>>>> if (lend == -1)
>>>> end = -1; /* unsigned, so actually very big */
>>>> pagevec_init(&pvec);
>>>> index = start;
>>>> +retry:
>>>> while (index < end) {
>>>> pvec.nr = find_get_entries(mapping, index,
>>>> min(end - index, (pgoff_t)PAGEVEC_SIZE),
>>>> @@ -819,7 +822,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>> if (!pvec.nr)
>>>> break;
>>>> for (i = 0; i < pagevec_count(&pvec); i++) {
>>>> - struct page *page = pvec.pages[i];
>>>> + split = false;
>>>> + page = pvec.pages[i];
>>>> index = indices[i];
>>>> if (index >= end)
>>>> @@ -838,23 +842,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>> if (!trylock_page(page))
>>>> continue;
>>>> - if (PageTransTail(page)) {
>>>> - /* Middle of THP: zero out the page */
>>>> - clear_highpage(page);
>>>> - unlock_page(page);
>>>> - continue;
>>>> - } else if (PageTransHuge(page)) {
>>>> - if (index == round_down(end, HPAGE_PMD_NR)) {
>>>> + if (PageTransCompound(page) && !unfalloc) {
>>>> + if (PageHead(page) &&
>>>> + index != round_down(end, HPAGE_PMD_NR)) {
>>>> /*
>>>> - * Range ends in the middle of THP:
>>>> - * zero out the page
>>>> + * Fall through when punching whole
>>>> + * THP.
>>>> */
>>>> - clear_highpage(page);
>>>> - unlock_page(page);
>>>> - continue;
>>>> + index += HPAGE_PMD_NR - 1;
>>>> + i += HPAGE_PMD_NR - 1;
>>>> + } else {
>>>> + /*
>>>> + * Split THP for any partial hole
>>>> + * punch.
>>>> + */
>>>> + get_page(page);
>>>> + split = true;
>>>> + goto split;
>>>> }
>>>> - index += HPAGE_PMD_NR - 1;
>>>> - i += HPAGE_PMD_NR - 1;
>>>> }
>>>> if (!unfalloc || !PageUptodate(page)) {
>>>> @@ -866,9 +871,29 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>> }
>>>> unlock_page(page);
>>>> }
>>>> +split:
>>>> pagevec_remove_exceptionals(&pvec);
>>>> pagevec_release(&pvec);
>>>> cond_resched();
>>>> +
>>>> + if (split) {
>>>> + /*
>>>> + * The pagevec_release() released all extra pins
>>>> + * from pagevec lookup. And we hold an extra pin
>>>> + * and still have the page locked under us.
>>>> + */
>>>> + if (!split_huge_page(page)) {
>>>> + unlock_page(page);
>>>> + put_page(page);
>>>> + /* Re-lookup page cache from current index */
>>>> + goto retry;
>>>> + }
>>>> +
>>>> + /* Failed to split THP, move on to the next index */
>>>> + unlock_page(page);
>>>> + put_page(page);
>>>> + }
>>>> +
>>>> index++;
>>>> }
>>>> @@ -901,6 +926,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>> return;
>>>> index = start;
>>>> +again:
>>>> while (index < end) {
>>>> cond_resched();
>>>> @@ -916,7 +942,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>> continue;
>>>> }
>>>> for (i = 0; i < pagevec_count(&pvec); i++) {
>>>> - struct page *page = pvec.pages[i];
>>>> + split = false;
>>>> + page = pvec.pages[i];
>>>> index = indices[i];
>>>> if (index >= end)
>>>> @@ -936,30 +963,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>> lock_page(page);
>>>> - if (PageTransTail(page)) {
>>>> - /* Middle of THP: zero out the page */
>>>> - clear_highpage(page);
>>>> - unlock_page(page);
>>>> - /*
>>>> - * Partial thp truncate due 'start' in middle
>>>> - * of THP: don't need to look on these pages
>>>> - * again on !pvec.nr restart.
>>>> - */
>>>> - if (index != round_down(end, HPAGE_PMD_NR))
>>>> - start++;
>>>> - continue;
>>>> - } else if (PageTransHuge(page)) {
>>>> - if (index == round_down(end, HPAGE_PMD_NR)) {
>>>> + if (PageTransCompound(page) && !unfalloc) {
>>>> + if (PageHead(page) &&
>>>> + index != round_down(end, HPAGE_PMD_NR)) {
>>>> /*
>>>> - * Range ends in the middle of THP:
>>>> - * zero out the page
>>>> + * Fall through when punching whole
>>>> + * THP.
>>>> */
>>>> - clear_highpage(page);
>>>> - unlock_page(page);
>>>> - continue;
>>>> + index += HPAGE_PMD_NR - 1;
>>>> + i += HPAGE_PMD_NR - 1;
>>>> + } else {
>>>> + /*
>>>> + * Split THP for any partial hole
>>>> + * punch.
>>>> + */
>>>> + get_page(page);
>>>> + split = true;
>>>> + goto rescan_split;
>>>> }
>>>> - index += HPAGE_PMD_NR - 1;
>>>> - i += HPAGE_PMD_NR - 1;
>>>> }
>>>> if (!unfalloc || !PageUptodate(page)) {
>>>> @@ -976,8 +997,33 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>> }
>>>> unlock_page(page);
>>>> }
>>>> +rescan_split:
>>>> pagevec_remove_exceptionals(&pvec);
>>>> pagevec_release(&pvec);
>>>> +
>>>> + if (split) {
>>>> + /*
>>>> + * The pagevec_release() released all extra pins
>>>> + * from pagevec lookup. And we hold an extra pin
>>>> + * and still have the page locked under us.
>>>> + */
>>>> + if (!split_huge_page(page)) {
>>>> + unlock_page(page);
>>>> + put_page(page);
>>>> + /* Re-lookup page cache from current index */
>>>> + goto again;
>>>> + }
>>>> +
>>>> + /*
>>>> + * Split failed: clear the page, then move on to
>>>> + * the next index.
>>>> + */
>>>> + clear_highpage(page);
>>>> +
>>>> + unlock_page(page);
>>>> + put_page(page);
>>>> + }
>>>> +
>>>> index++;
>>>> }
>>>> --
>>>> 1.8.3.1
>>>>
>>>>
>>
>