Date:   Thu, 13 Feb 2020 16:38:01 -0800
From:   Yang Shi <yang.shi@...ux.alibaba.com>
To:     Hugh Dickins <hughd@...gle.com>, kirill.shutemov@...ux.intel.com
Cc:     aarcange@...hat.com, akpm@...ux-foundation.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [v2 PATCH] mm: shmem: allow split THP when truncating THP partially

Hi Kirill,


Would you please help review this patch? I don't know why Hugh hasn't
responded even though I pinged him twice.


Thanks,

Yang



On 2/4/20 3:27 PM, Yang Shi wrote:
>
>
> On 1/14/20 11:28 AM, Yang Shi wrote:
>>
>>
>> On 12/4/19 4:15 PM, Hugh Dickins wrote:
>>> On Wed, 4 Dec 2019, Yang Shi wrote:
>>>
>>>> Currently when truncating a shmem file, if the range covers only part
>>>> of a THP (start or end falls in the middle of the THP), the pages will
>>>> just get cleared rather than freed, unless the range covers the whole
>>>> THP.  Even after all the subpages are truncated (randomly or
>>>> sequentially), the THP may still be kept in the page cache.  This
>>>> might be fine for some use cases which prefer preserving THP.
>>>>
>>>> But, when doing balloon inflation in QEMU, QEMU actually does hole
>>>> punch or MADV_DONTNEED in base page size granularity if hugetlbfs is
>>>> not used.  So, when using shmem THP as the memory backend, QEMU
>>>> inflation doesn't work as expected since it doesn't free memory.  But
>>>> the inflation use case really needs to get the memory freed.
>>>> Anonymous THP will not be freed right away either, but it will be
>>>> freed eventually once all subpages are unmapped; shmem THP, however,
>>>> would still stay in the page cache.
>>>>
>>>> Split the THP right away when doing a partial hole punch, and if the
>>>> split fails just clear the pages so that reads from the hole-punched
>>>> area return zero.
>>>>
>>>> Cc: Hugh Dickins <hughd@...gle.com>
>>>> Cc: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
>>>> Cc: Andrea Arcangeli <aarcange@...hat.com>
>>>> Signed-off-by: Yang Shi <yang.shi@...ux.alibaba.com>
>>>> ---
>>>> v2: * Addressed the comments from Kirill.
>>>>      * Dropped the fallocate mode flag; THP split is the default behavior.
>>>>      * Blended Hugh's implementation with my v1 patch. TBH I'm not very
>>>>        keen on Hugh's find_get_entries() hack (basically neutral), but
>>>>        without that hack
>>> Thanks for giving it a try.  I'm not neutral about my
>>> find_get_entries() hack: it surely had to go (without it, I'd have just
>>> pushed my own patch).  I've not noticed anything wrong with your patch,
>>> and it's in the right direction, but I'm still not thrilled with it.  I
>>> also remember that I got the looping wrong in my first internal attempt
>>> (fixed in what I sent), and need to be very sure of the
>>> try-again-versus-move-on-to-next conditions before agreeing to
>>> anything.  No rush, I'll come back to this in the days or months ahead:
>>> I'll try to find a less gotoey blend of yours and mine.
>>
>> Hi Hugh,
>>
>> Any update on this one?
>>
>> Thanks,
>> Yang
>
> Hi Hugh,
>
> Ping. Any comment on this? I really hope it can make v5.7.
>
> Thanks,
> Yang
>
>>
>>>
>>> Hugh
>>>
>>>>        we have to rely on pagevec_release() to release the extra pins
>>>>        and play with goto. This version does it that way. The patch is
>>>>        bigger than Hugh's due to the extra comments to make the flow
>>>>        clear.
>>>>
>>>>   mm/shmem.c | 120 ++++++++++++++++++++++++++++++++++++++++++-------------------
>>>>   1 file changed, 83 insertions(+), 37 deletions(-)
>>>>
>>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>>> index 220be9f..1ae0c7f 100644
>>>> --- a/mm/shmem.c
>>>> +++ b/mm/shmem.c
>>>> @@ -806,12 +806,15 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>>       long nr_swaps_freed = 0;
>>>>       pgoff_t index;
>>>>       int i;
>>>> +    bool split = false;
>>>> +    struct page *page = NULL;
>>>>         if (lend == -1)
>>>>           end = -1;    /* unsigned, so actually very big */
>>>>         pagevec_init(&pvec);
>>>>       index = start;
>>>> +retry:
>>>>       while (index < end) {
>>>>           pvec.nr = find_get_entries(mapping, index,
>>>>               min(end - index, (pgoff_t)PAGEVEC_SIZE),
>>>> @@ -819,7 +822,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>>           if (!pvec.nr)
>>>>               break;
>>>>           for (i = 0; i < pagevec_count(&pvec); i++) {
>>>> -            struct page *page = pvec.pages[i];
>>>> +            split = false;
>>>> +            page = pvec.pages[i];
>>>>                 index = indices[i];
>>>>               if (index >= end)
>>>> @@ -838,23 +842,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>>               if (!trylock_page(page))
>>>>                   continue;
>>>>   -            if (PageTransTail(page)) {
>>>> -                /* Middle of THP: zero out the page */
>>>> -                clear_highpage(page);
>>>> -                unlock_page(page);
>>>> -                continue;
>>>> -            } else if (PageTransHuge(page)) {
>>>> -                if (index == round_down(end, HPAGE_PMD_NR)) {
>>>> +            if (PageTransCompound(page) && !unfalloc) {
>>>> +                if (PageHead(page) &&
>>>> +                    index != round_down(end, HPAGE_PMD_NR)) {
>>>>                       /*
>>>> -                     * Range ends in the middle of THP:
>>>> -                     * zero out the page
>>>> +                     * Fall through when punching whole
>>>> +                     * THP.
>>>>                        */
>>>> -                    clear_highpage(page);
>>>> -                    unlock_page(page);
>>>> -                    continue;
>>>> +                    index += HPAGE_PMD_NR - 1;
>>>> +                    i += HPAGE_PMD_NR - 1;
>>>> +                } else {
>>>> +                    /*
>>>> +                     * Split THP for any partial hole
>>>> +                     * punch.
>>>> +                     */
>>>> +                    get_page(page);
>>>> +                    split = true;
>>>> +                    goto split;
>>>>                   }
>>>> -                index += HPAGE_PMD_NR - 1;
>>>> -                i += HPAGE_PMD_NR - 1;
>>>>               }
>>>>                 if (!unfalloc || !PageUptodate(page)) {
>>>> @@ -866,9 +871,29 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>>               }
>>>>               unlock_page(page);
>>>>           }
>>>> +split:
>>>>           pagevec_remove_exceptionals(&pvec);
>>>>           pagevec_release(&pvec);
>>>>           cond_resched();
>>>> +
>>>> +        if (split) {
>>>> +            /*
>>>> +             * The pagevec_release() released all extra pins
>>>> +             * from pagevec lookup.  And we hold an extra pin
>>>> +             * and still have the page locked under us.
>>>> +             */
>>>> +            if (!split_huge_page(page)) {
>>>> +                unlock_page(page);
>>>> +                put_page(page);
>>>> +                /* Re-lookup page cache from current index */
>>>> +                goto retry;
>>>> +            }
>>>> +
>>>> +            /* Failed to split THP, move to next index */
>>>> +            unlock_page(page);
>>>> +            put_page(page);
>>>> +        }
>>>> +
>>>>           index++;
>>>>       }
>>>> @@ -901,6 +926,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>>           return;
>>>>         index = start;
>>>> +again:
>>>>       while (index < end) {
>>>>           cond_resched();
>>>> @@ -916,7 +942,8 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>>               continue;
>>>>           }
>>>>           for (i = 0; i < pagevec_count(&pvec); i++) {
>>>> -            struct page *page = pvec.pages[i];
>>>> +            split = false;
>>>> +            page = pvec.pages[i];
>>>>                 index = indices[i];
>>>>               if (index >= end)
>>>> @@ -936,30 +963,24 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>>                 lock_page(page);
>>>>   -            if (PageTransTail(page)) {
>>>> -                /* Middle of THP: zero out the page */
>>>> -                clear_highpage(page);
>>>> -                unlock_page(page);
>>>> -                /*
>>>> -                 * Partial thp truncate due 'start' in middle
>>>> -                 * of THP: don't need to look on these pages
>>>> -                 * again on !pvec.nr restart.
>>>> -                 */
>>>> -                if (index != round_down(end, HPAGE_PMD_NR))
>>>> -                    start++;
>>>> -                continue;
>>>> -            } else if (PageTransHuge(page)) {
>>>> -                if (index == round_down(end, HPAGE_PMD_NR)) {
>>>> +            if (PageTransCompound(page) && !unfalloc) {
>>>> +                if (PageHead(page) &&
>>>> +                    index != round_down(end, HPAGE_PMD_NR)) {
>>>>                       /*
>>>> -                     * Range ends in the middle of THP:
>>>> -                     * zero out the page
>>>> +                     * Fall through when punching whole
>>>> +                     * THP.
>>>>                        */
>>>> -                    clear_highpage(page);
>>>> -                    unlock_page(page);
>>>> -                    continue;
>>>> +                    index += HPAGE_PMD_NR - 1;
>>>> +                    i += HPAGE_PMD_NR - 1;
>>>> +                } else {
>>>> +                    /*
>>>> +                     * Split THP for any partial hole
>>>> +                     * punch.
>>>> +                     */
>>>> +                    get_page(page);
>>>> +                    split = true;
>>>> +                    goto rescan_split;
>>>>                   }
>>>> -                index += HPAGE_PMD_NR - 1;
>>>> -                i += HPAGE_PMD_NR - 1;
>>>>               }
>>>>                 if (!unfalloc || !PageUptodate(page)) {
>>>> @@ -976,8 +997,33 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>>>               }
>>>>               unlock_page(page);
>>>>           }
>>>> +rescan_split:
>>>>           pagevec_remove_exceptionals(&pvec);
>>>>           pagevec_release(&pvec);
>>>> +
>>>> +        if (split) {
>>>> +            /*
>>>> +             * The pagevec_release() released all extra pins
>>>> +             * from pagevec lookup.  And we hold an extra pin
>>>> +             * and still have the page locked under us.
>>>> +             */
>>>> +            if (!split_huge_page(page)) {
>>>> +                unlock_page(page);
>>>> +                put_page(page);
>>>> +                /* Re-lookup page cache from current index */
>>>> +                goto again;
>>>> +            }
>>>> +
>>>> +            /*
>>>> +             * Split failed, clear the page then move to next
>>>> +             * index.
>>>> +             */
>>>> +            clear_highpage(page);
>>>> +
>>>> +            unlock_page(page);
>>>> +            put_page(page);
>>>> +        }
>>>> +
>>>>           index++;
>>>>       }
>>>>   --
>>>> 1.8.3.1
>>>>
>>>>
>>
>

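As an illustrative aside (not part of the patch): the partial hole punch
described in the commit message can be exercised from userspace with
fallocate(FALLOC_FL_PUNCH_HOLE). Below is a minimal sketch, assuming a
tmpfs mount at the hypothetical path /mnt/tmpfs with huge=always and
2MB PMD-sized huge pages; the file name and sizes are illustrative only.

/* thp-punch.c: exercise a partial hole punch on a shmem THP.
 * Hypothetical reproducer; assumes /mnt/tmpfs is a tmpfs mount
 * with huge=always and HPAGE_PMD_SIZE == 2MB.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define THP_SIZE (2UL << 20)	/* assumed HPAGE_PMD_SIZE */

int main(void)
{
	char buf[4096];
	unsigned long off;
	int fd;

	fd = open("/mnt/tmpfs/thp-test", O_CREAT | O_RDWR, 0600);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Populate one full THP worth of data so shmem can allocate
	 * a huge page for this range.
	 */
	memset(buf, 0xaa, sizeof(buf));
	for (off = 0; off < THP_SIZE; off += sizeof(buf))
		pwrite(fd, buf, sizeof(buf), off);

	/* Punch only the first half of the THP: a partial hole. */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      0, THP_SIZE / 2) < 0)
		perror("fallocate");

	pause();	/* hold the file open; inspect /proc/meminfo */
	return 0;
}

Reads from the punched range return zeros both with and without the
patch; the difference the patch makes is that the THP backing the hole
is split and the punched subpages are actually freed (visible as a drop
in ShmemHugePages in /proc/meminfo), instead of merely being cleared.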