lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3a35da3a-dff0-a8ca-8269-3018fff8f21b@linux.alibaba.com>
Date:   Mon, 25 Nov 2019 10:24:38 -0800
From:   Yang Shi <yang.shi@...ux.alibaba.com>
To:     "Kirill A. Shutemov" <kirill@...temov.name>
Cc:     hughd@...gle.com, kirill.shutemov@...ux.intel.com,
        aarcange@...hat.com, akpm@...ux-foundation.org, linux-mm@...ck.org,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] mm: shmem: allow split THP when truncating THP
 partially



On 11/25/19 1:36 AM, Kirill A. Shutemov wrote:
> On Sat, Nov 23, 2019 at 09:05:32AM +0800, Yang Shi wrote:
>> Currently when truncating shmem file, if the range is partial of THP
>> (start or end is in the middle of THP), the pages actually will just get
>> cleared rather than being freed unless the range cover the whole THP.
>> Even though all the subpages are truncated (randomly or sequentially),
>> the THP may still be kept in page cache.  This might be fine for some
>> usecases which prefer preserving THP.
>>
>> But, when doing balloon inflation in QEMU, QEMU actually does hole punch
>> or MADV_DONTNEED in base page size granulairty if hugetlbfs is not used.
>> So, when using shmem THP as memory backend QEMU inflation actually doesn't
>> work as expected since it doesn't free memory.  But, the inflation
>> usecase really needs get the memory freed.  Anonymous THP will not get
>> freed right away too but it will be freed eventually when all subpages are
>> unmapped, but shmem THP would still stay in page cache.
>>
>> To protect the usecases which may prefer preserving THP, introduce a
>> new fallocate mode: FALLOC_FL_SPLIT_HPAGE, which means spltting THP is
>> preferred behavior if truncating partial THP.  This mode just makes
>> sense to tmpfs for the time being.
> We need to clarify interaction with khugepaged. This implementation
> doesn't do anything to prevent khugepaged from collapsing the range back
> to THP just after the split.

Yes, it doesn't. Will clarify this in the commit log.

>
>> @@ -976,8 +1022,31 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>>   			}
>>   			unlock_page(page);
>>   		}
>> +rescan_split:
>>   		pagevec_remove_exceptionals(&pvec);
>>   		pagevec_release(&pvec);
>> +
>> +		if (split && PageTransCompound(page)) {
>> +			/* The THP may get freed under us */
>> +			if (!get_page_unless_zero(compound_head(page)))
>> +				goto rescan_out;
>> +
>> +			lock_page(page);
>> +
>> +			/*
>> +			 * The extra pins from page cache lookup have been
>> +			 * released by pagevec_release().
>> +			 */
>> +			if (!split_huge_page(page)) {
>> +				unlock_page(page);
>> +				put_page(page);
>> +				/* Re-look up page cache from current index */
>> +				goto again;
>> +			}
>> +			unlock_page(page);
>> +			put_page(page);
>> +		}
>> +rescan_out:
>>   		index++;
>>   	}
> Doing get_page_unless_zero() just after you've dropped the pin for the
> page looks very suboptimal.

If I don't drop the pins the THP can't be split. And, there might be 
more than one pins from find_get_entries() if I read the code correctly. 
For example, truncate 8K length in the middle of THP, the THP's refcount 
would get bumpped twice sinceĀ  two sub pages would be returned.

>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ