Message-ID: <558766E4.5020801@redhat.com>
Date:	Sun, 21 Jun 2015 21:37:40 -0400
From:	Rik van Riel <riel@...hat.com>
To:	"Kirill A. Shutemov" <kirill@...temov.name>,
	Ebru Akagunduz <ebru.akagunduz@...il.com>
CC:	linux-mm@...ck.org, akpm@...ux-foundation.org,
	kirill.shutemov@...ux.intel.com, n-horiguchi@...jp.nec.com,
	aarcange@...hat.com, iamjoonsoo.kim@....com, xiexiuqi@...wei.com,
	gorcunov@...nvz.org, linux-kernel@...r.kernel.org, mgorman@...e.de,
	rientjes@...gle.com, vbabka@...e.cz,
	aneesh.kumar@...ux.vnet.ibm.com, hughd@...gle.com,
	hannes@...xchg.org, mhocko@...e.cz, boaz@...xistor.com,
	raindel@...lanox.com
Subject: Re: [RFC v2 3/3] mm: make swapin readahead to improve thp collapse
 rate

On 06/21/2015 02:11 PM, Kirill A. Shutemov wrote:
> On Sat, Jun 20, 2015 at 02:28:06PM +0300, Ebru Akagunduz wrote:

>> +static void __collapse_huge_page_swapin(struct mm_struct *mm,
>> +					struct vm_area_struct *vma,
>> +					unsigned long address, pmd_t *pmd,
>> +					pte_t *pte)
>> +{
>> +	unsigned long _address;
>> +	pte_t pteval = *pte;
>> +	int swap_pte = 0;
>> +
>> +	pte = pte_offset_map(pmd, address);
>> +	for (_address = address; _address < address + HPAGE_PMD_NR*PAGE_SIZE;
>> +	     pte++, _address += PAGE_SIZE) {
>> +		pteval = *pte;
>> +		if (is_swap_pte(pteval)) {
>> +			swap_pte++;
>> +			do_swap_page(mm, vma, _address, pte, pmd,
>> +				     FAULT_FLAG_ALLOW_RETRY|FAULT_FLAG_RETRY_NOWAIT,
>> +				     pteval);
> 
> Hm. I guess this is lacking error handling.
> We really should abort early at least for VM_FAULT_HWPOISON and VM_FAULT_OOM.

Good catch.
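
For illustration, the early abort could be shaped roughly like the
userspace sketch below. It is not kernel code: the SIM_FAULT_* values
and sim_swapin_range() are hypothetical stand-ins for the VM_FAULT_*
result bits and for the swapin loop, and the array of results stands
in for what each do_swap_page() call would return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's VM_FAULT_* result bits. */
enum sim_fault {
	SIM_FAULT_NONE     = 0,
	SIM_FAULT_RETRY    = 1 << 0,	/* page not ready yet, not fatal */
	SIM_FAULT_OOM      = 1 << 1,
	SIM_FAULT_HWPOISON = 1 << 2,
};

#define SIM_FAULT_FATAL (SIM_FAULT_OOM | SIM_FAULT_HWPOISON)

/*
 * Walk n simulated swapin results and stop at the first fatal one.
 * Returns the number of entries handled before stopping; *err gets
 * the fatal bits that caused the abort, or 0 if the walk completed.
 */
size_t sim_swapin_range(const int *results, size_t n, int *err)
{
	size_t i;

	assert(results != NULL && err != NULL);
	*err = 0;
	for (i = 0; i < n; i++) {
		if (results[i] & SIM_FAULT_FATAL) {
			/* Abort early: no point touching the rest. */
			*err = results[i] & SIM_FAULT_FATAL;
			break;
		}
		/* SIM_FAULT_RETRY is fine: try that page next round. */
	}
	return i;
}
```

A caller would check *err after the walk and skip the collapse
attempt when it is nonzero.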

>> +			/* pte is unmapped now, we need to map it */
>> +			pte = pte_offset_map(pmd, _address);
> 
> No, it's within the same pte page table. It should be mapped with
> pte_offset_map() above.

It would be, except do_swap_page() unmaps the pte page table.

>> @@ -2551,6 +2586,8 @@ static void collapse_huge_page(struct mm_struct *mm,
>>  	if (!pmd)
>>  		goto out;
>>  
>> +	__collapse_huge_page_swapin(mm, vma, address, pmd, pte);
>> +
> 
> And now the pages we swapped in are not isolated, right?
> What prevents them from being swapped out again or whatever?

Nothing, but __collapse_huge_page_isolate() is run with the
appropriate locks held, which ensures that the pages are
present by the time we actually collapse the THP.

The way do_swap_page is called, khugepaged does not even
wait for pages to be brought in from swap. It just maps
in pages that are in the swap cache, and which can be
immediately locked (without waiting).

It will also start IO on pages that are not in memory
yet, and will hopefully pick those up on the next round.
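
A toy userspace model of that behaviour, to make the "next round"
part concrete. Everything here is hypothetical (the sim_* names and
states are not kernel interfaces), lock contention is ignored, and
it assumes the IO started in one round has finished before the next
one, which the real code can only hope for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page states for the model. */
enum sim_page_state {
	SIM_PRESENT,		/* already mapped */
	SIM_IN_SWAP_CACHE,	/* in memory, can be mapped without waiting */
	SIM_IO_STARTED,		/* read from swap in flight */
	SIM_ON_DISK,		/* only in swap, no IO started yet */
};

/*
 * One non-waiting pass over a range: map what is already in the
 * swap cache, start IO for what is still on disk, wait for nothing.
 * Returns how many pages are still not present after the pass.
 */
size_t sim_nowait_pass(enum sim_page_state *pages, size_t n)
{
	size_t i, missing = 0;

	assert(pages != NULL);
	for (i = 0; i < n; i++) {
		switch (pages[i]) {
		case SIM_IN_SWAP_CACHE:
			pages[i] = SIM_PRESENT;
			break;
		case SIM_ON_DISK:
			pages[i] = SIM_IO_STARTED;
			missing++;
			break;
		case SIM_IO_STARTED:
			missing++;
			break;
		case SIM_PRESENT:
			break;
		}
	}
	return missing;
}

/* Model all in-flight reads finishing between two rounds. */
void sim_io_complete(enum sim_page_state *pages, size_t n)
{
	size_t i;

	assert(pages != NULL);
	for (i = 0; i < n; i++)
		if (pages[i] == SIM_IO_STARTED)
			pages[i] = SIM_IN_SWAP_CACHE;
}
```

In this model a range with on-disk pages is not collapsible after
the first pass, but is after the second, provided the reads finished
in between.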


-- 
All rights reversed