linux-kernel - Re: [PATCH -v4 8/9] migrate

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <878rh8gzwm.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date:   Wed, 08 Feb 2023 19:27:53 +0800
From:   "Huang, Ying" <ying.huang@...el.com>
To:     Zi Yan <ziy@...dia.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>, <linux-mm@...ck.org>,
        <linux-kernel@...r.kernel.org>, Yang Shi <shy828301@...il.com>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        Oscar Salvador <osalvador@...e.de>,
        "Matthew Wilcox" <willy@...radead.org>,
        Bharata B Rao <bharata@....com>,
        "Alistair Popple" <apopple@...dia.com>,
        haoxin <xhao@...ux.alibaba.com>,
        Minchan Kim <minchan@...nel.org>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Hyeonggon Yoo <42.hyeyoo@...il.com>
Subject: Re: [PATCH -v4 8/9] migrate_pages: batch flushing TLB

Zi Yan <ziy@...dia.com> writes:

> On 6 Feb 2023, at 1:33, Huang Ying wrote:
>
>> The TLB flushing will cost quite some CPU cycles during the folio
>> migration in some situations.  For example, when migrate a folio of a
>> process with multiple active threads that run on multiple CPUs.  After
>> batching the _unmap and _move in migrate_pages(), the TLB flushing can
>> be batched easily with the existing TLB flush batching mechanism.
>> This patch implements that.
>>
>> We use the following test case to test the patch.
>>
>> On a 2-socket Intel server,
>>
>> - Run pmbench memory accessing benchmark
>>
>> - Run `migratepages` to migrate pages of pmbench between node 0 and
>>   node 1 back and forth.
>>
>> With the patch, the TLB flushing IPI reduces 99.1% during the test and
>> the number of pages migrated successfully per second increases 291.7%.
>>
>> NOTE: TLB flushing is batched only for normal folios, not for THP
>> folios.  Because the overhead of TLB flushing for THP folios is much
>> lower than that for normal folios (about 1/512 on x86 platform).
>>
>> Signed-off-by: "Huang, Ying" <ying.huang@...el.com>
>> Cc: Zi Yan <ziy@...dia.com>
>> Cc: Yang Shi <shy828301@...il.com>
>> Cc: Baolin Wang <baolin.wang@...ux.alibaba.com>
>> Cc: Oscar Salvador <osalvador@...e.de>
>> Cc: Matthew Wilcox <willy@...radead.org>
>> Cc: Bharata B Rao <bharata@....com>
>> Cc: Alistair Popple <apopple@...dia.com>
>> Cc: haoxin <xhao@...ux.alibaba.com>
>> Cc: Minchan Kim <minchan@...nel.org>
>> Cc: Mike Kravetz <mike.kravetz@...cle.com>
>> Cc: Hyeonggon Yoo <42.hyeyoo@...il.com>
>> ---
>>  mm/migrate.c |  4 +++-
>>  mm/rmap.c    | 20 +++++++++++++++++---
>>  2 files changed, 20 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 9378fa2ad4a5..ca6e2ff02a09 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1230,7 +1230,7 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
>>  		/* Establish migration ptes */
>>  		VM_BUG_ON_FOLIO(folio_test_anon(src) &&
>>  			       !folio_test_ksm(src) && !anon_vma, src);
>> -		try_to_migrate(src, 0);
>> +		try_to_migrate(src, TTU_BATCH_FLUSH);
>>  		page_was_mapped = 1;
>>  	}
>>
>> @@ -1781,6 +1781,8 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>>  	stats->nr_thp_failed += thp_retry;
>>  	stats->nr_failed_pages += nr_retry_pages;
>>  move:
>
> Maybe a comment:
> /* Flush TLBs for all the unmapped pages */

OK.  Will do that in the next version.

Best Regards,
Huang, Ying

>> +	try_to_unmap_flush();
>> +
>>  	retry = 1;
>>  	for (pass = 0;
>>  	     pass < NR_MAX_MIGRATE_PAGES_RETRY && (retry || large_retry);
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index b616870a09be..2e125f3e462e 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1976,7 +1976,21 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>>  		} else {
>>  			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>>  			/* Nuke the page table entry. */
>> -			pteval = ptep_clear_flush(vma, address, pvmw.pte);
>> +			if (should_defer_flush(mm, flags)) {
>> +				/*
>> +				 * We clear the PTE but do not flush so potentially
>> +				 * a remote CPU could still be writing to the folio.
>> +				 * If the entry was previously clean then the
>> +				 * architecture must guarantee that a clear->dirty
>> +				 * transition on a cached TLB entry is written through
>> +				 * and traps if the PTE is unmapped.
>> +				 */
>> +				pteval = ptep_get_and_clear(mm, address, pvmw.pte);
>> +
>> +				set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
>> +			} else {
>> +				pteval = ptep_clear_flush(vma, address, pvmw.pte);
>> +			}
>>  		}
>>
>>  		/* Set the dirty flag on the folio now the pte is gone. */
>> @@ -2148,10 +2162,10 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
>>
>>  	/*
>>  	 * Migration always ignores mlock and only supports TTU_RMAP_LOCKED and
>> -	 * TTU_SPLIT_HUGE_PMD and TTU_SYNC flags.
>> +	 * TTU_SPLIT_HUGE_PMD, TTU_SYNC, and TTU_BATCH_FLUSH flags.
>>  	 */
>>  	if (WARN_ON_ONCE(flags & ~(TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD |
>> -					TTU_SYNC)))
>> +					TTU_SYNC | TTU_BATCH_FLUSH)))
>>  		return;
>>
>>  	if (folio_is_zone_device(folio) &&
>> -- 
>> 2.35.1
>
> Everything else looks good to me. Reviewed-by: Zi Yan <ziy@...dia.com>
>
> --
> Best Regards,
> Yan, Zi