Date:   Wed, 28 Sep 2022 11:33:13 +0800
From:   haoxin <xhao@...ux.alibaba.com>
To:     "Huang, Ying" <ying.huang@...el.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Zi Yan <ziy@...dia.com>, Yang Shi <shy828301@...il.com>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        Oscar Salvador <osalvador@...e.de>,
        Matthew Wilcox <willy@...radead.org>, yangyicong@...ilicon.com,
        v-songbaohua@...o.com, 21cnbao@...il.com
Subject: Re: [RFC 0/6] migrate_pages(): batch TLB flushing


On 2022/9/28 at 10:01 AM, Huang, Ying wrote:
> haoxin <xhao@...ux.alibaba.com> writes:
>
>> Hi, Huang
>>
>> On 2022/9/21 at 2:06, Huang Ying wrote:
>>> From: "Huang, Ying" <ying.huang@...el.com>
>>>
>>> Now, migrate_pages() migrates pages one by one, as in the following
>>> pseudocode,
>>>
>>>     for each page
>>>       unmap
>>>       flush TLB
>>>       copy
>>>       restore map
>>>
>>> If multiple pages are passed to migrate_pages(), there are
>>> opportunities to batch the TLB flushing and copying.  That is, we can
>>> change the code to something as follows,
>>>
>>>     for each page
>>>       unmap
>>>     for each page
>>>       flush TLB
>>>     for each page
>>>       copy
>>>     for each page
>>>       restore map
>>>
>>> The total number of TLB flushing IPIs can be reduced considerably.  And
>>> we may use some hardware accelerator such as DSA to accelerate the
>>> page copying.
>>>
>>> So in this patch, we refactor the migrate_pages() implementation and
>>> implement the TLB flushing batching.  Based on this, hardware
>>> accelerated page copying can be implemented.
>>>
>>> If too many pages are passed to migrate_pages(), the naive batched
>>> implementation may unmap too many pages at the same time.  This makes
>>> it more likely that a task has to wait for the migrated pages to be
>>> mapped again, so latency may suffer.  To deal with this issue, the
>>> maximum number of pages unmapped in one batch is restricted to
>>> HPAGE_PMD_NR.  That is, the latency impact is at the same level as
>>> that of THP migration.
>>>
>>> We use the following test to measure the performance impact of the
>>> patchset,
>>>
>>> On a 2-socket Intel server,
>>>
>>>    - Run pmbench memory accessing benchmark
>>>
>>>    - Run `migratepages` to migrate pages of pmbench between node 0 and
>>>      node 1 back and forth.
>>>
>> As pmbench cannot run on an arm64 machine, I used lmbench instead.
>> My test case is as follows (I am not sure whether it is reasonable, but it seems to work):
>> ./bw_mem -N10000 10000m rd &
>> time migratepages pid node0 node1
>>
>> w/o patch          w/ patch
>> real  0m0.035s     real  0m0.024s
>> user  0m0.000s     user  0m0.000s
>> sys   0m0.035s     sys   0m0.024s
>>
>> The migratepages time is reduced by about 31%.
>>
>> But there is a problem. I see that the batch flush is called as follows:
>> migrate_pages_batch
>> 	try_to_unmap_flush
>> 		arch_tlbbatch_flush(&tlb_ubc->arch); // this is where the batch flush really happens
>>
>> But on arm64, arch_tlbbatch_flush() is not supported, because arm64 does not support CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH yet.
>>
>> So the TLB batch flush does not flush anything; it is an empty function.
> Yes.  And should_defer_flush() will always return false too.  That is,
> the TLB will still be flushed, but will not be batched.
Oh, yes, I overlooked this, thank you.
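
To make the point above concrete, here is a minimal user-space toy model in C. It is not kernel code: `struct flush_stats`, `unmap_batch()` and the `arch_batches` flag are hypothetical names that stand in for the behaviour described above, where deferral is refused on an architecture without batching support. It only illustrates that in this case the flush still happens at unmap time for every page, and the later batch flush has nothing left to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not kernel code: every name here is hypothetical. */
struct flush_stats {
    unsigned long immediate;  /* flushes issued right at unmap time */
    unsigned long batched;    /* flushes issued once for a whole batch */
};

/*
 * Model the unmap phase of one batch of nr_pages pages.
 * arch_batches stands in for "the architecture supports batched
 * unmap TLB flushing", i.e. deferral is allowed.
 */
static void unmap_batch(size_t nr_pages, bool arch_batches,
                        struct flush_stats *st)
{
    bool pending = false;

    assert(st != NULL);

    for (size_t i = 0; i < nr_pages; i++) {
        if (arch_batches)
            pending = true;    /* defer: only record that a flush is needed */
        else
            st->immediate++;   /* no deferral: flush as part of this unmap */
    }

    /* Batch flush point: effectively empty when nothing was deferred. */
    if (pending)
        st->batched++;
}
```

In this model, a 512-page batch without batching support ends with 512 immediate flushes and no batched flush, while the same batch with batching support ends with a single batched flush.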
>
>> Maybe this patch can help solve this problem.
>> https://lore.kernel.org/linux-arm-kernel/20220921084302.43631-1-yangyicong@huawei.com/T/
> Yes.  This will bring TLB flush batching to ARM64.
Next time, I will combine my testing with this patch and run some tests again. Do
you have any suggestions about benchmarks?
>
> Best Regards,
> Huang, Ying
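
For readers following the thread, the two loop structures from the quoted cover letter can be compared with a small user-space C sketch that only counts flushes. This is an illustration under stated assumptions, not the patch itself: `flushes_per_page()` and `flushes_batched()` are hypothetical helpers, and the cap of 512 used in the note below assumes HPAGE_PMD_NR on x86-64 with 4 KiB base pages.

```c
#include <assert.h>

/*
 * Original scheme: for each page, unmap, flush TLB, copy, restore map.
 * One flush is issued per page.
 */
static unsigned long flushes_per_page(unsigned long nr_pages)
{
    unsigned long flushes = 0;

    for (unsigned long i = 0; i < nr_pages; i++)
        flushes++;          /* flush right after unmapping page i */
    return flushes;
}

/*
 * Batched scheme: unmap up to batch_max pages, flush once for the
 * whole batch, then copy and restore the mappings of that batch.
 * batch_max is a hypothetical parameter modelling the per-batch cap.
 */
static unsigned long flushes_batched(unsigned long nr_pages,
                                     unsigned long batch_max)
{
    unsigned long flushes = 0;

    assert(batch_max > 0);

    for (unsigned long start = 0; start < nr_pages; start += batch_max) {
        /* unmap every page of this batch first ... */
        flushes++;          /* ... then a single flush covers the batch */
        /* ... then copy and restore the mappings for the batch */
    }
    return flushes;
}
```

With 2048 pages and a cap of 512, the per-page model issues 2048 flushes and the batched model issues 4. The cap bounds how many pages are unmapped at once, which is the latency concern raised in the cover letter.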
