linux-kernel - Re: [PATCH v2 1/2] mm:vmscan: the dirty folio in folio


Message-ID: <356a0ae7-6fba-4065-bdb3-5da184074f60@redhat.com>
Date:   Tue, 24 Oct 2023 09:07:52 +0200
From:   David Hildenbrand <david@...hat.com>
To:     zhiguojiang <justinjiang@...o.com>,
        Matthew Wilcox <willy@...radead.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, opensource.kernel@...o.com
Subject: Re: [PATCH v2 1/2] mm:vmscan: the dirty folio in folio_list skip
 unmap

On 24.10.23 04:04, zhiguojiang wrote:
> 
> 
> On 2023/10/23 21:01, Matthew Wilcox wrote:
>> On Mon, Oct 23, 2023 at 08:44:55PM +0800, zhiguojiang wrote:
>>> On 2023/10/23 20:21, Matthew Wilcox wrote:
>>>> On Mon, Oct 23, 2023 at 04:07:28PM +0800, zhiguojiang wrote:
>>>>>> Are you seeing measurable changes for any workloads?  It certainly seems
>>>>>> like you should, but it would help if you chose a test from mmtests and
>>>>>> showed how performance changed on your system.
>>>>> In one mmtests run, the maximum time spent on a futile reclaim attempt
>>>>> for a dirty folio in folio_list that does not support pageout and ends
>>>>> up activated in shrink_folio_list() is: cost=51us, exe=2365us.
>>>>>
>>>>> Calculated with the formula dirty_cost / total_cost * 100%, the
>>>>> reclaim efficiency for dirty folios can be improved by 53.13% and 82.95%
>>>>> in the two traces below.
>>>>>
>>>>> So this patch can optimize shrink efficiency and reduce the workload of
>>>>> kswapd to a certain extent.
>>>>>
>>>>> kswapd0-96      (     96) [005] .....   387.218548:
>>>>> mm_vmscan_lru_shrink_inactive: [Justin] nid 0 nr_scanned 32 nr_taken 32
>>>>> nr_reclaimed 31 nr_dirty  1 nr_unqueued_dirty  1 nr_writeback 0
>>>>> nr_activate[1]  1 nr_ref_keep  0 f RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
>>>>> total_cost 96 total_exe 2365 dirty_cost 51 total_exe 2365
>>>>>
>>>>> kswapd0-96      (     96) [006] .....   412.822532:
>>>>> mm_vmscan_lru_shrink_inactive: [Justin] nid 0 nr_scanned 32 nr_taken 32
>>>>> nr_reclaimed  0 nr_dirty 32 nr_unqueued_dirty 32 nr_writeback 0
>>>>> nr_activate[1] 19 nr_ref_keep 13 f RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
>>>>> total_cost 88 total_exe 605  dirty_cost 73 total_exe 605
>>>> I appreciate that you can put probes in and determine the cost, but do
>>>> you see improvements for a real workload?  Like doing a kernel compile
>>>> -- does it speed up at all?
>>> Can you share a method for testing the workload of a kernel thread such as kswapd?
>> Something dirt simple like 'time make -j8'.
> Each configuration was compiled twice. Compared with the unmodified
> kernel, the builds with the patches applied showed a slight reduction in
> compilation time, as follows:
> 
> Compilation command:
> make distclean -j8
> make ARCH=x86_64 x86_64_defconfig
> time make -j8
> 
> 1. Unmodified compilation time:
> real    2m40.276s
> user    16m2.956s
> sys     2m14.738s
> 
> real    2m40.136s
> user    16m2.617s
> sys     2m14.722s
> 
> 2. [Patch v2 1/2] modified compilation time:
> real    2m40.067s
> user    16m3.164s
> sys     2m14.211s
> 
> real    2m40.123s
> user    16m2.439s
> sys     2m14.508s
> 
> 3. [Patch v2 1/2] + [Patch v2 2/2] modified compilation time:
> real    2m40.367s
> user    16m3.738s
> sys     2m13.662s
> 
> real    2m40.014s
> user    16m3.108s
> sys     2m14.096s
> 

To get meaningful numbers, two iterations are usually not sufficient. How
much memory does your system have? Does vmscan ever even become active?
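
A minimal sketch of both questions, in Python. It summarizes the timings
quoted above as mean and sample standard deviation (with only two runs per
configuration the spread is barely defined), and it diffs the reclaim
counters from /proc/vmstat taken before and after a build to see whether
vmscan did any work. The helper names are hypothetical, and the pgscan_ and
pgsteal_ counter prefixes are an assumption about what the running kernel
exports in /proc/vmstat.

```python
import statistics

# Timings in seconds, copied from the quoted results (2m40.276s -> 160.276).
# Only two runs exist per configuration; no further data is assumed.
RUNS = {
    "unmodified": {"real": [160.276, 160.136], "sys": [134.738, 134.722]},
    "patch 1/2": {"real": [160.067, 160.123], "sys": [134.211, 134.508]},
    "patch 1/2 + 2/2": {"real": [160.367, 160.014], "sys": [133.662, 134.096]},
}


def summarize(samples):
    """Return (mean, sample standard deviation) of a list of timings."""
    return statistics.mean(samples), statistics.stdev(samples)


def reclaim_counters(vmstat_text, prefixes=("pgscan_", "pgsteal_")):
    """Extract scan/steal counters from /proc/vmstat-style text.

    The prefixes are an assumption; counter names differ between kernels.
    """
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith(prefixes):
            counters[parts[0]] = int(parts[1])
    return counters


def reclaim_delta(before, after):
    """Counter increase between two snapshots (for example around a build)."""
    return {name: after[name] - before.get(name, 0) for name in after}


if __name__ == "__main__":
    for name, times in RUNS.items():
        mean, spread = summarize(times["sys"])
        print(f"{name:16s} sys mean={mean:.3f}s stdev={spread:.3f}s "
              f"(n={len(times['sys'])})")
    try:
        with open("/proc/vmstat") as f:
            print(reclaim_counters(f.read()))
    except OSError:
        print("/proc/vmstat is not available on this system")
```

If the scan counters do not move between a snapshot taken before the build
and one taken after it, reclaim never ran and the patches could not have
influenced the timings.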

-- 
Cheers,

David / dhildenb