Date:   Thu, 3 Sep 2020 11:00:24 +0200
From:   Vlastimil Babka <vbabka@...e.cz>
To:     Alex Shi <alex.shi@...ux.alibaba.com>,
        Mel Gorman <mgorman@...hsingularity.net>
Cc:     Anshuman Khandual <anshuman.khandual@....com>,
        David Hildenbrand <david@...hat.com>,
        Matthew Wilcox <willy@...radead.org>,
        Alexander Duyck <alexander.duyck@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 1/4] mm/pageblock: mitigation cmpxchg false sharing in
 pageblock flags

On 9/3/20 10:40 AM, Alex Shi wrote:
> 
> 
> On 2020/9/3 at 4:32 PM, Alex Shi wrote:
>>>
>> I have run thpscale with the 'always' defrag setting of THP. The stddev of the Amean is much
>> larger than the very small reduction in average run time.
>> 
>> But the remaining patch 4 can show the cmpxchg retries dropping from thousands to hundreds
>> or fewer.
>> 
>> Subject: [PATCH v4 4/4] add cmpxchg tracing
> 
> 
> This is a typical result with the patchset:
> 
>  Performance counter stats for './run-mmtests.sh -c configs/config-workload-thpscale pageblock-c':
> 
>              9,564      compaction:mm_compaction_isolate_migratepages
>              6,430      compaction:mm_compaction_isolate_freepages
>              5,287      compaction:mm_compaction_migratepages
>             45,299      compaction:mm_compaction_begin
>             45,299      compaction:mm_compaction_end
>             30,557      compaction:mm_compaction_try_to_compact_pages
>             95,540      compaction:mm_compaction_finished
>            149,379      compaction:mm_compaction_suitable
>                  0      compaction:mm_compaction_deferred
>                  0      compaction:mm_compaction_defer_compaction
>              3,949      compaction:mm_compaction_defer_reset
>                  0      compaction:mm_compaction_kcompactd_sleep
>                  0      compaction:mm_compaction_wakeup_kcompactd
>                  0      compaction:mm_compaction_kcompactd_wake
>                 68      pageblock:hit_cmpxchg
> 
>      113.570974583 seconds time elapsed
> 
>       14.664451000 seconds user
>       96.847116000 seconds sys
> 
> This is the 5.9-rc2 base kernel result:
> 
>  Performance counter stats for './run-mmtests.sh -c configs/config-workload-thpscale rc2-e':
> 
>             15,920      compaction:mm_compaction_isolate_migratepages
>             20,523      compaction:mm_compaction_isolate_freepages
>              9,752      compaction:mm_compaction_migratepages
>             27,773      compaction:mm_compaction_begin
>             27,773      compaction:mm_compaction_end
>             16,391      compaction:mm_compaction_try_to_compact_pages
>             62,809      compaction:mm_compaction_finished
>             69,821      compaction:mm_compaction_suitable
>                  0      compaction:mm_compaction_deferred
>                  0      compaction:mm_compaction_defer_compaction
>              7,875      compaction:mm_compaction_defer_reset
>                  0      compaction:mm_compaction_kcompactd_sleep
>                  0      compaction:mm_compaction_wakeup_kcompactd
>                  0      compaction:mm_compaction_kcompactd_wake
>              1,208      pageblock:hit_cmpxchg
> 
>      116.440414591 seconds time elapsed
> 
>       15.326913000 seconds user
>      103.752758000 seconds sys

The runs differ wildly in many of the other stats, so I'm not sure they are really
comparable. I guess you could show the ratio of hit_cmpxchg to all cmpxchg attempts.
But there's also a danger of the tracepoints widening the race window.
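
A minimal sketch of that kind of normalization, using the counts quoted above. The
posted output has no counter for total cmpxchg attempts, so `attempts` here is a
hypothetical input. As a stand-in, the example divides by mm_compaction_begin, which
is only a crude proxy chosen for illustration and not something measured in this thread.

```python
def retry_fraction(retries, attempts):
    """Return retries per attempt; 'attempts' must be a positive count."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return retries / attempts

# Counts copied from the two perf stat outputs quoted above.
patched = {"hit_cmpxchg": 68, "mm_compaction_begin": 45299}
base = {"hit_cmpxchg": 1208, "mm_compaction_begin": 27773}

# Assumption: mm_compaction_begin stands in for the missing
# "all cmpxchg attempts" counter, purely to show the arithmetic.
patched_ratio = retry_fraction(patched["hit_cmpxchg"], patched["mm_compaction_begin"])
base_ratio = retry_fraction(base["hit_cmpxchg"], base["mm_compaction_begin"])

print(f"patched: {patched_ratio:.5f} retries per compaction run")
print(f"base:    {base_ratio:.5f} retries per compaction run")
```

With a real count of cmpxchg attempts in place of the proxy, the same function would
give the fraction suggested above.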

In the end, what matters is how these 1208 retries contribute to runtime. I doubt
they would really be visible in a 100+ second run, though.
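
A back-of-the-envelope sketch of that estimate. The per-retry costs are assumptions
picked to span a wide range, not measurements; only the retry count and elapsed time
come from the base kernel output quoted above.

```python
RETRIES = 1208               # pageblock:hit_cmpxchg on the 5.9-rc2 base run
ELAPSED_S = 116.440414591    # elapsed seconds of the same run

def retry_share(retries, cost_ns, elapsed_s):
    """Fraction of elapsed time spent on retries at a given per-retry cost."""
    return retries * cost_ns * 1e-9 / elapsed_s

# Assumed per-retry costs in nanoseconds (hypothetical, for illustration only).
for cost_ns in (100, 1_000, 10_000):
    share = retry_share(RETRIES, cost_ns, ELAPSED_S)
    print(f"{cost_ns:>6} ns per retry -> {share:.6%} of elapsed time")
```

Even at the most pessimistic assumed cost, the retries account for well under 0.1%
of the elapsed time.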
