Date:   Sun, 30 Aug 2020 18:08:48 +0800
From:   Alex Shi <alex.shi@...ux.alibaba.com>
To:     David Hildenbrand <david@...hat.com>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Matthew Wilcox <willy@...radead.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Hugh Dickins <hughd@...gle.com>,
        Alexander Duyck <alexander.h.duyck@...ux.intel.com>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/2] mm/pageblock: mitigation cmpxchg false sharing in
 pageblock flags



On 2020/8/19 4:04 PM, David Hildenbrand wrote:
> On 19.08.20 09:55, Anshuman Khandual wrote:
>>
>>
>> On 08/19/2020 11:17 AM, Alex Shi wrote:
>>> pageblock_flags is used as long, since every pageblock_flags is just 4
>>> bits, 'long' size will include 8(32bit machine) or 16 pageblocks' flags,
>>> that flag setting has to sync in cmpxchg with 7 or 15 other pageblock
>>> flags. It would cause long waiting for sync.
>>>
>>> If we could change the pageblock_flags variable as char, we could use
>>> char size cmpxchg, which just sync up with 2 pageblock flags. it could
>>> relief much false sharing in cmpxchg.
>>
>> Do you have numbers demonstrating claimed performance improvement
>> after this change ?
>>
> 
> I asked for that in v1 and there are no performance numbers to justify
> the change. IMHO, that will be required to consider this for inclusion,
> otherwise it's just code churn resulting in an (although minimal)
> additional memory consumption.
> 
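
For reference, the arithmetic in the quoted patch description can be written
out as a small userspace sketch. The helper names below are illustrative only
and do not exist in the kernel; the only input taken from the description is
the 4 bits of flags per pageblock.

```c
#include <limits.h>
#include <stddef.h>

/* Bits of flags kept per pageblock, as stated in the patch description. */
#define FLAG_BITS_PER_PAGEBLOCK 4

/* Hypothetical helper: pageblocks whose flags fit in a word of word_bytes bytes. */
static size_t pageblocks_per_word(size_t word_bytes)
{
    return word_bytes * CHAR_BIT / FLAG_BITS_PER_PAGEBLOCK;
}

/* Hypothetical helper: other pageblocks sharing the cmpxchg word with the one updated. */
static size_t neighbours_in_word(size_t word_bytes)
{
    return pageblocks_per_word(word_bytes) - 1;
}
```

With a 4-byte or 8-byte long this gives 8 or 16 pageblocks per word (7 or 15
neighbours), and with a 1-byte char it gives 2 pageblocks per word (1
neighbour), which matches the numbers in the description.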

I found some time to run thpscale on my 4*HT cores machine; the data is below.
I ran each kernel 3 times. "pageblock" is 5.9-rc2 with these 2 patches, "plp1"
is 5.9-rc2 with only the first patch, and "rc2" is the plain 5.9-rc2 kernel.
System and total time are slightly lower than on the original kernel.

                   pageblock   pageblock   pageblock        plp1        plp1        plp1         rc2         rc2         rc2   pageblock
                          16        16-2        16-3           1           2           3           a           b           c           a
Duration User          14.81       15.24       14.55       15.28       14.66       14.63       14.76       14.97       14.38       15.07
Duration System        84.44       88.38       90.64       92.65       94.01       90.58      100.43       89.15       88.89       84.04
Duration Elapsed       98.83       99.06       99.81       99.65      100.26       99.90      100.30       99.24       99.14       98.87
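
To compare the kernels on one number each, the "Duration System" row can be
averaged per kernel. The sketch below is only a convenience for reading the
table; the array and helper names are made up here, and counting the last
column (pageblock / a) as a fourth pageblock run is an assumption.

```c
#include <stddef.h>

/* "Duration System" values copied from the table above, in seconds. */
static const double sys_pageblock[] = { 84.44, 88.38, 90.64, 84.04 };
static const double sys_plp1[]      = { 92.65, 94.01, 90.58 };
static const double sys_rc2[]       = { 100.43, 89.15, 88.89 };

/* Number of elements in a fixed-size array. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}
```

Calling mean(sys_rc2, ARRAY_LEN(sys_rc2)) and likewise for the other arrays
gives roughly 86.9 s for pageblock, 92.4 s for plp1 and 92.8 s for rc2. The
rc2 runs alone range from 88.89 s to 100.43 s, so with this few samples the
means should be read with that spread in mind.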

I also added a tracepoint to measure the effect of the patchset. It shows
clearly fewer cmpxchg failures (pageblock:hit_cmpxchg is 1,046 on rc2-b
versus 369 on pageblock-a).
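
What the tracepoint counts is one failed compare-and-exchange, that is, one
extra trip around the retry loop. A minimal userspace sketch of such a loop,
using C11 atomics rather than the kernel's cmpxchg(), could look like this.
The names set_flags_mask and cmpxchg_retries are hypothetical stand-ins, not
the kernel code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counter standing in for the pageblock:hit_cmpxchg tracepoint. */
static atomic_ulong cmpxchg_retries;

/*
 * Replace the bits selected by mask in *word with flags, retrying until
 * the compare-and-exchange succeeds. Every failed attempt means another
 * writer changed the same word in between, possibly for a different
 * pageblock that merely shares the word.
 */
static void set_flags_mask(_Atomic uint8_t *word, uint8_t flags, uint8_t mask)
{
    uint8_t old = atomic_load(word);

    for (;;) {
        uint8_t desired = (uint8_t)((old & ~mask) | (flags & mask));

        /* On failure, old is refreshed with the current value of *word. */
        if (atomic_compare_exchange_strong(word, &old, desired))
            break;
        atomic_fetch_add(&cmpxchg_retries, 1);
    }
}
```

The narrower the word, the fewer unrelated pageblocks can cause such a
failure, which is the effect the counter is meant to show.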



 Performance counter stats for './run-mmtests.sh -c configs/config-workload-thpscale rc2-b':

             6,720      compaction:mm_compaction_isolate_migratepages
            13,526      compaction:mm_compaction_isolate_freepages
             4,052      compaction:mm_compaction_migratepages
            34,199      compaction:mm_compaction_begin
            34,199      compaction:mm_compaction_end
            21,784      compaction:mm_compaction_try_to_compact_pages
            71,606      compaction:mm_compaction_finished
           106,545      compaction:mm_compaction_suitable
                 0      compaction:mm_compaction_deferred
                 0      compaction:mm_compaction_defer_compaction
             2,977      compaction:mm_compaction_defer_reset
                 0      compaction:mm_compaction_kcompactd_sleep
                 0      compaction:mm_compaction_wakeup_kcompactd
                 0      compaction:mm_compaction_kcompactd_wake
             1,046      pageblock:hit_cmpxchg

     114.914303988 seconds time elapsed

      15.754797000 seconds user
      89.712251000 seconds sys





 Performance counter stats for './run-mmtests.sh -c configs/config-workload-thpscale pageblock-a':

               602      compaction:mm_compaction_isolate_migratepages
             3,710      compaction:mm_compaction_isolate_freepages
               402      compaction:mm_compaction_migratepages
            43,116      compaction:mm_compaction_begin
            43,116      compaction:mm_compaction_end
            24,810      compaction:mm_compaction_try_to_compact_pages
            86,527      compaction:mm_compaction_finished
           125,819      compaction:mm_compaction_suitable
                 2      compaction:mm_compaction_deferred
                 0      compaction:mm_compaction_defer_compaction
               271      compaction:mm_compaction_defer_reset
                 0      compaction:mm_compaction_kcompactd_sleep
                 0      compaction:mm_compaction_wakeup_kcompactd
                 0      compaction:mm_compaction_kcompactd_wake
               369      pageblock:hit_cmpxchg

     107.405499745 seconds time elapsed

      15.830967000 seconds user
      84.559767000 seconds sys


commit 36cea76895637c0c18ce8590c0f43a3e453fbf8f
Author: Alex Shi <alex.shi@...ux.alibaba.com>
Date:   Wed Aug 19 17:26:26 2020 +0800

    add cmpxchg tracing

    Signed-off-by: Alex Shi <alex.shi@...ux.alibaba.com>

diff --git a/include/trace/events/pageblock.h b/include/trace/events/pageblock.h
new file mode 100644
index 000000000000..003c2d716f82
--- /dev/null
+++ b/include/trace/events/pageblock.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM pageblock
+
+#if !defined(_TRACE_PAGEBLOCK_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PAGEBLOCK_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(hit_cmpxchg,
+
+       TP_PROTO(char byte),
+
+       TP_ARGS(byte),
+
+       TP_STRUCT__entry(
+               __field(char, byte)
+       ),
+
+       TP_fast_assign(
+               __entry->byte = byte;
+       ),
+
+       TP_printk("%d", __entry->byte)
+);
+
+#endif /* _TRACE_PAGEBLOCK_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 60342e764090..2422dec00484 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -509,6 +509,9 @@ static __always_inline int get_pfnblock_migratetype(struct page *page, unsigned
  * @pfn: The target page frame number
  * @mask: mask of bits that the caller is interested in
  */
+#define CREATE_TRACE_POINTS
+#include <trace/events/pageblock.h>
+
 void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
                                        unsigned long pfn,
                                        unsigned long mask)
@@ -536,6 +539,7 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
                if (byte == old_byte)
                        break;
                byte = old_byte;
+               trace_hit_cmpxchg(byte);
        }
 }
