Message-ID: <ZEo0wctuNFBzaxoJ@dhcp22.suse.cz>
Date:   Thu, 27 Apr 2023 10:39:29 +0200
From:   Michal Hocko <mhocko@...e.com>
To:     Marcelo Tosatti <mtosatti@...hat.com>
Cc:     Vlastimil Babka <vbabka@...e.cz>,
        Frederic Weisbecker <frederic@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Christoph Lameter <cl@...ux.com>,
        Aaron Tomlin <atomlin@...mlin.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        Russell King <linux@...linux.org.uk>,
        Huacai Chen <chenhuacai@...nel.org>,
        Heiko Carstens <hca@...ux.ibm.com>, x86@...nel.org
Subject: Re: [PATCH v7 00/13] fold per-CPU vmstats remotely

On Wed 26-04-23 13:10:54, Marcelo Tosatti wrote:
[...]
> "To test the performance difference, a page allocator microbenchmark:
> https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/bench/page_bench01.c
> with loops=1000000 was used, on Intel Core i7-11850H @ 2.50GHz.
> 
> For the single_page_alloc_free test, which does
> 
>         /** Loop to measure **/
>         for (i = 0; i < rec->loops; i++) {
>                 my_page = alloc_page(gfp_mask);
>                 if (unlikely(my_page == NULL))
>                         return 0;
>                 __free_page(my_page);
>         }
> 
> Unit is cycles.
> 
> Vanilla                 Patched         Diff
> 115.25                  117             1.4%"
> 
> To be honest, that 1.4% difference was not stable but fluctuated between
> positive and negative percentages (so the performance difference was in
> the noise).
> 
> So performance is not a decisive factor in this case.

It is not negligible considering that the majority of workloads will not
benefit from this change. You are clearly ignoring that the vmstat code
has been highly optimized for local per-cpu access exactly to avoid
locked operations and cache line bouncing.
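
To illustrate the design point, here is a minimal userspace sketch of the
general idea. It is not the kernel's vmstat code. Each CPU accumulates a
private delta with plain loads and stores and only folds it into a shared
atomic counter once the delta crosses a threshold. All names here
(MODEL_NR_CPUS, THRESHOLD, local_diff, global_count, mod_counter, fold_all,
read_counter) are hypothetical, and the model assumes each slot is only
ever touched by its owning CPU.

```c
#include <assert.h>
#include <stdatomic.h>

#define MODEL_NR_CPUS 4   /* hypothetical CPU count for this model */
#define THRESHOLD     32  /* hypothetical fold threshold */

static atomic_long global_count;        /* shared: needs atomic (locked) ops */
static long local_diff[MODEL_NR_CPUS];  /* per-CPU deltas: plain stores only */

/*
 * Fast path: update only the calling CPU's delta. The shared counter is
 * touched (one atomic add, one potentially contended cache line) only
 * when the delta exceeds the threshold.
 */
static void mod_counter(int cpu, long delta)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

    long x = local_diff[cpu] + delta;
    if (x > THRESHOLD || x < -THRESHOLD) {
        atomic_fetch_add(&global_count, x);
        x = 0;
    }
    local_diff[cpu] = x;
}

/*
 * Slow path: flush every residual delta into the shared counter.
 * Assumption: no updater runs concurrently. A remote fold that races
 * with mod_counter() would need atomic operations on local_diff[] too.
 */
static void fold_all(void)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        atomic_fetch_add(&global_count, local_diff[cpu]);
        local_diff[cpu] = 0;
    }
}

/* Approximate read: ignores deltas that have not been folded yet. */
static long read_counter(void)
{
    return atomic_load(&global_count);
}
```

Under these assumptions, 100 calls to mod_counter(0, 1) perform only three
atomic additions (of 33 each), so read_counter() reports 99 until
fold_all() flushes the remaining delta.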
-- 
Michal Hocko
SUSE Labs
