Date:   Thu, 11 May 2017 16:59:33 -0400
From:   Pasha Tatashin <pasha.tatashin@...cle.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     linux-kernel@...r.kernel.org, sparclinux@...r.kernel.org,
        linux-mm@...ck.org, linuxppc-dev@...ts.ozlabs.org,
        linux-s390@...r.kernel.org, borntraeger@...ibm.com,
        heiko.carstens@...ibm.com, davem@...emloft.net
Subject: Re: [v3 0/9] parallelized "struct page" zeroing

One option is to keep memset() only for deferred struct pages, as I 
have in my patches.

Another option is to add a new function, struct_page_clear(), which 
would default to memset() and could be overridden on platforms that 
want to optimize it.

On SPARC it would use block-initializing stores (STBI), with a single 
membar issued after all "struct pages" are initialized.

I think what I sent out already is the cleaner and better solution, 
because I am not sure what kind of performance we would see on other 
chips.

On 05/11/2017 04:47 PM, Pasha Tatashin wrote:
>>>
>>> Have you measured that? I do not think it would be super hard to
>>> measure. I would be quite surprised if this added much if anything at
>>> all as the whole struct page should be in the cache line already. We do
>>> set reference count and other struct members. Almost nobody should be
>>> looking at our page at this time and stealing the cache line. On the
>>> other hand a large memcpy will basically wipe everything away from the
>>> cpu cache. Or am I missing something?
>>>
> 
> Here is data for single thread (deferred struct page init is disabled):
> 
> Intel CPU E7-8895 v3 @ 2.60GHz  1T memory
> -----------------------------------------
> time to memset "struct page"s in memblock: 11.28s
> time to init "struct page"s:                4.90s
> 
> Moving memset() into __init_single_page():
> time to init and memset "struct page"s:     8.39s
> 
> SPARC M6 @ 3600 MHz  1T memory
> -----------------------------------------
> time to memset "struct page"s in memblock:  1.60s
> time to init "struct page"s:                3.37s
> 
> Moving memset() into __init_single_page():
> time to init and memset "struct page"s:    12.99s
> 
> 
> So, moving memset() into __init_single_page() benefits Intel. I am 
> actually surprised that memset() is so slow on Intel when it is 
> called from memblock. But it hurts SPARC; I guess the membars at the 
> end of memset() kill the performance.
> 
> Also, when looking at these values, remember that Intel has twice as 
> many "struct page"s for the same amount of memory.
> 
> Pasha
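
For context, the measured change above amounts to zeroing each page at 
the top of __init_single_page() instead of memset()ing the whole 
struct-page region up front from memblock. A minimal sketch of the 
assumed shape (the actual patch may differ):

static void __meminit __init_single_page(struct page *page,
					 unsigned long pfn,
					 unsigned long zone, int nid)
{
	/* Moved here from the bulk memset() over the memblock range. */
	memset(page, 0, sizeof(struct page));

	set_page_links(page, zone, nid, pfn);
	init_page_count(page);
	page_mapcount_reset(page);
	/* ... rest of the usual per-page initialization ... */
}

On a chip like the Intel one above this keeps the zeroing in the cache 
line that is about to be written anyway; on SPARC, the membar at the 
end of each small memset() is the suspected cost.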
