lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 12 May 2017 12:56:16 -0400 (EDT)
From:   David Miller <davem@...emloft.net>
To:     pasha.tatashin@...cle.com
Cc:     mhocko@...nel.org, linux-kernel@...r.kernel.org,
        sparclinux@...r.kernel.org, linux-mm@...ck.org,
        linuxppc-dev@...ts.ozlabs.org, linux-s390@...r.kernel.org,
        borntraeger@...ibm.com, heiko.carstens@...ibm.com
Subject: Re: [v3 0/9] parallelized "struct page" zeroing

From: Pasha Tatashin <pasha.tatashin@...cle.com>
Date: Thu, 11 May 2017 16:47:05 -0400

> So, moving memset() into __init_single_page() benefits Intel. I am
> actually surprised why memset() is so slow on intel when it is called
> from memblock. But, hurts SPARC, I guess these membars at the end of
> memset() kills the performance.

Perhaps an x86 expert can chime in, but it might be the case that past
a certain size, the microcode for the enhanced stosb uses non-temporal
stores or something like that.

As for sparc64, yes we can get really killed by the transactional cost
of memset because of the membars.

But I wonder, for a single page struct, if we even use the special
stores and thus eat the membar cost.  struct page is only 64 bytes,
and the cutoff in the Niagara4 bzero implementation is "64 + (64 - 8)"
so indeed the initializing stores will not even be used.

So sparc64 will only use initializing stores and do the membars if
at least 2 pages are cleared at a time.

Powered by blists - more mailing lists