linux-kernel - Re: [v3 0/9] parallelized "struct page" zeroing

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20170512.125616.2184259340380386583.davem@davemloft.net>
Date:   Fri, 12 May 2017 12:56:16 -0400 (EDT)
From:   David Miller <davem@...emloft.net>
To:     pasha.tatashin@...cle.com
Cc:     mhocko@...nel.org, linux-kernel@...r.kernel.org,
        sparclinux@...r.kernel.org, linux-mm@...ck.org,
        linuxppc-dev@...ts.ozlabs.org, linux-s390@...r.kernel.org,
        borntraeger@...ibm.com, heiko.carstens@...ibm.com
Subject: Re: [v3 0/9] parallelized "struct page" zeroing

From: Pasha Tatashin <pasha.tatashin@...cle.com>
Date: Thu, 11 May 2017 16:47:05 -0400

> So, moving memset() into __init_single_page() benefits Intel. I am
> actually surprised why memset() is so slow on intel when it is called
> from memblock. But, hurts SPARC, I guess these membars at the end of
> memset() kills the performance.

Perhaps an x86 expert can chime in, but it might be the case that past
a certain size, the microcode for the enhanced stosb uses non-temporal
stores or something like that.

As for sparc64, yes we can get really killed by the transactional cost
of memset because of the membars.

But I wonder, for a single page struct, if we even use the special
stores and thus eat the membar cost.  struct page is only 64 bytes,
and the cutoff in the Niagara4 bzero implementation is "64 + (64 - 8)"
so indeed the initializing stores will not even be used.

So sparc64 will only use initializing stores and do the membars if
at least 2 pages are cleared at a time.