Date:   Fri, 12 May 2017 13:24:52 -0400
From:   Pasha Tatashin <pasha.tatashin@...cle.com>
To:     David Miller <davem@...emloft.net>
Cc:     mhocko@...nel.org, linux-kernel@...r.kernel.org,
        sparclinux@...r.kernel.org, linux-mm@...ck.org,
        linuxppc-dev@...ts.ozlabs.org, linux-s390@...r.kernel.org,
        borntraeger@...ibm.com, heiko.carstens@...ibm.com
Subject: Re: [v3 0/9] parallelized "struct page" zeroing



On 05/12/2017 12:57 PM, David Miller wrote:
> From: Pasha Tatashin <pasha.tatashin@...cle.com>
> Date: Thu, 11 May 2017 16:59:33 -0400
> 
>> We should either keep memset() only for deferred struct pages as what
>> I have in my patches.
>>
>> Another option is to add a new function struct_page_clear() which
>> would default to memset() and to something else on platforms that
>> decide to optimize it.
>>
>> On SPARC it would call STBIs, and we would do one membar call after
>> all "struct pages" are initialized.
> 
> No membars will be performed for single individual page struct clear,
> the cutoff to use the STBI is larger than that.
> 

Right now it is larger, but what I suggested is to add a new optimized 
routine just for this case, which would do STBI for 64 bytes but without 
a membar (the membar would be done once at the end of memmap_init_zone() 
and deferred_init_memmap()):

#define struct_page_clear(page)                                 \
         __asm__ __volatile__(                                   \
         "stxa   %%g0, [%0]%2\n"                                 \
         "stxa   %%g0, [%0 + %1]%2\n"                            \
         : /* No output */                                       \
         : "r" (page), "r" (0x20), "i"(ASI_BLK_INIT_QUAD_LDD_P))

And insert it into __init_single_page() instead of memset().

The final result is 4.01s/T, which is even faster than the current 4.97s/T.
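
For platforms that do not optimize this, the generic side of the idea could look roughly like the sketch below: struct_page_clear() defaults to memset(), and an architecture overrides it by defining the macro first. This is only an illustration, not code from the patch series. struct fake_page, struct_page_clear_done() and clear_page_range() are hypothetical names made up for the example, and the real membar would sit at the end of memmap_init_zone() and deferred_init_memmap() rather than in a helper like this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct page: 64 bytes, as in the thread. */
struct fake_page {
	unsigned long words[64 / sizeof(unsigned long)];
};
static_assert(sizeof(struct fake_page) == 64,
	      "sketch assumes a 64-byte struct page");

/* Generic fallback; an architecture would define struct_page_clear() first. */
#ifndef struct_page_clear
#define struct_page_clear(page) memset((page), 0, sizeof(*(page)))
#endif

/*
 * Hypothetical hook, a no-op in the generic case. An STBI-based override
 * would issue its single membar here.
 */
#ifndef struct_page_clear_done
#define struct_page_clear_done() do { } while (0)
#endif

/* Clear n entries, then order the stores once instead of once per page. */
static void clear_page_range(struct fake_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		struct_page_clear(&pages[i]);
	struct_page_clear_done();
}
```

The point of splitting the clear from the barrier is that the per-page cost stays at two stores on SPARC, while the ordering cost is paid once per range.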



Pasha
