linux-kernel - Re: [v3 0/9] parallelized "struct page" zeroing

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ab667486-54a0-a36e-6797-b5f7b83c10f7@oracle.com>
Date:   Wed, 10 May 2017 11:01:40 -0400
From:   Pasha Tatashin <pasha.tatashin@...cle.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     linux-kernel@...r.kernel.org, sparclinux@...r.kernel.org,
        linux-mm@...ck.org, linuxppc-dev@...ts.ozlabs.org,
        linux-s390@...r.kernel.org, borntraeger@...ibm.com,
        heiko.carstens@...ibm.com, davem@...emloft.net
Subject: Re: [v3 0/9] parallelized "struct page" zeroing



On 05/10/2017 10:57 AM, Michal Hocko wrote:
> On Wed 10-05-17 09:42:22, Pasha Tatashin wrote:
>>>
>>> Well, I didn't object to this particular part. I was mostly concerned
>>> about
>>> http://lkml.kernel.org/r/1494003796-748672-4-git-send-email-pasha.tatashin@oracle.com
>>> and the "zero" argument for other functions. I guess we can do without
>>> that. I _think_ that we should simply _always_ initialize the page at the
>>> __init_single_page time rather than during the allocation. That would
>>> require dropping __GFP_ZERO for non-memblock allocations. Or do you
>>> think we could regress for single threaded initialization?
>>>
>>
>> Hi Michal,
>>
>> Thats exactly right, I am worried that we will regress when there is no
>> parallelized initialization of "struct pages" if we force unconditionally do
>> memset() in __init_single_page(). The overhead of calling memset() on a
>> smaller chunks (64-bytes) may cause the regression, this is why I opted only
>> for parallelized case to zero this metadata. This way, we are guaranteed to
>> see great improvements from this change without having regressions on
>> platforms and builds that do not support parallelized initialization of
>> "struct pages".
> 
> Have you measured that? I do not think it would be super hard to
> measure. I would be quite surprised if this added much if anything at
> all as the whole struct page should be in the cache line already. We do
> set reference count and other struct members. Almost nobody should be
> looking at our page at this time and stealing the cache line. On the
> other hand a large memcpy will basically wipe everything away from the
> cpu cache. Or am I missing something?
> 

Perhaps you are right, and I will measure on x86. But, I suspect hit can 
become unacceptable on some platfoms: there is an overhead of calling a 
function, even if it is leaf-optimized, and there is an overhead in 
memset() to check for alignments of size and address, types of setting 
(zeroing vs. non-zeroing), etc., that adds up quickly.

Pasha