Message-ID: <65b8a658-76d1-0617-ece8-ff7a3c1c4046@oracle.com>
Date: Thu, 11 May 2017 16:59:33 -0400
From: Pasha Tatashin <pasha.tatashin@...cle.com>
To: Michal Hocko <mhocko@...nel.org>
Cc: linux-kernel@...r.kernel.org, sparclinux@...r.kernel.org,
linux-mm@...ck.org, linuxppc-dev@...ts.ozlabs.org,
linux-s390@...r.kernel.org, borntraeger@...ibm.com,
heiko.carstens@...ibm.com, davem@...emloft.net
Subject: Re: [v3 0/9] parallelized "struct page" zeroing
We should either keep memset() only for deferred struct pages, as I have
in my patches, or add a new function, struct_page_clear(), which would
default to memset() and do something else on platforms that decide to
optimize it.
On SPARC it would call STBIs, and we would do one membar call after all
"struct pages" are initialized.
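
To make the struct_page_clear() option concrete, here is a rough userspace
sketch. Only the name struct_page_clear() and the memset() default come from
the proposal above. The ARCH_HAS_STRUCT_PAGE_CLEAR guard, the
struct_page_clear_done() barrier hook, init_struct_pages(), and the
simplified struct page layout are all hypothetical, made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel's struct page; the real layout differs. */
struct page {
	unsigned long flags;
	unsigned long private;
	void *mapping;
	unsigned long index;
	int refcount;
	int mapcount;
	unsigned long pad[2];
};

/*
 * Hypothetical hook: an architecture would define
 * ARCH_HAS_STRUCT_PAGE_CLEAR and supply its own versions (on SPARC,
 * block-init stores with no membar per page). Everyone else falls
 * back to plain memset().
 */
#ifndef ARCH_HAS_STRUCT_PAGE_CLEAR
static inline void struct_page_clear(struct page *page)
{
	assert(page != NULL);
	memset(page, 0, sizeof(*page));
}

/* Hypothetical: called once after all struct pages are initialized. */
static inline void struct_page_clear_done(void)
{
	/* Nothing to do for the memset() fallback. */
}
#endif

/* Clear and initialize nr entries, with a single barrier at the end. */
static void init_struct_pages(struct page *pages, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		struct_page_clear(&pages[i]);
		pages[i].refcount = 1;	/* stands in for __init_single_page() */
	}
	struct_page_clear_done();
}
```

The point of the split is that the per-page clear carries no barrier, and an
architecture that needs one pays for it once in struct_page_clear_done().
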
I think what I sent out already is a cleaner and better solution, because
I am not sure what kind of performance we would see on other chips.
On 05/11/2017 04:47 PM, Pasha Tatashin wrote:
>>>
>>> Have you measured that? I do not think it would be super hard to
>>> measure. I would be quite surprised if this added much if anything at
>>> all as the whole struct page should be in the cache line already. We do
>>> set reference count and other struct members. Almost nobody should be
>>> looking at our page at this time and stealing the cache line. On the
>>> other hand a large memcpy will basically wipe everything away from the
>>> cpu cache. Or am I missing something?
>>>
>
> Here is data for single thread (deferred struct page init is disabled):
>
> Intel CPU E7-8895 v3 @ 2.60GHz 1T memory
> -----------------------------------------
> time to memset "struct page"s in memblock: 11.28s
> time to init "struct page"s: 4.90s
>
> Moving memset into __init_single_page()
> time to init and memset "struct page"s: 8.39s
>
> SPARC M6 @ 3600 MHz 1T memory
> -----------------------------------------
> time to memset "struct page"s in memblock: 1.60s
> time to init "struct page"s: 3.37s
>
> Moving memset into __init_single_page()
> time to init and memset "struct page"s: 12.99s
>
>
> So, moving memset() into __init_single_page() benefits Intel. I am
> actually surprised that memset() is so slow on Intel when it is called
> from memblock. But it hurts SPARC; I guess the membars at the end of
> memset() kill the performance.
>
> Also, when looking at these values, remember that Intel has twice as many
> "struct page"s for the same amount of memory.
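
To compare the two machines on equal footing, the totals above can be
divided by the number of struct pages on each. This is only a sketch of
that arithmetic. It assumes "1T" means 2^40 bytes and that the base page
size is 4 KiB on the Intel box and 8 KiB on the SPARC box, which is what
"twice as many" implies. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: "1T" in the measurements means 2^40 bytes. */
#define MEM_BYTES (1ULL << 40)

/* Number of struct page entries needed to cover MEM_BYTES. */
static unsigned long long nr_struct_pages(unsigned long long page_size)
{
	assert(page_size != 0);
	return MEM_BYTES / page_size;
}

/* Convert a total time in seconds to nanoseconds per struct page. */
static double ns_per_page(double seconds, unsigned long long nr_pages)
{
	return seconds * 1e9 / (double)nr_pages;
}

/* Print the three measurements for one machine, normalized per page. */
static void report(const char *name, unsigned long long page_size,
		   double t_memset, double t_init, double t_combined)
{
	unsigned long long nr = nr_struct_pages(page_size);

	printf("%s: %llu struct pages\n", name, nr);
	printf("  memset in memblock:       %.2f ns/page\n",
	       ns_per_page(t_memset, nr));
	printf("  init only:                %.2f ns/page\n",
	       ns_per_page(t_init, nr));
	printf("  init + memset together:   %.2f ns/page\n",
	       ns_per_page(t_combined, nr));
}
```

Calling report("Intel", 4096, 11.28, 4.90, 8.39) and
report("SPARC", 8192, 1.60, 3.37, 12.99) from main() prints the per-page
figures for the numbers quoted above.
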
>
> Pasha