Message-ID: <20170601084609.GF32677@dhcp22.suse.cz>
Date: Thu, 1 Jun 2017 10:46:09 +0200
From: Michal Hocko <mhocko@...nel.org>
To: Pasha Tatashin <pasha.tatashin@...cle.com>
Cc: linux-kernel@...r.kernel.org, sparclinux@...r.kernel.org,
linux-mm@...ck.org, linuxppc-dev@...ts.ozlabs.org,
linux-s390@...r.kernel.org, borntraeger@...ibm.com,
heiko.carstens@...ibm.com, davem@...emloft.net
Subject: Re: [v3 0/9] parallelized "struct page" zeroing
On Wed 31-05-17 23:35:48, Pasha Tatashin wrote:
> >OK, so why cannot we make zero_struct_page 8x 8B stores, other arches
> >would do memset. You said it would be slower but would that be
> >measurable? I am sorry to be so persistent here but I would be really
> >happier if this didn't depend on the deferred initialization. If this is
> >absolutely a no-go then I can live with that of course.
>
> Hi Michal,
>
> This is actually a very good idea. I just did some measurements, and it
> looks like performance is very good.
>
> Here is data from SPARC-M7 with 3312G memory with single thread performance:
>
> Current:
> memset() in memblock allocator takes: 8.83s
> __init_single_page() takes: 8.63s
>
> Option 1:
> memset() in __init_single_page() takes: 61.09s (as we discussed because of
> membar overhead, memset should really be optimized to do STBI only when size
> is 1 page or bigger).
>
> Option 2:
>
> 8 stores (stx) in __init_single_page(): 8.525s!
>
> So, even for single thread performance we can double the initialization
> speed of "struct page" on SPARC by removing memset() from memblock, and
> using 8 stx in __init_single_page(). It appears we never miss L1 in
> __init_single_page() after the initial 8 stx.
OK, that is good to hear and it actually matches my understanding that
writes to a single cacheline should not add any overhead.
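
To make the idea concrete, here is a minimal userspace sketch of the
"8x 8B stores" variant next to the memset() fallback. It is not the actual
patch: struct fake_page and both helper names are hypothetical, and it
assumes the structure is exactly 64 bytes (one cacheline), as in the
measurements above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 64-byte "struct page". */
struct fake_page {
	uint64_t words[8];
};

/* The sketch only makes sense if the struct is exactly 8 x 8 bytes. */
static_assert(sizeof(struct fake_page) == 64, "expected a 64-byte struct");

/*
 * Zero the structure with eight explicit 8-byte stores. On SPARC each
 * of these would be expected to compile to a plain stx.
 */
static inline void zero_fake_page_stores(struct fake_page *p)
{
	uint64_t *w = p->words;

	w[0] = 0;
	w[1] = 0;
	w[2] = 0;
	w[3] = 0;
	w[4] = 0;
	w[5] = 0;
	w[6] = 0;
	w[7] = 0;
}

/* Generic fallback for architectures where memset() is cheap enough. */
static inline void zero_fake_page_memset(struct fake_page *p)
{
	memset(p, 0, sizeof(*p));
}
```

Both helpers leave the structure in the same state; the only difference
is whether the call goes through the architecture's memset() path.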
Thanks!
--
Michal Hocko
SUSE Labs