Message-Id: <1494978657.21847.74.camel@au1.ibm.com>
Date: Wed, 17 May 2017 09:50:57 +1000
From: Benjamin Herrenschmidt <benh@....ibm.com>
To: David Miller <davem@...emloft.net>, pasha.tatashin@...cle.com
Cc: linux-s390@...r.kernel.org, borntraeger@...ibm.com,
heiko.carstens@...ibm.com, linux-kernel@...r.kernel.org,
mhocko@...nel.org, linux-mm@...ck.org, sparclinux@...r.kernel.org,
linuxppc-dev@...ts.ozlabs.org
Subject: Re: [v3 0/9] parallelized "struct page" zeroing
On Fri, 2017-05-12 at 13:37 -0400, David Miller wrote:
> > Right now it is larger, but what I suggested is to add a new optimized
> > routine just for this case, which would do STBI for 64-bytes but
> > without membar (do membar at the end of memmap_init_zone() and
> > deferred_init_memmap())
> >
> > #define struct_page_clear(page) \
> > __asm__ __volatile__( \
> > "stxa %%g0, [%0]%2\n" \
> > "stxa %%g0, [%0 + %1]%2\n" \
> > : /* No output */ \
> > : "r" (page), "r" (0x20), "i"(ASI_BLK_INIT_QUAD_LDD_P))
> >
> > And insert it into __init_single_page() instead of memset()
> >
> > The final result is 4.01s/T which is even faster compared to current
> > 4.97s/T
>
> Ok, indeed, that would work.
On ppc64, that might not. We have a dcbz instruction that clears an
entire cache line at once. That's what we use for memset and page
clearing. However, 64 bytes is half a cache line on modern processors,
so we can't use it with those semantics and would have to fall back to
the slower stores.
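To make the constraint concrete, here is a rough userspace sketch, not
kernel code. The 128-byte line size and the 64-byte struct page size are
assumed values for illustration, and zero_cache_line(),
can_use_line_zero() and clear_object() are hypothetical names;
zero_cache_line() only models what dcbz does with a memset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed for illustration: 128-byte cache line, 64-byte struct page. */
#define CACHE_LINE_SIZE  128u
#define STRUCT_PAGE_SIZE 64u

/*
 * Hypothetical stand-in for dcbz: zeroes the WHOLE cache line that
 * contains addr, not just the bytes the caller cares about.
 */
static void zero_cache_line(void *addr)
{
	uintptr_t line = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1);

	memset((void *)line, 0, CACHE_LINE_SIZE);
}

/* A line-zeroing instruction is only safe when the range is whole lines. */
static int can_use_line_zero(const void *addr, size_t len)
{
	return len != 0 &&
	       ((uintptr_t)addr % CACHE_LINE_SIZE) == 0 &&
	       (len % CACHE_LINE_SIZE) == 0;
}

/* Clear len bytes at addr, using line zeroing only when it cannot overrun. */
static void clear_object(void *addr, size_t len)
{
	size_t off;

	assert(addr != NULL);

	if (can_use_line_zero(addr, len)) {
		for (off = 0; off < len; off += CACHE_LINE_SIZE)
			zero_cache_line((char *)addr + off);
	} else {
		/* Half a line (e.g. one struct page): plain, slower stores. */
		memset(addr, 0, len);
	}
}
```

With these assumed sizes, clear_object(page, STRUCT_PAGE_SIZE) always
takes the memset path, because zeroing the full line would also wipe the
neighbouring struct page.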
Cheers,
Ben.