Message-ID: <2fa60098-d9be-f57d-cb86-3b55cfe915b7@oracle.com>
Date: Wed, 31 May 2017 23:35:48 -0400
From: Pasha Tatashin <pasha.tatashin@...cle.com>
To: Michal Hocko <mhocko@...nel.org>
Cc: linux-kernel@...r.kernel.org, sparclinux@...r.kernel.org,
linux-mm@...ck.org, linuxppc-dev@...ts.ozlabs.org,
linux-s390@...r.kernel.org, borntraeger@...ibm.com,
heiko.carstens@...ibm.com, davem@...emloft.net
Subject: Re: [v3 0/9] parallelized "struct page" zeroing
> OK, so why cannot we make zero_struct_page 8x 8B stores, other arches
> would do memset. You said it would be slower but would that be
> measurable? I am sorry to be so persistent here but I would be really
> happier if this didn't depend on the deferred initialization. If this is
> absolutely a no-go then I can live with that of course.
Hi Michal,
This is actually a very good idea. I just did some measurements, and it
looks like performance is very good.
Here is data from a SPARC-M7 with 3312G of memory (single-thread performance):
Current:
memset() in memblock allocator takes: 8.83s
__init_single_page() takes: 8.63s
Option 1:
memset() in __init_single_page() takes: 61.09s (as we discussed, this is
because of membar overhead; memset() should really be optimized to use
STBI only when the size is one page or larger).
Option 2:
8 stores (stx) in __init_single_page(): 8.525s!
So, even for single-thread performance we can double the initialization
speed of "struct page" on SPARC by removing the memset() from memblock
and using 8 stx in __init_single_page(). It appears we never miss L1 in
__init_single_page() after the initial 8 stx.
I will update patches with memset() on other platforms, and stx on SPARC.
My experimental code looks like this:
static void __meminit __init_single_page(struct page *page, unsigned long pfn,
					 unsigned long zone, int nid)
{
	__asm__ __volatile__(
	"stx	%%g0, [%0 + 0x00]\n"
	"stx	%%g0, [%0 + 0x08]\n"
	"stx	%%g0, [%0 + 0x10]\n"
	"stx	%%g0, [%0 + 0x18]\n"
	"stx	%%g0, [%0 + 0x20]\n"
	"stx	%%g0, [%0 + 0x28]\n"
	"stx	%%g0, [%0 + 0x30]\n"
	"stx	%%g0, [%0 + 0x38]\n"
	:
	: "r"(page));

	set_page_links(page, zone, nid, pfn);
	init_page_count(page);
	page_mapcount_reset(page);
	page_cpupid_reset_last(page);
	INIT_LIST_HEAD(&page->lru);
#ifdef WANT_PAGE_VIRTUAL
	/* The shift won't overflow because ZONE_NORMAL is below 4G. */
	if (!is_highmem_idx(zone))
		set_page_address(page, __va(pfn << PAGE_SHIFT));
#endif
}
Thank you,
Pasha