Date: Mon, 6 Jun 2016 16:51:40 +0300
From: "Kirill A. Shutemov" <kirill@...temov.name>
To: neha agarwal <neha.agbk@...il.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Hugh Dickins <hughd@...gle.com>,
Andrea Arcangeli <aarcange@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Dave Hansen <dave.hansen@...el.com>,
Vlastimil Babka <vbabka@...e.cz>,
Christoph Lameter <cl@...two.org>,
Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
Jerome Marchand <jmarchan@...hat.com>,
Yang Shi <yang.shi@...aro.org>,
Sasha Levin <sasha.levin@...cle.com>,
Andres Lagar-Cavilla <andreslc@...gle.com>,
Ning Qu <quning@...il.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCHv8 00/32] THP-enabled tmpfs/shmem using compound pages
On Wed, May 25, 2016 at 03:11:55PM -0400, neha agarwal wrote:
> Hi All,
>
> I have been testing Hugh's and Kirill's huge tmpfs patch sets with
> Cassandra (NoSQL database). I am seeing significant performance gap between
> these two implementations (~30%). Hugh's implementation performs better
> than Kirill's implementation. I am surprised to see this performance
> gap. Following is my test setup.
>
> Patchsets
> ========
> - For Hugh's:
> I checked out 4.6-rc3, applied Hugh's preliminary patches (01 to 10
> patches) from here: https://lkml.org/lkml/2016/4/5/792 and then applied the
> THP patches posted on April 16 (01 to 29 patches).
>
> - For Kirill's:
> I am using his branch
> "git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git hugetmpfs/v8",
> which is based off of 4.6-rc3, posted on May 12.
>
>
> Khugepaged settings
> ================
> cd /sys/kernel/mm/transparent_hugepage
> echo 10 >khugepaged/alloc_sleep_millisecs
> echo 10 >khugepaged/scan_sleep_millisecs
> echo 511 >khugepaged/max_ptes_none
>
>
> Mount options
> ===========
> - For Hugh's:
> sudo sysctl -w vm/shmem_huge=2
> sudo mount -o remount,huge=1 /hugetmpfs
>
> - For Kirill's:
> sudo mount -o remount,huge=always /hugetmpfs
> echo force > /sys/kernel/mm/transparent_hugepage/shmem_enabled
> echo 511 >khugepaged/max_ptes_swap
>
>
> Workload Setting
> =============
> Please look at the attached setup document for Cassandra (NoSQL database):
> cassandra-setup.txt
>
>
> Machine setup
> ===========
> 36-core (72 hardware thread) dual-socket x86 server with 512 GB RAM running
> Ubuntu. I use control groups for resource isolation. Server and client
> threads run on different sockets. Frequency governor set to "performance"
> to remove any performance fluctuations due to frequency variation.
>
>
> Throughput numbers
> ================
> Hugh's implementation: 74522.08 ops/sec
> Kirill's implementation: 54919.10 ops/sec
In my setup I don't see the difference:
v4.7-rc1 + my implementation:
[OVERALL], RunTime(ms), 822862.0
[OVERALL], Throughput(ops/sec), 60763.53021527304
ShmemPmdMapped: 4999168 kB
v4.6-rc2 + Hugh's implementation:
[OVERALL], RunTime(ms), 833157.0
[OVERALL], Throughput(ops/sec), 60012.698687042175
ShmemPmdMapped: 5021696 kB
It's basically within measurement error. 'ShmemPmdMapped' indicates how
much memory is mapped with huge pages by the end of the test.
It's on a dual-socket 24-core machine with 64G of RAM.
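As an aside on reading that counter: ShmemPmdMapped is reported in kB in
/proc/meminfo, so dividing by the 2 MB PMD size gives the number of shmem
pages actually PMD-mapped. A minimal sketch (the helper name is illustrative,
not something from the patchset; the sample value is the one from the
v4.7-rc1 run above):

```python
# Convert a ShmemPmdMapped reading (kB, /proc/meminfo format) into the
# number of PMD-mapped 2 MB shmem huge pages.
PMD_SIZE_KB = 2048  # one 2 MB huge page on x86-64, expressed in kB

def pmd_mapped_pages(meminfo_line):
    # meminfo_line looks like: "ShmemPmdMapped: 4999168 kB"
    fields = meminfo_line.split()
    kb = int(fields[1])
    return kb // PMD_SIZE_KB

print(pmd_mapped_pages("ShmemPmdMapped: 4999168 kB"))  # 2441 huge pages (~4.8 GB)
```

So both runs ended with roughly the same amount of shmem PMD-mapped.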
I guess we have some configuration difference or something, but so far I
don't see the drastic performance difference you've pointed to.
Maybe my implementation behaves slower on bigger machines, I don't know.
There's no architectural reason for this.
I'll post my updated patchset today.
--
Kirill A. Shutemov