Message-Id: <20080918200116.06b41fa7.kamezawa.hiroyu@jp.fujitsu.com>
Date: Thu, 18 Sep 2008 20:01:16 +0900
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To: balbir@...ux.vnet.ibm.com
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Dave Hansen <dave@...ux.vnet.ibm.com>,
Nick Piggin <nickpiggin@...oo.com.au>, hugh@...itas.com,
menage@...gle.com, xemul@...nvz.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: [RFC][PATCH] Remove cgroup member from struct page (v3)
On Wed, 17 Sep 2008 21:58:08 -0700
Balbir Singh <balbir@...ux.vnet.ibm.com> wrote:
> > BTW, I already have the lazy-lru-by-pagevec protocol in my patch (hash version),
> > and it seems to work well. I'm testing it now and will post it today if I'm lucky.
>
> cool! Please do post what numbers you see as well. I would appreciate it if you
> could try this version and see what sort of performance issues you run into.
>
Here are the results on an 8-cpu box. I think I have to reduce the fastpath
footprint of my patch ;)
The results for your patch are in (2).
==
A Xeon box (8 cpus, 2 sockets, 1 node) equipped with 48GB of memory.
The shell/exec benchmark was run 3 times just after boot.
lps ... loops per sec.
lpm ... loops per min.
(*) The shell tests sometimes fail because of division by zero, etc...
(1). rc6-mm1(2008/9/13 version)
==
Run                               == 1st ==  == 2nd ==  == 3rd ==
Execl Throughput                     2425.2     2534.5     2465.8  (lps)
C Compiler Throughput                1438.3     1476.3     1459.1  (lpm)
Shell Scripts (1 concurrent)         9360.3     9368.3     9360.0  (lpm)
Shell Scripts (8 concurrent)         3868.0     3870.0     3868.0  (lpm)
Shell Scripts (16 concurrent)        2207.0     2204.0     2201.0  (lpm)
Dc: sqrt(2) to 99 decimal places   101644.3   102184.5   102118.5  (lpm)
(2). (1) +remove-page-cgroup-pointer-v3 (radix-tree + dynamic allocation)
==
Run                               == 1st ==  == 2nd ==  == 3rd ==
Execl Throughput                     2514.1     2548.9     2648.7  (lps)
C Compiler Throughput                1353.9     1324.6     1324.7  (lpm)
Shell Scripts (1 concurrent)         8866.7     8871.0     8856.0  (lpm)
Shell Scripts (8 concurrent)         3674.3     3680.0     3677.7  (lpm)
Shell Scripts (16 concurrent)        failed     failed     2094.3  (lpm)
Dc: sqrt(2) to 99 decimal places    98837.0    98206.9    98250.6  (lpm)
(3). (1) + pre-allocation by "vmalloc" + hash + misc(atomic flags etc..)
==
Run                               == 1st ==  == 2nd ==  == 3rd ==
Execl Throughput                     2385.4     2579.2     2361.5  (lps)
C Compiler Throughput                1424.3     1436.3     1430.6  (lpm)
Shell Scripts (1 concurrent)         9222.0     9234.0     9246.7  (lpm)
Shell Scripts (8 concurrent)         3787.7     3799.3     failed  (lpm)
Shell Scripts (16 concurrent)        2165.7     2166.7     failed  (lpm)
Dc: sqrt(2) to 99 decimal places   102228.9   102658.5   104049.8  (lpm)
(4). (3) + get/put page charge/uncharge + lazy lru handling
==
Run                               == 1st ==  == 2nd ==  == 3rd ==
Execl Throughput                     2349.4     2335.7     2338.9  (lps)
C Compiler Throughput                1430.8     1445.0     1435.3  (lpm)
Shell Scripts (1 concurrent)         9250.3     9262.0     9265.0  (lpm)
Shell Scripts (8 concurrent)         3831.0     3834.4     3833.3  (lpm)
Shell Scripts (16 concurrent)        2193.3     2195.3     2196.0  (lpm)
Dc: sqrt(2) to 99 decimal places   102956.8   102886.9   101884.6  (lpm)
It seems the "execl" test is more sensitive to footprint and cache hit rate than
the other tests. I need some more effort to reduce the overhead in (4).
Note:
(1)'s struct page is 64 bytes.
(2)(3)(4)'s struct page is 56 bytes.
-Kame
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/