Message-Id: <20080918200116.06b41fa7.kamezawa.hiroyu@jp.fujitsu.com>
Date:	Thu, 18 Sep 2008 20:01:16 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	balbir@...ux.vnet.ibm.com
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Dave Hansen <dave@...ux.vnet.ibm.com>,
	Nick Piggin <nickpiggin@...oo.com.au>, hugh@...itas.com,
	menage@...gle.com, xemul@...nvz.org, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org
Subject: Re: [RFC][PATCH] Remove cgroup member from struct page (v3)

On Wed, 17 Sep 2008 21:58:08 -0700
Balbir Singh <balbir@...ux.vnet.ibm.com> wrote:
> > BTW, I already have lazy-lru-by-pagevec protocol on my patch(hash version) and
> > seems to work well. I'm now testing it and will post today if I'm enough lucky.
> 
> cool! Please do post what numbers you see as well. I would appreciate if you can
> try this version and see what sort of performance issues you see.
> 

These are the results on an 8-CPU box. I think I have to reduce the fast-path
footprint of my patch ;)

The test result for your patch is (2).
==
Xeon, 8 CPUs / 2 sockets / 1 node, equipped with 48GB of memory.
Ran the shell/exec benchmark 3 times just after boot.

lps ... loops per sec.
lpm ... loops per min.
(*) Shell tests sometimes fail because of division by zero, etc...

(1). rc6-mm1(2008/9/13 version)
==
Run                                       == 1st ==  == 2nd ==  == 3rd ==
Execl Throughput                           2425.2     2534.5     2465.8  (lps)
C Compiler Throughput                      1438.3     1476.3     1459.1  (lpm)
Shell Scripts (1 concurrent)               9360.3     9368.3     9360.0  (lpm)
Shell Scripts (8 concurrent)               3868.0     3870.0     3868.0  (lpm)
Shell Scripts (16 concurrent)              2207.0     2204.0     2201.0  (lpm)
Dc: sqrt(2) to 99 decimal places         101644.3   102184.5   102118.5  (lpm)

(2). (1) +remove-page-cgroup-pointer-v3 (radix-tree + dynamic allocation)
==
Run                                       == 1st ==  == 2nd ==  == 3rd ==
Execl Throughput                           2514.1      2548.9    2648.7  (lps)
C Compiler Throughput                      1353.9      1324.6    1324.7  (lpm)
Shell Scripts (1 concurrent)               8866.7      8871.0    8856.0  (lpm)
Shell Scripts (8 concurrent)               3674.3      3680.0    3677.7  (lpm)
Shell Scripts (16 concurrent)              failed      failed    2094.3  (lpm)
Dc: sqrt(2) to 99 decimal places          98837.0     98206.9   98250.6  (lpm)

(3). (1) + pre-allocation by "vmalloc" + hash + misc(atomic flags etc..)
==
Run                                       == 1st ==  == 2nd ==  == 3rd ==
Execl Throughput                           2385.4      2579.2    2361.5  (lps)
C Compiler Throughput                      1424.3      1436.3    1430.6  (lpm)
Shell Scripts (1 concurrent)               9222.0      9234.0    9246.7  (lpm)
Shell Scripts (8 concurrent)               3787.7      3799.3    failed  (lpm)
Shell Scripts (16 concurrent)              2165.7      2166.7    failed  (lpm)
Dc: sqrt(2) to 99 decimal places         102228.9    102658.5  104049.8  (lpm)

(4). (3) + get/put page charge/uncharge + lazy lru handling
==
Run                                       == 1st ==  == 2nd ==  == 3rd ==
Execl Throughput                           2349.4      2335.7    2338.9  (lps)
C Compiler Throughput                      1430.8      1445.0    1435.3  (lpm)
Shell Scripts (1 concurrent)               9250.3      9262.0    9265.0  (lpm)
Shell Scripts (8 concurrent)               3831.0      3834.4    3833.3  (lpm)
Shell Scripts (16 concurrent)              2193.3      2195.3    2196.0  (lpm)
Dc: sqrt(2) to 99 decimal places         102956.8    102886.9  101884.6  (lpm)


It seems the "execl" test is affected by footprint and cache hit rate more than
the other tests. I need to put more effort into reducing the overhead in (4).
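
One rough way to compare the execl rows is to average the three runs and
express each configuration relative to (1). The sketch below does only that,
using the Execl Throughput numbers copied from the tables above. The helper
name is hypothetical and not part of any patch, and with three runs per
configuration the percentages are indicative only.

```python
from statistics import mean

# Execl Throughput (lps) for the three runs, copied from tables (1)-(4).
execl = {
    "(1)": [2425.2, 2534.5, 2465.8],
    "(2)": [2514.1, 2548.9, 2648.7],
    "(3)": [2385.4, 2579.2, 2361.5],
    "(4)": [2349.4, 2335.7, 2338.9],
}


def percent_change_vs_baseline(results, baseline="(1)"):
    """Return the % difference of each configuration's mean from the baseline mean."""
    base = mean(results[baseline])
    return {name: 100.0 * (mean(runs) - base) / base
            for name, runs in results.items()}


changes = percent_change_vs_baseline(execl)
for name, pct in changes.items():
    print(f"{name}: mean={mean(execl[name]):.1f} lps, {pct:+.2f}% vs (1)")
```

The same function can be fed the other rows (C Compiler, Shell Scripts, Dc)
by swapping in their numbers; runs marked "failed" would have to be dropped
first.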

Note:
(1)'s struct page is 64 bytes.
(2)(3)(4)'s struct page is 56 bytes.
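
For a sense of scale on the struct page sizes: the sketch below computes how
much memory the struct page array itself takes at 64 and at 56 bytes per
page. It assumes 4 KiB base pages and reads "48GB" as 48 GiB; neither is
stated in this mail. It counts only the struct page array, so whatever the
patches allocate separately (radix tree, vmalloc'd hash) is not subtracted.

```python
PAGE_SIZE = 4096          # assumption: 4 KiB base pages
RAM_BYTES = 48 * 2**30    # assumption: "48GB" treated as 48 GiB


def struct_page_array_bytes(struct_page_size, ram=RAM_BYTES, page=PAGE_SIZE):
    """Bytes used by one struct page per physical page of RAM."""
    return (ram // page) * struct_page_size


before = struct_page_array_bytes(64)   # (1)
after = struct_page_array_bytes(56)    # (2)(3)(4)
saved = before - after

print(f"64-byte struct page: {before / 2**20:.0f} MiB")
print(f"56-byte struct page: {after / 2**20:.0f} MiB")
print(f"difference:          {saved / 2**20:.0f} MiB")
```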
 

-Kame
