linux-kernel - Re: [RFC][PATCH] Remove cgroup member from struct page (v3)

Message-ID: <48D2EAA1.1000301@linux.vnet.ibm.com>
Date:	Thu, 18 Sep 2008 16:56:17 -0700
From:	Balbir Singh <balbir@...ux.vnet.ibm.com>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	Dave Hansen <dave@...ux.vnet.ibm.com>,
	Nick Piggin <nickpiggin@...oo.com.au>, hugh@...itas.com,
	menage@...gle.com, xemul@...nvz.org, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org
Subject: Re: [RFC][PATCH] Remove cgroup member from struct page (v3)

KAMEZAWA Hiroyuki wrote:
> On Wed, 17 Sep 2008 21:58:08 -0700
> Balbir Singh <balbir@...ux.vnet.ibm.com> wrote:
>>> BTW, I already have a lazy-lru-by-pagevec protocol in my patch (hash version) and
>>> it seems to work well. I'm testing it now and will post today if I'm lucky enough.
>> Cool! Please do post the numbers you see as well. I would appreciate it if you could
>> try this version and report what sort of performance issues you see.
>>
> 
> This is the result on an 8-CPU box. I think I have to reduce the footprint of the
> fast path of my patch ;)
> 
> Test result of your patch is (2).
> ==
> Xeon 8cpu/2socket/1-node machine with 48GB of memory.
> Ran the shell/exec benchmark 3 times just after boot.
> 
> lps ... loops per sec.
> lpm ... loops per min.
> (*) Shell tests sometimes fail because of division by zero, etc...
> 
> (1). rc6-mm1(2008/9/13 version)
> ==
> Run                                       == 1st ==  == 2nd ==  == 3rd ==
> Execl Throughput                           2425.2     2534.5     2465.8  (lps)
> C Compiler Throughput                      1438.3     1476.3     1459.1  (lpm)
> Shell Scripts (1 concurrent)               9360.3     9368.3     9360.0  (lpm)
> Shell Scripts (8 concurrent)               3868.0     3870.0     3868.0  (lpm)
> Shell Scripts (16 concurrent)              2207.0     2204.0     2201.0  (lpm)
> Dc: sqrt(2) to 99 decimal places         101644.3   102184.5   102118.5  (lpm)
> 
> (2). (1) +remove-page-cgroup-pointer-v3 (radix-tree + dynamic allocation)
> ==
> Run                                       == 1st ==  == 2nd ==  == 3rd ==
> Execl Throughput                           2514.1      2548.9    2648.7  (lps)
> C Compiler Throughput                      1353.9      1324.6    1324.7  (lpm)
> Shell Scripts (1 concurrent)               8866.7      8871.0    8856.0  (lpm)
> Shell Scripts (8 concurrent)               3674.3      3680.0    3677.7  (lpm)
> Shell Scripts (16 concurrent)              failed.     failed    2094.3  (lpm)
> Dc: sqrt(2) to 99 decimal places          98837.0     98206.9   98250.6  (lpm)
> 
> (3). (1) + pre-allocation by "vmalloc" + hash + misc(atomic flags etc..)
> ==
> Run                                       == 1st ==  == 2nd ==  == 3rd ==
> Execl Throughput                           2385.4      2579.2    2361.5  (lps)
> C Compiler Throughput                      1424.3      1436.3    1430.6  (lpm)
> Shell Scripts (1 concurrent)               9222.0      9234.0    9246.7  (lpm)
> Shell Scripts (8 concurrent)               3787.7      3799.3    failed  (lpm)
> Shell Scripts (16 concurrent)              2165.7      2166.7    failed  (lpm)
> Dc: sqrt(2) to 99 decimal places         102228.9    102658.5   104049.8 (lpm)
> 
> (4). (3) + get/put page charge/uncharge + lazy lru handling
> ==
> Run                                       == 1st ==  == 2nd ==  == 3rd ==
> Execl Throughput                           2349.4      2335.7    2338.9  (lps)
> C Compiler Throughput                      1430.8      1445.0    1435.3  (lpm)
> Shell Scripts (1 concurrent)               9250.3      9262.0    9265.0  (lpm)
> Shell Scripts (8 concurrent)               3831.0      3834.4    3833.3  (lpm)
> Shell Scripts (16 concurrent)              2193.3      2195.3    2196.0  (lpm)
> Dc: sqrt(2) to 99 decimal places         102956.8    102886.9   101884.6 (lpm)
> 
> 
> It seems the "execl" test is more affected by footprint and cache hit rate than
> the other tests. I need to put more effort into reducing the overhead in (4).
> 
> Note:
> (1)'s struct page is 64 bytes.
> (2)(3)(4)'s struct page is 56 bytes.
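
To compare the four configurations at a glance, the per-run figures above can be averaged and expressed as a percentage change against the (1) baseline. The following is only a rough sketch: the numbers are copied from the quoted tables, runs reported as "failed" are skipped, and the names RESULTS, mean_ok and relative_change are hypothetical helpers made up for this illustration. Three runs per test is a very small sample, so the output should be read as indicative at best.

```python
from statistics import mean

# Values copied from the quoted tables; None marks a run reported as "failed".
RESULTS = {
    "Execl Throughput (lps)": {
        "(1)": [2425.2, 2534.5, 2465.8],
        "(2)": [2514.1, 2548.9, 2648.7],
        "(3)": [2385.4, 2579.2, 2361.5],
        "(4)": [2349.4, 2335.7, 2338.9],
    },
    "C Compiler Throughput (lpm)": {
        "(1)": [1438.3, 1476.3, 1459.1],
        "(2)": [1353.9, 1324.6, 1324.7],
        "(3)": [1424.3, 1436.3, 1430.6],
        "(4)": [1430.8, 1445.0, 1435.3],
    },
    "Shell Scripts, 1 concurrent (lpm)": {
        "(1)": [9360.3, 9368.3, 9360.0],
        "(2)": [8866.7, 8871.0, 8856.0],
        "(3)": [9222.0, 9234.0, 9246.7],
        "(4)": [9250.3, 9262.0, 9265.0],
    },
    "Shell Scripts, 8 concurrent (lpm)": {
        "(1)": [3868.0, 3870.0, 3868.0],
        "(2)": [3674.3, 3680.0, 3677.7],
        "(3)": [3787.7, 3799.3, None],
        "(4)": [3831.0, 3834.4, 3833.3],
    },
    "Shell Scripts, 16 concurrent (lpm)": {
        "(1)": [2207.0, 2204.0, 2201.0],
        "(2)": [None, None, 2094.3],
        "(3)": [2165.7, 2166.7, None],
        "(4)": [2193.3, 2195.3, 2196.0],
    },
    "Dc: sqrt(2) to 99 decimal places (lpm)": {
        "(1)": [101644.3, 102184.5, 102118.5],
        "(2)": [98837.0, 98206.9, 98250.6],
        "(3)": [102228.9, 102658.5, 104049.8],
        "(4)": [102956.8, 102886.9, 101884.6],
    },
}


def mean_ok(runs):
    """Average the runs that completed; return None if every run failed."""
    ok = [r for r in runs if r is not None]
    return mean(ok) if ok else None


def relative_change(test, config, baseline="(1)"):
    """Percentage change of a configuration's mean against the baseline mean."""
    base = mean_ok(RESULTS[test][baseline])
    cur = mean_ok(RESULTS[test][config])
    if base is None or cur is None:
        return None
    return 100.0 * (cur - base) / base


if __name__ == "__main__":
    for test, by_config in RESULTS.items():
        print(test)
        for config in ("(2)", "(3)", "(4)"):
            change = relative_change(test, config)
            done = sum(r is not None for r in by_config[config])
            print(f"  {config}: {change:+.1f}% vs (1), {done}/3 runs completed")
```

Run this way, the execl mean for (4) comes out about 5% below (1), while the other tests in (4) stay within roughly 1% of the baseline, which is in line with the remark above that execl is the test most sensitive to footprint.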

Thanks, Kame! I'll look at the lazy lru patches and see if I can find anything.
Do you have a unified patch anywhere? I am getting confused by the patches; I
see 10/9, 11/9 and 12/9. I'll do some analysis when I find some free time. I am
currently at the Plumbers Conference.

-- 
	Balbir