Message-ID: <20091005103733.GC3036@balbir.in.ibm.com>
Date: Mon, 5 Oct 2009 16:07:33 +0530
From: Balbir Singh <balbir@...ux.vnet.ibm.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc: "linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"nishimura@....nes.nec.co.jp" <nishimura@....nes.nec.co.jp>
Subject: Re: [PATCH 0/2] memcg: improving scalability by reducing lock
contention at charge/uncharge
* KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com> [2009-10-02 13:55:31]:
> Hi,
>
> This patch is against mmotm + softlimit fix patches.
> (which are now in -rc git tree.)
>
> In the latest -rc series, the kernel avoids accessing res_counter when
> the cgroup is the root cgroup. This helps scalability when memcg is not used.
>
> Scalability also needs to improve when memcg is used, and this patch series
> is for that. Balbir's earlier work shows that the biggest obstacle to
> better scalability is memcg's res_counter. There are two ways to address it.
>
> (1) make counter scale well.
> (2) avoid accessing core counter as much as possible.
>
> My first direction was (1). But no counter is free from false sharing
> when it needs system-wide fine-grained synchronization.
> res_counter also has several features, which makes (1) difficult.
> A spin_lock (in the slow path) around the counter means tons of invalidations
> happen even when we just read the counter without modifying it.
>
> This patch series is for (2). It implements charge/uncharge in a batched
> manner, coalescing accesses to res_counter at charge/uncharge by exploiting
> access locality.
>
> Tested for a month, and I got good reports from Balbir and Nishimura, thanks.
> One concern is that this adds some members to the bottom of task_struct.
> Better ideas are welcome.
>
> The following are test results for continuous page faults on my 8-CPU box (x86-64).
>
> A loop like this runs on all CPUs in parallel for 60 seconds.
> ==
> while (1) {
>         x = mmap(NULL, MEGA, PROT_READ|PROT_WRITE,
>                  MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
>
>         for (off = 0; off < MEGA; off += PAGE_SIZE)
>                 x[off]=0;
>         munmap(x, MEGA);
> }
> ==
> Please see the number of page faults. I think this is a good improvement.
>
>
> [Before]
>  Performance counter stats for './runpause.sh' (5 runs):
>
>   474539.756944  task-clock-msecs   #     7.890 CPUs    ( +- 0.015% )
>           10284  context-switches   #     0.000 M/sec   ( +- 0.156% )
>              12  CPU-migrations     #     0.000 M/sec   ( +- 0.000% )
>        18425800  page-faults        #     0.039 M/sec   ( +- 0.107% )
>   1486296285360  cycles             #  3132.080 M/sec   ( +- 0.029% )
>    380334406216  instructions       #     0.256 IPC     ( +- 0.058% )
>      3274206662  cache-references   #     6.900 M/sec   ( +- 0.453% )
>      1272947699  cache-misses       #     2.682 M/sec   ( +- 0.118% )
>
>    60.147907341  seconds time elapsed   ( +- 0.010% )
>
> [After]
>  Performance counter stats for './runpause.sh' (5 runs):
>
>   474658.997489  task-clock-msecs   #     7.891 CPUs    ( +- 0.006% )
>           10250  context-switches   #     0.000 M/sec   ( +- 0.020% )
>              11  CPU-migrations     #     0.000 M/sec   ( +- 0.000% )
>        33177858  page-faults        #     0.070 M/sec   ( +- 0.152% )
>   1485264748476  cycles             #  3129.120 M/sec   ( +- 0.021% )
>    409847004519  instructions       #     0.276 IPC     ( +- 0.123% )
>      3237478723  cache-references   #     6.821 M/sec   ( +- 0.574% )
>      1182572827  cache-misses       #     2.491 M/sec   ( +- 0.179% )
>
>    60.151786309  seconds time elapsed   ( +- 0.014% )
>
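Working the quoted counters through: page faults go from 18425800 to
33177858 in the same 60 seconds, roughly 1.8x, while total cycles stay
about the same. That means cycles per fault drop from about 80.7k to
about 44.8k, and cache misses per fault from about 69 to about 36. A
small C sketch of that arithmetic follows; the struct and function names
are made up for illustration, only the numbers come from the perf output
above.

```c
#include <stdio.h>

/* Counters copied from the perf output quoted above (means of 5 runs). */
struct perf_run {
    double page_faults;
    double cycles;
    double cache_misses;
    double elapsed_sec;
};

const struct perf_run before_run = {
    .page_faults  = 18425800.0,
    .cycles       = 1486296285360.0,
    .cache_misses = 1272947699.0,
    .elapsed_sec  = 60.147907341,
};

const struct perf_run after_run = {
    .page_faults  = 33177858.0,
    .cycles       = 1485264748476.0,
    .cache_misses = 1182572827.0,
    .elapsed_sec  = 60.151786309,
};

/* Ratio of page faults completed after vs. before the patches. */
double fault_speedup(void)
{
    return after_run.page_faults / before_run.page_faults;
}

/* Average CPU cycles spent per page fault, summed over all CPUs. */
double cycles_per_fault(const struct perf_run *r)
{
    return r->cycles / r->page_faults;
}

/* Average cache misses per page fault. */
double misses_per_fault(const struct perf_run *r)
{
    return r->cache_misses / r->page_faults;
}

/* Page faults per wall-clock second. */
double faults_per_sec(const struct perf_run *r)
{
    return r->page_faults / r->elapsed_sec;
}

/* Print the derived figures for both runs. */
void print_summary(void)
{
    printf("speedup:            %.3fx\n", fault_speedup());
    printf("cycles/fault:       %.0f -> %.0f\n",
           cycles_per_fault(&before_run), cycles_per_fault(&after_run));
    printf("cache-misses/fault: %.2f -> %.2f\n",
           misses_per_fault(&before_run), misses_per_fault(&after_run));
    printf("faults/sec (wall):  %.0f -> %.0f\n",
           faults_per_sec(&before_run), faults_per_sec(&after_run));
}
```

Calling print_summary() from a main() prints the derived figures.
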
I agree. I liked the previous patchset; let me re-review this one.
Definitely a good candidate for -mm.
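
For anyone following the thread, here is a rough userspace sketch of the
general idea behind approach (2): charge the shared counter in batches
and serve most charges from a local stock, so the lock is taken far less
often. This is NOT the code from the patches. Every name here
(shared_counter, local_stock, charge_page, BATCH_PAGES, and so on) and
the batch size of 32 pages are assumptions for illustration, and the
spinlock is a plain C11 atomic_flag rather than the kernel's spin_lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PAGE_BYTES  4096UL
#define BATCH_PAGES 32UL   /* hypothetical batch size */

/* Stand-in for a shared, lock-protected resource counter. */
struct shared_counter {
    atomic_flag lock;
    unsigned long usage;              /* bytes currently charged */
    unsigned long limit;              /* maximum bytes allowed */
    unsigned long lock_acquisitions;  /* how often the lock was taken */
};

/* Local (per-thread or per-CPU) stock of bytes already charged. */
struct local_stock {
    unsigned long cached;
};

void counter_init(struct shared_counter *c, unsigned long limit)
{
    atomic_flag_clear(&c->lock);
    c->usage = 0;
    c->limit = limit;
    c->lock_acquisitions = 0;
}

static void counter_lock(struct shared_counter *c)
{
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
        ;  /* spin */
    c->lock_acquisitions++;
}

static void counter_unlock(struct shared_counter *c)
{
    atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Slow path: charge the shared counter; fails if it would exceed the limit. */
bool counter_charge(struct shared_counter *c, unsigned long bytes)
{
    bool ok = false;

    counter_lock(c);
    if (c->usage + bytes <= c->limit) {
        c->usage += bytes;
        ok = true;
    }
    counter_unlock(c);
    return ok;
}

/* Slow path: give bytes back to the shared counter. */
void counter_uncharge(struct shared_counter *c, unsigned long bytes)
{
    counter_lock(c);
    c->usage -= bytes;
    counter_unlock(c);
}

/* Charge one page, touching the shared counter only when the stock is empty. */
bool charge_page(struct shared_counter *c, struct local_stock *s)
{
    if (s->cached >= PAGE_BYTES) {            /* fast path: no lock taken */
        s->cached -= PAGE_BYTES;
        return true;
    }
    if (counter_charge(c, BATCH_PAGES * PAGE_BYTES)) {
        s->cached += (BATCH_PAGES - 1) * PAGE_BYTES;  /* keep the rest */
        return true;
    }
    /* Near the limit: fall back to charging a single page. */
    return counter_charge(c, PAGE_BYTES);
}

/* Return any unused stock to the shared counter. */
void drain_stock(struct shared_counter *c, struct local_stock *s)
{
    if (s->cached) {
        counter_uncharge(c, s->cached);
        s->cached = 0;
    }
}
```

With a limit of 1024 pages, charging 1024 pages one at a time through
charge_page() takes the lock 32 times instead of 1024. The cost is that
usage in the shared counter can run ahead of what is really in use until
the stock is drained.
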
--
Balbir