Message-ID: <AANLkTinz9=i+wYzBFf0iu_m_C=+t72iiyELoivigxYkH@mail.gmail.com>
Date: Fri, 17 Sep 2010 20:47:36 +0900
From: Hiroyuki Kamezawa <kamezawa.hiroyuki@...il.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"balbir@...ux.vnet.ibm.com" <balbir@...ux.vnet.ibm.com>,
"nishimura@....nes.nec.co.jp" <nishimura@....nes.nec.co.jp>
Subject: Re: [PATCH][-mm] memcg : memory cgroup cpu hotplug support update.
2010/9/17 Andrew Morton <akpm@...ux-foundation.org>:
> On Thu, 16 Sep 2010 14:46:18 +0900
> KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com> wrote:
>
>> This is onto The mm-of-the-moment snapshot 2010-09-15-16-21.
>>
>> ==
>> From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
>>
>> Now, memory cgroup uses for_each_possible_cpu() for percpu stat handling.
>> It's just because cpu hotplug handler doesn't handle them.
>> On the other hand, per-cpu usage counter cache is maintained per cpu and
>> it's cpu hotplug aware.
>>
>> This patch adds a cpu hotplug handler and replaces for_each_possible_cpu()
>> with for_each_online_cpu(). And this merges new callbacks with old
>> callbacks. (IOW, memcg has only one cpu-hotplug handler.)
>>
>> For this purpose, mem_cgroup_walk_all() is added.
>>
>> ...
>>
>> @@ -537,7 +540,7 @@ static s64 mem_cgroup_read_stat(struct m
>>  	int cpu;
>>  	s64 val = 0;
>>
>> -	for_each_possible_cpu(cpu)
>> +	for_each_online_cpu(cpu)
>>  		val += per_cpu(mem->stat->count[idx], cpu);
>
> Can someone remind me again why all this code couldn't use
> percpu-counters?
>
The design follows vmstat[], and there are a few other reasons.
IIUC, percpu_counter does not have a good memory layout when it is used
as an "array". Each counter carries:
  spinlock
  counter
  list_head
  percpu pointer
This seems big and not cache-friendly to me. I want a memory layout
like vmstat[].
If someone requests it, I may be able to write a percpu_counter_array patch.
Also, percpu_counter keeps a core value plus per-cpu values and
synchronizes them at some threshold.
memcg's counters are used for two purposes:
 - counters
 - per-cpu event counters  # don't need any synchronization.
That is why the code is as it is now.
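
To make the layout argument concrete, here is a minimal userspace sketch. It is not kernel code: struct fat_counter, struct stat_cpu, struct list_node and NR_STATS are hypothetical stand-ins, and the field types only approximate the four members listed above. It compares the footprint of an array of per-counter objects with a dense vmstat[]-like array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel definitions. */
struct list_node { struct list_node *next, *prev; };

/* Roughly the per-counter fields listed above. */
struct fat_counter {
	int lock;               /* stand-in for the spinlock */
	int64_t count;          /* central ("core") value */
	struct list_node list;  /* list linkage */
	int32_t *pcpu;          /* stand-in for the per-cpu pointer */
};

enum { NR_STATS = 8 };          /* assumed number of statistics */

/* vmstat[]-like layout: one dense array of counters per CPU. */
struct stat_cpu { int64_t count[NR_STATS]; };

/* Bytes needed for NR_STATS counter objects (per-cpu deltas not included). */
size_t fat_array_bytes(void)
{
	return sizeof(struct fat_counter) * NR_STATS;
}

/* Bytes needed for one CPU's dense array of NR_STATS counters. */
size_t dense_array_bytes(void)
{
	return sizeof(struct stat_cpu);
}
```

With eight 64-bit counters the dense array is 64 bytes, so neighbouring statistics share cache lines, while the array of counter objects is several times larger before any per-cpu storage is counted.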
>>  	return val;
>>  }
>> @@ -700,6 +703,35 @@ static inline bool mem_cgroup_is_root(st
>>  	return (mem == root_mem_cgroup);
>>  }
>>
>> +static int mem_cgroup_walk_all(void *data,
>> +			int (*func)(struct mem_cgroup *, void *))
>> +{
>> +	int found, ret, nextid;
>> +	struct cgroup_subsys_state *css;
>> +	struct mem_cgroup *mem;
>> +
>> +	nextid = 1;
>> +	do {
>> +		ret = 0;
>> +		mem = NULL;
>> +
>> +		rcu_read_lock();
>> +		css = css_get_next(&mem_cgroup_subsys, nextid,
>> +				&root_mem_cgroup->css, &found);
>> +		if (css && css_tryget(css))
>> +			mem = container_of(css, struct mem_cgroup, css);
>> +		rcu_read_unlock();
>> +
>> +		if (mem) {
>> +			ret = (*func)(mem, data);
>> +			css_put(&mem->css);
>> +		}
>> +		nextid = found + 1;
>> +	} while (!ret && css);
>> +
>> +	return ret;
>> +}
>
> It would be better to convert `void *data' to `unsigned cpu' within the
> caller of this function rather than adding the typecast to each
> function which this function calls. So this becomes
>
> static int mem_cgroup_walk_all(unsigned cpu,
> 			int (*func)(struct mem_cgroup *memcg, unsigned cpu))
>
Hmm. As a generic function, it may have to keep void *data... we already have
 - mem_cgroup_walk_tree()  # checks a hierarchy subtree, does not walk all.
This function itself doesn't assume anything about its caller's context.
(But see below.)
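
For illustration only, here is a userspace sketch of the typed variant being discussed. The names struct group, walk_all_cpu() and clear_on_move() are hypothetical, and a plain linked list stands in for the css ID walk. The point is that the CPU number travels with its real type, so callbacks need no (long)data cast.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup; 4 CPU slots assumed. */
struct group {
	long on_move[4];
	struct group *next;
};

typedef int (*group_cpu_fn)(struct group *g, unsigned cpu);

/*
 * Walk every group and pass the CPU number with its real type.
 * Stops at the first non-zero return, like the quoted walker.
 */
int walk_all_cpu(struct group *head, unsigned cpu, group_cpu_fn func)
{
	int ret = 0;

	for (struct group *g = head; g && !ret; g = g->next)
		ret = func(g, cpu);
	return ret;
}

/* Example callback: the caller must pass cpu < 4 in this sketch. */
int clear_on_move(struct group *g, unsigned cpu)
{
	g->on_move[cpu] = 0;
	return 0;
}
```

The cost is that the walker is no longer generic: a caller that wants to pass something other than a CPU number needs its own variant or a void * version.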
>
>> +/*
>> + * CPU Hotplug handling.
>> + */
>> +static int synchronize_move_stat(struct mem_cgroup *mem, void *data)
>> +{
>> +	long cpu = (long)data;
>> +	s64 x = this_cpu_read(mem->stat->count[MEM_CGROUP_ON_MOVE]);
>> +	/* All cpus should have the same value */
>> +	per_cpu(mem->stat->count[MEM_CGROUP_ON_MOVE], cpu) = x;
>> +	return 0;
>> +}
>> +
>> +static int drain_all_percpu(struct mem_cgroup *mem, void *data)
>> +{
>> +	long cpu = (long)(data);
>> +	int i;
>> +	/* Drain data from dying cpu and move to local cpu */
>> +	for (i = 0; i < MEM_CGROUP_STAT_DATA; i++) {
>> +		s64 data = per_cpu(mem->stat->count[i], cpu);
>> +		per_cpu(mem->stat->count[i], cpu) = 0;
>> +		this_cpu_add(mem->stat->count[i], data);
>> +	}
>> +	/* Reset Move Count */
>> +	per_cpu(mem->stat->count[MEM_CGROUP_ON_MOVE], cpu) = 0;
>> +	return 0;
>> +}
>
> Some nice comments would be nice.
>
> I don't immediately see anything which guarantees that preemption (and
> cpu migration) are disabled here. It would be an odd thing to permit
> migration within a cpu-hotplug handler, but where did we guarantee it?
The above code doesn't assume preempt_disable(). It just modifies a DEAD cpu's
counter. this_cpu_add() is preempt-safe. I'll add a comment.
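
A minimal userspace model of why the receiving CPU does not matter. This is a sketch, not the kernel code: struct stats, drain_cpu() and read_stat_online() are hypothetical names, and a plain 2-D array replaces the per-cpu area. Whichever live slot receives the dead CPU's counts, the sum over the online CPUs is unchanged.

```c
#include <stdint.h>

enum { NR_CPUS = 4, NR_STATS = 3 };   /* assumed sizes for the sketch */

/* Hypothetical model: count[cpu][idx], not the kernel's per-cpu area. */
struct stats { int64_t count[NR_CPUS][NR_STATS]; };

/* Move everything from the dead CPU's slot into some live CPU's slot. */
void drain_cpu(struct stats *s, int dead, int live)
{
	for (int i = 0; i < NR_STATS; i++) {
		s->count[live][i] += s->count[dead][i];
		s->count[dead][i] = 0;
	}
}

/* Sum one statistic over the CPUs set in online_mask. */
int64_t read_stat_online(const struct stats *s, int idx, unsigned online_mask)
{
	int64_t val = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (online_mask & (1u << cpu))
			val += s->count[cpu][idx];
	return val;
}
```

In this model the reader only needs the drain to complete before the dead CPU is dropped from the mask; the kernel's ordering guarantees around the DEAD notification are outside what this sketch shows.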
> Also, the code appears to assume that the current CPU is the one which
> is being onlined. What guaranteed that? This is not the case for
> enable_nonboot_cpus().
>
I thought a DEAD cpu is no longer on the scheduler. The DEAD notification is
sent after cpu_disable().
Hmm, the ONLINE handler may have some trouble; I'll write a fix. It's easy.
> It's conventional to put a blank line between end-of-locals and
> start-of-code. This patch ignored that convention rather a lot.
>
I tend to do that, my mistake.
> The comments in this patch Have Rather Strange Capitalisation Decisions.
>
Ah, sorry.
>> +static int __cpuinit memcg_cpuhotplug_callback(struct notifier_block *nb,
>> +					unsigned long action,
>> +					void *hcpu)
>> +{
>> +	long cpu = (unsigned long)hcpu;
>> +	struct memcg_stock_pcp *stock;
>> +
>> +	if (action == CPU_ONLINE) {
>> +		mem_cgroup_walk_all((void *)cpu, synchronize_move_stat);
>
> More typecasts which can go away if we make the above change to
> mem_cgroup_walk_all().
>
Hmm, I'll rename the function to mem_cgroup_walk_all_cpu().
Thank you for the review.
I'll write an update, but it may take 3-4 days, sorry.
-Kame