[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20100510091726.f9a0642f.kamezawa.hiroyu@jp.fujitsu.com>
Date: Mon, 10 May 2010 09:17:26 +0900
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
linux-kernel@...r.kernel.org, mingo@...e.hu, laijs@...fujitsu.com,
dipankar@...ibm.com, mathieu.desnoyers@...ymtl.ca,
josh@...htriplett.org, dvhltc@...ibm.com, niv@...ibm.com,
tglx@...utronix.de, peterz@...radead.org, rostedt@...dmis.org,
Valdis.Kletnieks@...edu, dhowells@...hat.com,
eric.dumazet@...il.com,
Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
Balbir Singh <balbir@...ux.vnet.ibm.com>
Subject: Re: [PATCH tip/core/urgent 08/10] memcg: css_id() must be called
under rcu_read_lock()
On Fri, 7 May 2010 12:11:38 -0700
Andrew Morton <akpm@...ux-foundation.org> wrote:
> On Mon, 3 May 2010 11:53:17 -0700
> "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com> wrote:
>
> > This patch fixes task_in_mem_cgroup(), mem_cgroup_uncharge_swapcache(),
> > mem_cgroup_move_swap_account(), and is_target_pte_for_mc() to protect
> > calls to css_id(). An additional RCU lockdep splat was reported for
> > memcg_oom_wake_function(), however, this function is not yet in
> > mainline as of 2.6.34-rc5.
> >
> > ...
> >
> > index f4ede99..e06490d 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -811,10 +811,12 @@ int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *mem)
> > * enabled in "curr" and "curr" is a child of "mem" in *cgroup*
> > * hierarchy(even if use_hierarchy is disabled in "mem").
> > */
> > + rcu_read_lock();
> > if (mem->use_hierarchy)
> > ret = css_is_ancestor(&curr->css, &mem->css);
> > else
> > ret = (curr == mem);
> > + rcu_read_unlock();
> > css_put(&curr->css);
> > return ret;
> > }
>
> The above hunk seems to be locking around css_is_ancestor(), not
> css_id() as the changelog states.
>
Hmm. I'll move rcu_xxx to cgroup.c::css_is_ancestor().
(But .....because we have css's reference count, rcu_read_lock isn't
necessary...lock-check founds it as bug but this was intentional.)
> > @@ -2312,7 +2314,9 @@ mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout)
> >
> > /* record memcg information */
> > if (do_swap_account && swapout && memcg) {
> > + rcu_read_lock();
> > swap_cgroup_record(ent, css_id(&memcg->css));
> > + rcu_read_unlock();
> > mem_cgroup_get(memcg);
> > }
> > if (swapout && memcg)
>
> That makes some sense - the lock is held across the call and the use of
> the result of the call.
>
>
> > @@ -2369,8 +2373,10 @@ static int mem_cgroup_move_swap_account(swp_entry_t entry,
> > {
> > unsigned short old_id, new_id;
> >
> > + rcu_read_lock();
> > old_id = css_id(&from->css);
> > new_id = css_id(&to->css);
> > + rcu_read_unlock();
> >
> > if (swap_cgroup_cmpxchg(entry, old_id, new_id) == old_id) {
> > mem_cgroup_swap_statistics(from, false);
>
> This doesn't make sense. We take the lock, read the values, drop the
> lock and then use the now-possibly-wrong values.
>
will fix.
> > @@ -4038,11 +4044,16 @@ static int is_target_pte_for_mc(struct vm_area_struct *vma,
> > put_page(page);
> > }
> > /* throught */
> > - if (ent.val && do_swap_account && !ret &&
> > - css_id(&mc.from->css) == lookup_swap_cgroup(ent)) {
> > - ret = MC_TARGET_SWAP;
> > - if (target)
> > - target->ent = ent;
> > + if (ent.val && do_swap_account && !ret) {
> > + unsigned short id;
>
> Please put a newline between end-of-locals and start-of-code.
>
will fix.
> > + rcu_read_lock();
> > + id = css_id(&mc.from->css);
> > + rcu_read_unlock();
> > + if (id == lookup_swap_cgroup(ent)) {
> > + ret = MC_TARGET_SWAP;
> > + if (target)
> > + target->ent = ent;
> > + }
>
> Again, when we use `id', the lock has been dropped. The value which
> css_id() returned might no longer be correct.
>
>
will fix.
>
> The merge of this patch caused rejections in -mm's
> memcg-clean-up-move-charge.patch (at least). It may have caused more,
> I haven't checked yet. The code I have here now does:
>
> if (ent.val && !ret) {
> unsigned short id;
>
> rcu_read_lock();
> id = css_id(&mc.from->css);
> rcu_read_unlock();
> if (id == lookup_swap_cgroup(ent)) {
> ret = MC_TARGET_SWAP;
> if (target)
> target->ent = ent;
> }
> }
>
> however I suspect it would be saner to do
>
> if (ent.val && !ret) {
> rcu_read_lock();
> if (css_id(&mc.from->css) == lookup_swap_cgroup(ent)) {
> ret = MC_TARGET_SWAP;
> if (target)
> target->ent = ent;
> }
> rcu_read_unlock();
> }
>
I'll prepare for -rc6 patch and for -mm patch.
> However this still doesn't make a lot of sense because three nanoseonds
> after we've done rcu_read_unlock(), the value of
>
> css_id(&mc.from->css) == lookup_swap_cgroup(ent)
>
> might have changed. So I'd ask the memcg guys to have a more serious
> think about all of this please. I get the feeling that the original
> patch just splattered rcu_read_lock() around the place to silence a
> runtime warning without digging into what the code is really supposed
> to be doing.
>
In most case, they are intentional and we have reference count of css.
I can think of
- css_id_rcu() .... use rcu_dereference().
- css_id() ... don't use rcu_dereference().
But this seems crazy.
> The mem_cgroup_move_swap_account() would benefit from some attention
> also please.
ok, I'll rewrite. If I find that I can't avoid rejection to -mm, I'll make
a patch for -rc6 to do minimal fixes. And add a patch for fixining remaining
things to -mm.
Thanks,
-Kame
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists