[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6599ad830803280705o4213c448r991cbf9da6ffe2f1@mail.gmail.com>
Date: Fri, 28 Mar 2008 07:05:18 -0700
From: "Paul Menage" <menage@...gle.com>
To: balbir@...ux.vnet.ibm.com
Cc: "Pavel Emelianov" <xemul@...nvz.org>,
"Hugh Dickins" <hugh@...itas.com>,
"Sudhir Kumar" <skumar@...ux.vnet.ibm.com>,
"YAMAMOTO Takashi" <yamamoto@...inux.co.jp>, lizf@...fujitsu.com,
linux-kernel@...r.kernel.org, taka@...inux.co.jp,
linux-mm@...ck.org, "David Rientjes" <rientjes@...gle.com>,
"Andrew Morton" <akpm@...ux-foundation.org>,
"KAMEZAWA Hiroyuki" <kamezawa.hiroyu@...fujitsu.com>
Subject: Re: [-mm] Add an owner to the mm_struct (v2)
On Fri, Mar 28, 2008 at 5:36 AM, Balbir Singh <balbir@...ux.vnet.ibm.com> wrote:
>
> > - you hold task_lock() for the new owner (which is necessary anyway to
> > ensure that the new owner's mm doesn't change while you're updating
> > mm->owner)
> >
>
> tsk->mm should not change unless the task is exiting or when a kernel thread
> does use_mm() (PF_BORROWED_MM).
>
Or the task calls execve().
> I see mm->owner changing when
>
> 1. The mm->owner exits
> 2. At fork time for clone calls with CLONE_VM
Why would a clone() call with CLONE_VM change mm->owner? We're sharing
with the existing owner, who is still using the mm, so we should leave
it as it is.
I don't see what invariant you're trying to protect with
mm->owner_lock. Can you explain it?
> >> @@ -1357,6 +1379,10 @@ static struct task_struct *copy_process(
> >> write_unlock_irq(&tasklist_lock);
> >> proc_fork_connector(p);
> >> cgroup_post_fork(p);
> >> +
> >> + if (!(clone_flags & CLONE_VM) && (p != p->group_leader))
> >> + mm_fork_init_owner(p);
> >> +
> >
> > I'm not sure I understand what this is doing.
> >
> > I read it as "if p has its own mm and p is a child thread, set
> > p->mm->owner to p->group_leader". But by definition if p has its own
> > mm, then p->group_leader->mm will be different to p->mm, therefore
> > we'd end up with mm->owner->mm != mm, which seems very bad.
> >
> > What's the intention of this bit of code?
> >
>
> The intention is to handle the case when clone is called without CLONE_VM and
> with CLONE_THREAD. This means that p can have it's own mm and a shared group_leader.
>
In which case p should be the owner of its new mm, not
p->group_leader. mm_fork_init_owner() does:
> + if (mm->owner != p)
> + rcu_assign_pointer(mm->owner, p->group_leader);
Since we just created a fresh mm for p in copy_mm(), and set mm->owner
= p, I don't see how this test can succeed. And if it does, then we
end up setting p->mm->owner to p->group_leader, which has to be bad
since we know that:
- p is the only user of its mm
- p != p->group_leader
therefore we're setting p->mm->owner to point to a process that
doesn't use p->mm. What am I missing?
> >> +/*
> >> + * Task p is exiting and it owned p, so lets find a new owner for it
> >> + */
> >> +static inline int
> >> +mm_need_new_owner(struct mm_struct *mm, struct task_struct *p)
> >> +{
> >> + int ret;
> >> +
> >> + rcu_read_lock();
> >> + ret = (mm && (rcu_dereference(mm->owner) == p) &&
> >> + (atomic_read(&mm->mm_users) > 1));
> >> + rcu_read_unlock();
> >> + return ret;
> >
> > The only way that rcu_read_lock() helps here is if mm freeing is
> > protected by RCU, which I don't think is the case.
> >
>
> rcu_read_lock() also ensures that preemption does not cause us to see incorrect
> values.
>
The only thing that RCU is protecting us against here is that if
mm->owner is non-NULL, the task it points to won't be freed until the
end of our RCU read section. But since we never dereference mm->owner
in the RCU read section, that buys us nothing and is just confusing.
>
> > But as long as p==current, there's no race, since no other process
> > will re-point mm->owner at themselves, so mm can't go away anyway
> > since we have a reference to it that we're going to be dropping soon.
> >
>
> mm cannot go away, but mm->owner can be different from current and could be
> going away.
But that's a problem for mm->owner, not for current.
By this point we've already cleared current->mm, so we're no longer a
candidate for becoming mm->owner if we're not already the owner.
Therefore the truth or falsity of (mm->owner == current) can't be
changed by races - either we are already the owner and no-one but us
can change mm->owner, or we're not the owner and no one can point
mm->owner at us.
> >> +
> >> + if (!mm_need_new_owner(mm, p))
> >> + return;
> >> +
> >> + /*
> >> + * Search in the children
> >> + */
> >> + list_for_each_entry(c, &p->children, sibling) {
> >> + if (c->mm == p->mm)
Oh, and at the point in exit_mm() when you call
mm_update_next_owner(), p->mm is NULL, so this will only match against
tasks that have no mm.
> >> + goto assign_new_owner;
> >> + }
> >
> > We need to keep checking mm_need_new_owner() since it can become false
> > if the only other user of the mm exits at the same time that we do.
> > (In which case there's nothing to do).
> >
>
> I would rather deal with the case where mm->owner is NULL, rather than keep
> checking
I mean that it's possible that there's one other task using mm at the
point when we enter mm_update_next_owner(), but while we're trying to
find it, it does an exit() or execve() and stops being a user of mm.
if that happens, we should bail from the search since we no longer
need to find another user to pass it to (and there is no other user to
find).
> (since even with constant checking we cannot guarantee that mm->owner
> will not become NULL)
>
>
> >> + * Search through everything else. We should not get
> >> + * here often
> >> + */
> >> + for_each_process(c) {
> >> + g = c;
> >> + do {
> >> + if (c->mm && (c->mm == p->mm))
> >> + goto assign_new_owner;
> >> + } while ((c = next_thread(c)) != g);
> >> + }
> >
> > Is there a reason to not code this as for_each_thread?
> >
>
> Is there a for_each_thread()?
>
Sorry, I meant do_each_thread()/while_each_thread(). Isn't that
basically the same thing as what you've explicitly coded here?
> >> +assign_new_owner:
> >> + spin_lock(&mm->owner_lock);
> >> + rcu_assign_pointer(mm->owner, c);
> >> + spin_unlock(&mm->owner_lock);
> >> +}
> >
> > This can break if c is also exiting and has passed the call to
> > mm_update_next_owner() by the time we assign mm->owner. That's why my
> > original suggested version had a function like:
> >
>
> Won't it better to check for c->flags & PF_EXITING?
>
Or if c is execing and gives up the mm that way.
>
> > static inline void try_give_mm_ownership(struct task_struct *task,
> > struct mm_struct *mm) {
> > if (task->mm != mm) return;
> > task_lock(task);
> > if (task->mm == mm) {
> > mm->owner = task;
> > }
> > task_unlock(task);
> > }
> >
> > i.e. determining that a task is a valid candidate and updating the
> > owner pointer has to be done in the same critical section.
> >
>
> Let me try and revamp the locking rules and see what that leads to. But, I don't
> like protecting an mm_struct's member with a task_struct's lock.
We're not protecting mm->owner with task_lock(new_owner). As I said,
the locking rule for mm->owner should that it can only be changed if
(current==mm->owner). What we're protecting is new_owner->mm - i.e.
we're ensuring that we don't break the invariant that (mm->owner->mm
== mm)
> >> config CGROUP_MEM_RES_CTLR
> >> bool "Memory Resource Controller for Control Groups"
> >> - depends on CGROUPS && RESOURCE_COUNTERS
> >> + depends on CGROUPS && RESOURCE_COUNTERS && MM_OWNER
> >
> > Maybe this should select MM_OWNER rather than depending on it?
> >
>
> I thought of it, but wondered if the user should make an informed choice about
> MM_OWNER and the overhead it brings along.
I would vote the other way on that - make it clear in the memory
controller config help that the memory controller can introduce
overhead, and have it implicitly enable MM_OWNER.
Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists