[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6599ad830803280838s19ffc366w1a950ebb12e2907b@mail.gmail.com>
Date: Fri, 28 Mar 2008 08:38:16 -0700
From: "Paul Menage" <menage@...gle.com>
To: balbir@...ux.vnet.ibm.com
Cc: "Pavel Emelianov" <xemul@...nvz.org>,
"Hugh Dickins" <hugh@...itas.com>,
"Sudhir Kumar" <skumar@...ux.vnet.ibm.com>,
"YAMAMOTO Takashi" <yamamoto@...inux.co.jp>, lizf@...fujitsu.com,
linux-kernel@...r.kernel.org, taka@...inux.co.jp,
linux-mm@...ck.org, "David Rientjes" <rientjes@...gle.com>,
"Andrew Morton" <akpm@...ux-foundation.org>,
"KAMEZAWA Hiroyuki" <kamezawa.hiroyu@...fujitsu.com>
Subject: Re: [-mm] Add an owner to the mm_struct (v2)
On Fri, Mar 28, 2008 at 7:52 AM, Balbir Singh <balbir@...ux.vnet.ibm.com> wrote:
>
> mm->owner_lock is there to protect mm->owner field from changing simultaneously
> as tasks fork/exit.
>
But the *hardware* already does that for you - individual writes to
pointers are already atomic operations and so will be serialized.
Using a lock to guard something only does anything useful if at least
one of the critical regions that takes the lock consists of more than
a single atomic operation, or if you have a mixture of read sections
and write sections. Now it's true that your critical region in
mm_fork_init_owner() is more than a single atomic op, but I'm arguing
below that it's a no-op. So that just leaves the single region
spin_lock(&mm->owner_lock);
mm->owner = new_owner;
spin_unlock(&mm->owner_lock);
which isn't observably different if you remove the spinlock.
>
> Oh! yes.. my bad again. The check should have been p == p->thread_group, but
> that is not required either. The check should now ideally be
>
> if (!(clone_flags & CLONE_VM))
>
OK, so if the new thread has its own mm (and hence will already have
mm->owner set up to point to p in mm_init()) then we do:
> + if (mm->owner != p)
> + rcu_assign_pointer(mm->owner, p->group_leader);
which is a no-op since we know mm->owner == p.
>
> Yes.. I think we need to call it earlier.
>
No, I think we need to call it later - after we've cleared current->mm
(from within task_lock(current)) - so we can't rely on p->mm in this
function, we have to pass it in. If we call it before while
current->mm == mm, then we risk a race where the (new or existing)
owner exits and passes it back to us *after* we've done a check to see
if we need to find a new owner. If we ensure that current->mm != mm
before we call mm_update_next_owner(), then we know we're not a
candidate for receiving the ownership if we don't have it already.
>
> But there is no way to guarantee that, what is the new_owner exec's after we've
> done the check and assigned. Won't we end up breaking the invariant? How about
> we have mm_update_new_owner() call in exec_mmap() as well? That way, we can
> still use owner_lock and keep the invariant.
>
Oops, I thought that exit_mm() already got called in the execve()
path, but you're right, it doesn't.
Yes, exit_mmap() should call mm_update_next_owner() after the call to
task_unlock(), i.e. after it's set its new mm.
So I need to express the invariant more carefully.
What we need to preserve is that, for every mm at all times, mm->owner
points to a valid task. So either:
1) mm->owner->mm == mm AND mm->owner will check to see whether it
needs to pass ownership before it exits or execs.
OR
2) mm->owner is the last user of mm and is about to free mm.
OR
3) mm->owner is currently searching for another user of mm to pass the
ownership to.
In order to get from state 3 to state 1 safely we have to hold
task_lock(new_owner). Otherwise we can race with an exit or exec in
new_owner, resulting in a process that has already passed the point of
checking current->mm->owner.
I don't see why we need mm->owner_lock to maintain this invariant.
(But am quite prepared to be proven wrong).
Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists