[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.1.10.0901061230460.10871@gandalf.stny.rr.com>
Date: Tue, 6 Jan 2009 12:37:10 -0500 (EST)
From: Steven Rostedt <rostedt@...dmis.org>
To: Andrew Morton <akpm@...ux-foundation.org>
cc: Peter Zijlstra <peterz@...radead.org>,
Matthew Wilcox <matthew@....cx>,
Andi Kleen <andi@...stfloor.org>,
Chris Mason <chris.mason@...cle.com>,
linux-kernel@...r.kernel.org,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
linux-btrfs <linux-btrfs@...r.kernel.org>,
Ingo Molnar <mingo@...e.hu>,
Thomas Gleixner <tglx@...utronix.de>,
Gregory Haskins <ghaskins@...ell.com>,
Nick Piggin <npiggin@...e.de>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH][RFC]: mutex: adaptive spin
On Tue, 6 Jan 2009, Andrew Morton wrote:
> On Tue, 06 Jan 2009 12:40:31 +0100 Peter Zijlstra <peterz@...radead.org> wrote:
>
> > Subject: mutex: adaptive spin
> > From: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> > Date: Tue Jan 06 12:32:12 CET 2009
> >
> > Based on the code in -rt, provide adaptive spins on generic mutexes.
> >
>
> How dumb is it to send a lump of uncommented, changelogged code as an
> rfc, only to have Ingo reply, providing the changelog for you?
Sounds smart to me. He was able to get someone else to do the work. ;-)
>
> Sigh.
>
> > --- linux-2.6.orig/kernel/mutex.c
> > +++ linux-2.6/kernel/mutex.c
> > @@ -46,6 +46,7 @@ __mutex_init(struct mutex *lock, const c
> > atomic_set(&lock->count, 1);
> > spin_lock_init(&lock->wait_lock);
> > INIT_LIST_HEAD(&lock->wait_list);
> > + lock->owner = NULL;
> >
> > debug_mutex_init(lock, name, key);
> > }
> > @@ -120,6 +121,28 @@ void __sched mutex_unlock(struct mutex *
> >
> > EXPORT_SYMBOL(mutex_unlock);
> >
> > +#ifdef CONFIG_SMP
> > +static int adaptive_wait(struct mutex_waiter *waiter,
> > + struct task_struct *owner, long state)
> > +{
> > + for (;;) {
> > + if (signal_pending_state(state, waiter->task))
> > + return 0;
> > + if (waiter->lock->owner != owner)
> > + return 0;
> > + if (!task_is_current(owner))
> > + return 1;
> > + cpu_relax();
> > + }
> > +}
>
> Each of the tests in this function should be carefully commented. It's
> really the core piece of the design.
Yep, I agree here too.
>
> > +#else
> > +static int adaptive_wait(struct mutex_waiter *waiter,
> > + struct task_struct *owner, long state)
> > +{
> > + return 1;
> > +}
> > +#endif
> > +
> > /*
> > * Lock a mutex (possibly interruptible), slowpath:
> > */
> > @@ -127,7 +150,7 @@ static inline int __sched
> > __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> > unsigned long ip)
> > {
> > - struct task_struct *task = current;
> > + struct task_struct *owner, *task = current;
> > struct mutex_waiter waiter;
> > unsigned int old_val;
> > unsigned long flags;
> > @@ -141,6 +164,7 @@ __mutex_lock_common(struct mutex *lock,
> > /* add waiting tasks to the end of the waitqueue (FIFO): */
> > list_add_tail(&waiter.list, &lock->wait_list);
> > waiter.task = task;
> > + waiter.lock = lock;
> >
> > old_val = atomic_xchg(&lock->count, -1);
> > if (old_val == 1)
> > @@ -175,11 +199,19 @@ __mutex_lock_common(struct mutex *lock,
> > debug_mutex_free_waiter(&waiter);
> > return -EINTR;
> > }
> > - __set_task_state(task, state);
> >
> > - /* didnt get the lock, go to sleep: */
> > + owner = lock->owner;
>
> What prevents *owner from exitting right here?
Yeah, this should be commented. Why this is not a race is because
the owner has the mutex here, and we have the lock->wait_lock. When
the owner releases the mutex it must go into the slow unlock path and
grab the lock->wait_lock spinlock. Thus it will block and not go away.
Of course if there's a bug elsewhere in the kernel that lets the task exit
without releasing the mutex, this will fail. But in that case, there's
bigger problems that need to be fixed.
>
> > + get_task_struct(owner);
> > spin_unlock_mutex(&lock->wait_lock, flags);
Here, we get the owner before releasing the wait_lock.
> > - schedule();
> > +
> > + if (adaptive_wait(&waiter, owner, state)) {
> > + put_task_struct(owner);
> > + __set_task_state(task, state);
> > + /* didnt get the lock, go to sleep: */
> > + schedule();
> > + } else
> > + put_task_struct(owner);
> > +
> > spin_lock_mutex(&lock->wait_lock, flags);
> > }
> >
> > @@ -187,7 +219,7 @@ done:
> > lock_acquired(&lock->dep_map, ip);
> > /* got the lock - rejoice! */
> > mutex_remove_waiter(lock, &waiter, task_thread_info(task));
> > - debug_mutex_set_owner(lock, task_thread_info(task));
> > + lock->owner = task;
> >
> > /* set it to 0 if there are no waiters left: */
> > if (likely(list_empty(&lock->wait_list)))
> > @@ -260,7 +292,7 @@ __mutex_unlock_common_slowpath(atomic_t
> > wake_up_process(waiter->task);
> > }
> >
> > - debug_mutex_clear_owner(lock);
> > + lock->owner = NULL;
> >
> > spin_unlock_mutex(&lock->wait_lock, flags);
> > }
> > @@ -352,7 +384,7 @@ static inline int __mutex_trylock_slowpa
> >
> > prev = atomic_xchg(&lock->count, -1);
> > if (likely(prev == 1)) {
> > - debug_mutex_set_owner(lock, current_thread_info());
> > + lock->owner = current;
> > mutex_acquire(&lock->dep_map, 0, 1, _RET_IP_);
> > }
> > /* Set it back to 0 if there are no waiters: */
> > Index: linux-2.6/kernel/sched.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/sched.c
> > +++ linux-2.6/kernel/sched.c
> > @@ -697,6 +697,11 @@ int runqueue_is_locked(void)
> > return ret;
> > }
> >
> > +int task_is_current(struct task_struct *p)
> > +{
> > + return task_rq(p)->curr == p;
> > +}
>
> Please don't add kernel-wide infrastructure and leave it completely
> undocumented. Particularly functions which are as vague and dangerous
> as this one.
>
> What locking must the caller provide? What are the semantics of the
> return value?
>
> What must the caller do to avoid oopses if *p is concurrently exiting?
>
> etc.
>
>
> The overall design intent seems very smart to me, as long as the races
> can be plugged, if they're indeed present.
Thanks for the complement on the design. It came about lots of work by
many people in the -rt tree. What we need here is to comment the code for
those that did not see the evolution of the changes that were made.
-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists