Message-ID: <1316115135.4060.19.camel@twins>
Date: Thu, 15 Sep 2011 21:32:15 +0200
From: Peter Zijlstra <a.p.zijlstra@...llo.nl>
To: Manfred Spraul <manfred@...orfullife.com>
Cc: Ingo Molnar <mingo@...e.hu>, Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>,
Darren Hart <dvhart@...ux.intel.com>,
David Miller <davem@...emloft.net>,
Eric Dumazet <eric.dumazet@...il.com>,
Mike Galbraith <efault@....de>
Subject: Re: [RFC][PATCH 3/3] ipc/sem: Rework wakeup scheme
On Thu, 2011-09-15 at 19:29 +0200, Manfred Spraul wrote:
> Hi Peter,
> What is broken?
I'm not quite sure yet, but the result is that sembench doesn't
complete properly; http://oss.oracle.com/~mason/sembench.c

What seems to be happening is that we get spurious wakeups in the
ipc/sem code, resulting in semtimedop() returning -EINTR even though
there's no pending signal.

(There really should be an if (!signal_pending(current)) goto again
check in that semtimedop wait loop.)
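
Something like this, purely as a sketch of the idea (made-up variable
names, not the actual ipc/sem.c wait loop):

	for (;;) {
		__set_current_state(TASK_INTERRUPTIBLE);
		timeout = schedule_timeout(timeout);

		error = ACCESS_ONCE(queue.status);
		if (error != -EINTR)
			break;			/* a waker updated our status */
		if (signal_pending(current))
			break;			/* genuine -EINTR */
		if (!timeout) {
			error = -EAGAIN;	/* timed out */
			break;
		}
		/* woken with no status update and no signal pending:
		 * spurious, go back to sleep */
	}
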
Adding a loop in userspace like:
again:
	ret = semtimedop(semid_lookup[l->id], &sb, 1, tvp);
	if (ret) {
		if (errno == EINTR) {
			l->spurious++;
			kill_tracer();
			goto again;
		}
		perror("semtimedop");
	}
makes it complete again (although performance seems to suffer a lot
compared to a kernel without this patch).
It seems related to patch 2/3 converting the futex code; without that
patch I can't seem to reproduce it. All this is strange though, because
if there were multiple wakeups on the same task, wake_lists ought to
result in fewer wakeups in total, not more.
I've been trying to trace the thing but so far no luck.. when I enable
too much tracing it goes away.. silly heisenbugger.
> > +static void wake_up_sem_queue_prepare(struct wake_list_head *wake_list,
> > 				       struct sem_queue *q, int error)
> > {
> > +	struct task_struct *p = ACCESS_ONCE(q->sleeper);
> >
> > +	get_task_struct(p);
> > +	q->status = error;
> > +	/*
> > +	 * implies a full barrier
> > +	 */
> > +	wake_list_add(wake_list, p);
> > +	put_task_struct(p);
> > }
> I think the get_task_struct()/put_task_struct is not necessary:
> Just do the wake_list_add() before writing q->status:
> wake_list_add() is identical to list_add_tail(&q->simple_list, pt).
> [except that it contains additional locking, which doesn't matter here]
But the moment we write q->status, q can disappear, right?
Suppose the task gets a wakeup (say from a signal) right after we write
q->status. Then p can disappear (do_exit) and we'd try to enqueue dead
memory -> BOOM!
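
That is, roughly this interleaving (a sketch of the race, not actual
code):

	/*
	 *	waker				sleeper (semtimedop)
	 *	-----				--------------------
	 *	q->status = error;
	 *					wakes up (e.g. from a signal),
	 *					reads q->status, returns from
	 *					semtimedop() and exits;
	 *					do_exit() frees the task_struct
	 *	wake_list_add(wake_list, p);	<-- p points at freed memory
	 *
	 * The get_task_struct(p) before the q->status write pins the task
	 * until wake_list_add() has taken its own reference.
	 */
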
> > +static void wake_up_sem_queue_do(struct wake_list_head *wake_list)
> > {
> > +	wake_up_list(wake_list, TASK_ALL);
> > }
> >
> wake_up_list() calls wake_up_state() that calls try_to_wake_up().
> try_to_wake_up() seems to return immediately when the state is TASK_DEAD.
>
> That leaves: Is it safe to call wake_up_list() in parallel with do_exit()?
> The current implementation avoids that.
Ah, wake_list_add() does get_task_struct() and wake_up_list() will first
issue the wakeup and then drop the reference.
Hrmm, it looks like it's all these atomic ops {get,put}_task_struct()
that are causing the performance drop. I just removed the ones in
wake_up_sem_queue_prepare() just for kicks and got about half my
performance gap back.
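
For completeness, the per-wakeup reference flow with the patch as
posted is roughly this (sketch, not the literal code):

	/*
	 * wake_up_sem_queue_prepare():
	 *	get_task_struct(p);		pins p across the status write
	 *	q->status = error;
	 *	wake_list_add(wake_list, p);	takes its own reference
	 *	put_task_struct(p);
	 *
	 * wake_up_sem_queue_do() -> wake_up_list():
	 *	wake_up_state(p, TASK_ALL);
	 *	put_task_struct(p);		drops wake_list_add()'s reference
	 *
	 * That's two atomic inc/dec pairs per woken task; dropping the
	 * outer pair saves the atomics but reopens the window between the
	 * q->status write and wake_list_add() taking its reference.
	 */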