Message-ID: <1316115135.4060.19.camel@twins>
Date: Thu, 15 Sep 2011 21:32:15 +0200
From: Peter Zijlstra <a.p.zijlstra@...llo.nl>
To: Manfred Spraul <manfred@...orfullife.com>
Cc: Ingo Molnar <mingo@...e.hu>, Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>,
Darren Hart <dvhart@...ux.intel.com>,
David Miller <davem@...emloft.net>,
Eric Dumazet <eric.dumazet@...il.com>,
Mike Galbraith <efault@....de>
Subject: Re: [RFC][PATCH 3/3] ipc/sem: Rework wakeup scheme
On Thu, 2011-09-15 at 19:29 +0200, Manfred Spraul wrote:
> Hi Peter,
> What is broken?
I'm not quite sure yet, but the result is that sembench doesn't
complete properly; http://oss.oracle.com/~mason/sembench.c

What seems to be happening is that we get spurious wakeups in the
ipc/sem code, resulting in semtimedop() returning -EINTR even though
there's no pending signal.

(There really should be an if (!signal_pending(current)) goto again
check in that semtimedop wait loop.)
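
Something like this, purely as a sketch of the idea (made-up variable
names, not the actual ipc/sem.c wait loop):

	for (;;) {
		__set_current_state(TASK_INTERRUPTIBLE);
		timeout = schedule_timeout(timeout);

		error = ACCESS_ONCE(queue.status);
		if (error != -EINTR)
			break;			/* a waker updated our status */
		if (signal_pending(current))
			break;			/* genuine -EINTR */
		if (!timeout) {
			error = -EAGAIN;	/* timed out */
			break;
		}
		/* woken with no status update and no signal pending:
		 * spurious, go back to sleep */
	}
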
Adding a loop in userspace like:
again:
	ret = semtimedop(semid_lookup[l->id], &sb, 1, tvp);
	if (ret) {
		if (errno == EINTR) {
			l->spurious++;
			kill_tracer();
			goto again;
		}
		perror("semtimedop");
	}
makes it complete again (although performance seems to suffer a lot
compared to a kernel without this patch).
It seems related to patch 2/3 converting the futex code; without that
patch I can't seem to reproduce it. All this is strange though, because
if there were multiple wakeups on the same task, wake_lists ought to
result in fewer wakeups in total, not more.
I've been trying to trace the thing but so far no luck.. when I enable
too much tracing it goes away.. silly heisenbugger.
> > +static void wake_up_sem_queue_prepare(struct wake_list_head *wake_list,
> > 				       struct sem_queue *q, int error)
> > {
> > +	struct task_struct *p = ACCESS_ONCE(q->sleeper);
> >
> > +	get_task_struct(p);
> > +	q->status = error;
> > +	/*
> > +	 * implies a full barrier
> > +	 */
> > +	wake_list_add(wake_list, p);
> > +	put_task_struct(p);
> > }
> I think the get_task_struct()/put_task_struct is not necessary:
> Just do the wake_list_add() before writing q->status:
> wake_list_add() is identical to list_add_tail(&q->simple_list, pt).
> [except that it contains additional locking, which doesn't matter here]
But the moment we write q->status, q can disappear, right?
Suppose the task gets a wakeup (say from a signal) right after we write
q->status. Then p can disappear (do_exit) and we'd try to enqueue dead
memory -> BOOM!
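
That is, roughly this interleaving (a sketch of the race, not actual
code):

	/*
	 *	waker				sleeper (semtimedop)
	 *	-----				--------------------
	 *	q->status = error;
	 *					wakes up (e.g. from a signal),
	 *					reads q->status, returns from
	 *					semtimedop() and exits;
	 *					do_exit() frees the task_struct
	 *	wake_list_add(wake_list, p);	<-- p points at freed memory
	 *
	 * The get_task_struct(p) before the q->status write pins the task
	 * until wake_list_add() has taken its own reference.
	 */
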
> > +static void wake_up_sem_queue_do(struct wake_list_head *wake_list)
> > {
> > +	wake_up_list(wake_list, TASK_ALL);
> > }
> >
> wake_up_list() calls wake_up_state() that calls try_to_wake_up().
> try_to_wake_up() seems to return immediately when the state is TASK_DEAD.
>
> That leaves: Is it safe to call wake_up_list() in parallel with do_exit()?
> The current implementation avoids that.
Ah, wake_list_add() does get_task_struct() and wake_up_list() will first
issue the wakeup and then drop the reference.
Hrmm, it looks like it's all these atomic ops {get,put}_task_struct()
that are causing the performance drop. I just removed the ones in
wake_up_sem_queue_prepare() just for kicks and got about half my
performance gap back.
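
For completeness, the per-wakeup reference flow with the patch as
posted is roughly this (sketch, not the literal code):

	/*
	 * wake_up_sem_queue_prepare():
	 *	get_task_struct(p);		pins p across the status write
	 *	q->status = error;
	 *	wake_list_add(wake_list, p);	takes its own reference
	 *	put_task_struct(p);
	 *
	 * wake_up_sem_queue_do() -> wake_up_list():
	 *	wake_up_state(p, TASK_ALL);
	 *	put_task_struct(p);		drops wake_list_add()'s reference
	 *
	 * That's two atomic inc/dec pairs per woken task; dropping the
	 * outer pair saves the atomics but reopens the window between the
	 * q->status write and wake_list_add() taking its reference.
	 */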