Message-ID: <1364706119.6239.6.camel@buesod1.americas.hpqcorp.net>
Date: Sat, 30 Mar 2013 22:01:59 -0700
From: Davidlohr Bueso <davidlohr.bueso@...com>
To: Emmanuel Benisty <benisty.e@...il.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Dave Jones <davej@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Rik van Riel <riel@...riel.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
hhuang@...hat.com, "Low, Jason" <jason.low2@...com>,
Michel Lespinasse <walken@...gle.com>,
Larry Woodman <lwoodman@...hat.com>,
"Vinod, Chegu" <chegu_vinod@...com>,
Peter Hurley <peter@...leysoftware.com>
Subject: Re: ipc,sem: sysv semaphore scalability
On Sat, 2013-03-30 at 11:33 +0700, Emmanuel Benisty wrote:
> On Sat, Mar 30, 2013 at 10:46 AM, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> > On Fri, Mar 29, 2013 at 8:02 PM, Emmanuel Benisty <benisty.e@...il.com> wrote:
> >>
> >> Then I start building a random package and the problems start. They
> >> may also happen without compiling but this seems to trigger the bug
> >> quite quickly.
> >
> > I suspect it's about preemption, and the build just results in enough
> > scheduling load that you start hitting whatever race there is.
> >
> >> Anyway, some progress here, I hope: dmesg seems to be
> >> willing to reveal some secrets (using some pastebin service since this
> >> is pretty big):
> >>
> >> https://gist.github.com/anonymous/5275120
> >
> > That looks like exactly the exit_sem() bug that Davidlohr was talking
> > about, where the
> >
> > /* exit_sem raced with IPC_RMID, nothing to do */
> > if (IS_ERR(sma))
> > continue;
> >
> > should be moved to *before* the
> >
> > sem_lock(sma, NULL, -1);
> >
> > call. And apparently the bug I had found is already fixed in -next.
>
> I just tried the 7 original patches + the 2 one liners from -next +
> modified Linus' patch (attached) on the top of 3.9-rc4 using
> PREEMPT_NONE and after moving sem_lock(sma, NULL, -1) as explained
> above. I was building two packages at the same time, went away for 30
> seconds, came back and everything froze as soon as I touched the
> laptop's touchpad. Maybe a coincidence but anyway... Another shot in
> the dark, I had this weird message when trying to build gcc:
> semop(2): encountered an error: Identifier removed

*sigh*. I had high hopes for this being the bug triggering your issue,
especially after seeing exit_sem() in the trace.

Emmanuel, just to be sure, do your changes reflect the patch below?
In particular, it drops the rcu read lock before the continue statement
(sorry for not mentioning this in the last email).

Anyway, this is still a bug. Andrew, the patch below applies to
linux-next, please queue this up if you don't have any objections.

Thanks,
Davidlohr

---8<---
From: Davidlohr Bueso <davidlohr.bueso@...com>
Subject: [PATCH] ipc, sem: do not call sem_lock when bogus sma

In exit_sem() we attempt to acquire the sma->sem_perm.lock by calling
sem_lock() immediately after obtaining sma. However, if
sem_obtain_object_check() failed, sma is an error pointer rather than a
valid sem_array, and sem_lock() would dereference it.

Move the sma error check right after the sem_obtain_object_check() call,
and drop the rcu read lock before continuing.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@...com>
---
ipc/sem.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/ipc/sem.c b/ipc/sem.c
index f257afe..74cedfe 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -1867,8 +1867,7 @@ void exit_sem(struct task_struct *tsk)
 		struct sem_array *sma;
 		struct sem_undo *un;
 		struct list_head tasks;
-		int semid;
-		int i;
+		int semid, i;
 
 		rcu_read_lock();
 		un = list_entry_rcu(ulp->list_proc.next,
@@ -1884,12 +1883,13 @@ void exit_sem(struct task_struct *tsk)
 		}
 
 		sma = sem_obtain_object_check(tsk->nsproxy->ipc_ns, un->semid);
-		sem_lock(sma, NULL, -1);
-
 		/* exit_sem raced with IPC_RMID, nothing to do */
-		if (IS_ERR(sma))
+		if (IS_ERR(sma)) {
+			rcu_read_unlock();
 			continue;
+		}
 
+		sem_lock(sma, NULL, -1);
 		un = __lookup_undo(ulp, semid);
 		if (un == NULL) {
 			/* exit_sem raced with IPC_RMID+semget() that created
--
1.7.11.7
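
For readers following along outside the kernel tree, the ordering the patch enforces can be sketched in plain userspace C. This is only an illustration, not the kernel code: err_ptr() and is_err() are stand-ins modelled on the kernel's ERR_PTR()/IS_ERR() helpers, and fake_sem_array, fake_obtain(), fake_lock(), fake_rcu_depth and lookup_then_lock() are hypothetical names invented for the example. The point it tries to show is that the error check comes before anything touches sma, and that the error path drops the read-side section it entered.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins modelled on the kernel's ERR_PTR()/IS_ERR(). */
#define MAX_ERRNO 4095

static void *err_ptr(intptr_t error)
{
	return (void *)error;
}

static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, heavily simplified model of a semaphore array. */
struct fake_sem_array {
	bool locked;
};

/* Hypothetical counter standing in for the RCU read-side nesting depth. */
static int fake_rcu_depth;

static void fake_rcu_read_lock(void)   { fake_rcu_depth++; }
static void fake_rcu_read_unlock(void) { fake_rcu_depth--; }

/* Hypothetical lookup: an id outside the table models a removed set. */
static struct fake_sem_array *fake_obtain(struct fake_sem_array *table,
					  size_t nr, int id)
{
	if (id < 0 || (size_t)id >= nr)
		return err_ptr(-EIDRM);
	return &table[id];
}

static void fake_lock(struct fake_sem_array *sma)
{
	assert(!is_err(sma));	/* locking an error pointer is the bug */
	sma->locked = true;
}

/* Counterpart for the success path: unlock, then leave the read side. */
static void unlock_and_release(struct fake_sem_array *sma)
{
	sma->locked = false;
	fake_rcu_read_unlock();
}

/*
 * Returns 0 with the object locked and the read-side section still held,
 * or a negative errno with nothing held.
 */
static int lookup_then_lock(struct fake_sem_array *table, size_t nr, int id)
{
	struct fake_sem_array *sma;

	fake_rcu_read_lock();
	sma = fake_obtain(table, nr, id);

	/* Check first: sma must not be touched if the lookup failed. */
	if (is_err(sma)) {
		fake_rcu_read_unlock();
		return (int)(intptr_t)sma;
	}

	fake_lock(sma);
	return 0;
}
```

A caller would pair a successful lookup_then_lock() with unlock_and_release(); on failure there is nothing to undo, which mirrors the "continue" in the patch above.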