[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20061027015930.40C67180057@magilla.sf.frob.com>
Date: Thu, 26 Oct 2006 18:59:30 -0700 (PDT)
From: Roland McGrath <roland@...hat.com>
To: Ernie Petrides <petrides@...hat.com>
Cc: Andrew Morton <akpm@...l.org>, Jakub Jelinek <jakub@...hat.com>,
"Ulrich Drepper <drepper@...hat.com>Ingo Molnar" <mingo@...e.hu>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] fix for zeroed user-space tids in multi-threaded core dumps
I concur with the analysis and the approach to avoiding the problem. Any
change half this subtle deserves some comments in the code. Other places
that look at mm->core_waiters hold mm->mmap_sem, though I'm not entirely
sure it matters in practice here.
I would actually go further and skip the clearing on any signal exit,
rather than looking specifically for the core dump issue. I've CC'd
Ulrich, the creator of the feature, who can verify that the intended
userland ABI specification has always been concerned only with the sys_exit
path. It's arguably incorrect by POSIX as it is now (and would still be
with just Ernie's change), in that a thread taking SIGTERM should not cause
the race where another thread's pthread_join on that thread might return
before the SIGTERM took effect to kill all the threads. (I'm not at all
sure such a race is actually possible, due to other interactions.)
This patch is race-free in that it depends only on the activities of the
current thread itself. If it had started exiting normally before the
process-wide core dump happened, then the core dump can show it that way.
The PF_SIGNALED check covers the core dump case, and also the normal
no-core fatal signal case (so a superset of what Ernie's change covers
modulo the race difference, though still not including the several obscure
cases of direct do_exit calls that aren't from the normal signal code).
Thanks,
Roland
---
[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit.
The CLONE_CHILD_CLEARTID flag is used by NPTL to have its threads
communicate via memory/futex when they exit, so pthread_join can
synchronize using a simple futex wait. The word of user memory where NPTL
stores a thread's own TID is what it passes; this gets reset to zero at
thread exit.
It is not desireable to touch this user memory when threads are dying due
to a fatal signal. A core dump is more usefully representative of the
dying program state if the threads live at the time of the crash have their
NPTL data structures unperturbed. The userland expectation of
CLONE_CHILD_CLEARTID has only ever been that it works for a thread making
an _exit system call.
This problem was identified by Ernie Petrides <petrides@...hat.com>.
Signed-off-by: Roland McGrath <roland@...hat.com>
---
kernel/fork.c | 21 +++++++++++++++------
1 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 29ebb30..da3e2e1 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -448,7 +448,16 @@ void mm_release(struct task_struct *tsk,
tsk->vfork_done = NULL;
complete(vfork_done);
}
- if (tsk->clear_child_tid && atomic_read(&mm->mm_users) > 1) {
+
+ /*
+ * If we're exiting normally, clear a user-space tid field if
+ * requested. We leave this alone when dying by signal, to leave
+ * the value intact in a core dump, and to save the unnecessary
+ * trouble otherwise. Userland only wants this done for a sys_exit.
+ */
+ if (tsk->clear_child_tid
+ && !(tsk->flags & PF_SIGNALED)
+ && atomic_read(&mm->mm_users) > 1) {
u32 __user * tidptr = tsk->clear_child_tid;
tsk->clear_child_tid = NULL;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists