[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20101220150037.GE11583@redhat.com>
Date: Mon, 20 Dec 2010 16:00:37 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Tejun Heo <tj@...nel.org>
Cc: roland@...hat.com, linux-kernel@...r.kernel.org,
torvalds@...ux-foundation.org, akpm@...ux-foundation.org,
rjw@...k.pl, jan.kratochvil@...hat.com
Subject: Re: [PATCH 10/16] ptrace: clean transitions between TASK_STOPPED
and TRACED
On 12/06, Tejun Heo wrote:
>
> @@ -93,6 +99,7 @@ int ptrace_check_attach(struct task_struct *child, int kill)
> * we are sure that this is our traced child and that can only
> * be changed by us so it's not changing right after this.
> */
> +relock:
> read_lock(&tasklist_lock);
> if ((child->ptrace & PT_PTRACED) && child->parent == current) {
> ret = 0;
> @@ -101,10 +108,30 @@ int ptrace_check_attach(struct task_struct *child, int kill)
> * does ptrace_unlink() before __exit_signal().
> */
> spin_lock_irq(&child->sighand->siglock);
> - if (task_is_stopped(child))
> - child->state = TASK_TRACED;
> - else if (!task_is_traced(child) && !kill)
> + if (!task_is_traced(child) && !kill) {
> + /*
> + * If GROUP_STOP_TRAPPING is set, it is known that
> + * the tracee will enter either TRACED or the bit
> + * will be cleared in definite amount of (userland)
> + * time. Wait while the bit is set.
> + *
> + * This hides PTRACE_ATTACH initiated transition
> + * from STOPPED to TRACED from userland.
> + */
> + if (child->group_stop & GROUP_STOP_TRAPPING) {
> + const int bit = ilog2(GROUP_STOP_TRAPPING);
> + DEFINE_WAIT_BIT(wait, &child->group_stop, bit);
Unused "wait_bit_queue wait"
> +
> + spin_unlock_irq(&child->sighand->siglock);
> + read_unlock(&tasklist_lock);
> +
> + wait_on_bit(&child->group_stop, bit,
Hmm. we could probably use ->wait_chldexit/__wake_up_parent instead,
although I am not sure this would be more clean...
> + ptrace_wait_trap,
> + TASK_UNINTERRUPTIBLE);
I am nervous ;) To me, TASK_KILLABLE makes more sense, and it is
safer if we have a bug.
> + goto relock;
We already set ret = 0. "relock" should set -ESRCH.
> + }
> ret = -ESRCH;
> + }
Probably this deserves a minor cleanup,
relock:
ret = -ESRCH;
read_lock(&tasklist_lock);
if (task_is_traced() || kill) {
ret = 0;
} else {
...
OK. So, ptrace_attach() asks a stopped tracee to s/STOPPED/TRACED/.
If it is not stopped, it should call ptrace_stop() eventually.
This doesn't work if ptrace_attach() races with clone(CLONE_STOPPED).
ptrace_check_attach() can return the wrong ESRCH after that. Perhaps
it is time to kill the CLONE_STOPPED code in do_fork().
> @@ -204,6 +231,26 @@ int ptrace_attach(struct task_struct *task)
> __ptrace_link(task, current);
> send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
>
> + spin_lock(&task->sighand->siglock);
> +
> + /*
> + * If the task is already STOPPED, set GROUP_STOP_PENDING and
> + * TRAPPING, and kick it so that it transits to TRACED. TRAPPING
> + * will be cleared if the child completes the transition or any
> + * event which clears the group stop states happens. The bit is
> + * waited by ptrace_check_attach() to hide the transition from
> + * userland.
> + *
> + * The following is safe as both transitions in and out of STOPPED
> + * are protected by siglock.
> + */
> + if (task_is_stopped(task)) {
> + task->group_stop |= GROUP_STOP_PENDING | GROUP_STOP_TRAPPING;
> + signal_wake_up(task, 1);
> + }
> +
> + spin_unlock(&task->sighand->siglock);
Well. I do not know whether this matters, but "hide the transition from
userland" is not 100% correct. I mean, this change is still visible.
ptrace_check_attach()->wait_on_bit() logic fixes the previous example,
but:
1. the tracer knows that the tracee is stopped
2. the tracer does ptrace(ATTACH)
3. the tracer does do_wait()
In this case do_wait() can see the tracee in TASK_RUNNING state,
this breaks wait_task_stopped(ptrace => true).
Jan?
> @@ -1799,22 +1830,28 @@ static int do_signal_stop(int signr)
> */
> sig->group_exit_code = signr;
>
> - current->group_stop = gstop;
> + current->group_stop &= ~GROUP_STOP_SIGMASK;
> + current->group_stop |= signr | gstop;
> sig->group_stop_count = 1;
> - for (t = next_thread(current); t != current; t = next_thread(t))
> + for (t = next_thread(current); t != current;
> + t = next_thread(t)) {
> + t->group_stop &= ~GROUP_STOP_SIGMASK;
> /*
> * Setting state to TASK_STOPPED for a group
> * stop is always done with the siglock held,
> * so this check has no races.
> */
> if (!(t->flags & PF_EXITING) && !task_is_stopped(t)) {
> - t->group_stop = gstop;
> + t->group_stop |= signr | gstop;
> sig->group_stop_count++;
> signal_wake_up(t, 0);
> - } else
> + } else {
> task_clear_group_stop(t);
This looks racy. Suppose that "current" is ptraced, in this case
it can initiate the new group-stop even if SIGNAL_STOP_STOPPED
is set and we have another TASK_STOPPED thead T.
Suppose that another (or same) debugger ataches to this thread T,
wakes it up and sets GROUP_STOP_TRAPPING.
T resumes, calls ptrace_stop() in TASK_STOPPED, and temporary drops
->siglock.
Now, this task_clear_group_stop(T) confuses ptrace_check_attach(T).
I think ptrace_stop() should be called in TASK_RUNNING state.
This also makes sense because we may call arch_ptrace_stop().
> + t->group_stop |= signr;
Probably this doesn't really matter, but why do we need to
change the GROUP_STOP_SIGMASK part of t->group_stop? If it
is exiting, this is not needed. If it is already stopped, then
it already has the correct (previous) signr.
> + }
> + }
> }
> -
> +retry:
> current->exit_code = sig->group_exit_code;
> __set_current_state(TASK_STOPPED);
It is no longer needed to set ->exit_code here. The only reason
was s/STOPPED/TRACED/ change in ptrace_check_attach(). Now that
we rely on ptrace_stop() which sets ->exit_state, this can be
removed.
And,
> @@ -1842,7 +1879,18 @@ static int do_signal_stop(int signr)
>
> spin_lock_irq(¤t->sighand->siglock);
> } else
> - ptrace_stop(current->exit_code, CLD_STOPPED, 0, NULL);
> + ptrace_stop(current->group_stop & GROUP_STOP_SIGMASK,
> + CLD_STOPPED, 0, NULL);
Perhaps it would be more clean to clear ->exit_code here, in the
"else" branch.
Oleg.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists