linux-kernel - Re: [PATCH 1/1] ptrace: make sure do_wait() won't hang after PTRACE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110207153723.GA27997@redhat.com>
Date:	Mon, 7 Feb 2011 16:37:23 +0100
From:	Oleg Nesterov <oleg@...hat.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	Roland McGrath <roland@...hat.com>, jan.kratochvil@...hat.com,
	linux-kernel@...r.kernel.org, torvalds@...ux-foundation.org,
	akpm@...ux-foundation.org
Subject: Re: [PATCH 1/1] ptrace: make sure do_wait() won't hang after
	PTRACE_ATTACH

On 02/07, Tejun Heo wrote:
>
> Hello, Oleg.
>
> On Mon, Feb 07, 2011 at 02:42:35PM +0100, Oleg Nesterov wrote:
> > > That's the shortcomings of the current implementation.  The specific
> > > problem sure can be fixed by putting group stop on top of ptrace but
> > > that is not the only direction.  In fact, that actually is the
> > > direction we CAN'T take with ptrace because changing that will create
> > > a lot more problems that can't be worked around.
> >
> > Which problems?
>
> I was talking about prioritizing group stop over ptrace in general.
> Please see the following messages.
>
>  http://article.gmane.org/gmane.linux.kernel/1095119
>  http://article.gmane.org/gmane.linux.kernel/1095603

Yes, I tried to read this... But I have to admit I can hardly understand
your discussion with Roland. More precisely, I don't understand what
exactly you have in mind.

One (may be off-topic) note,

	On 01/31, Tejun Heo wrote:
	>
	> On Fri, Jan 28, 2011 at 01:30:09PM -0800, Roland McGrath wrote:
	> > > A visible behavior change is increased likelihood of delayed group
	> > > stop completion if the thread group contains one or more ptraced
	> > > tasks.
	> >
	> > I object to that difference in behavior.  As I've said before, I don't
	> > think there should be any option to circumvent a group stop via ptrace.
	> > If you think otherwise, you have a hard road to convince me of it.

I agree with Roland here.

	> Yes, I do have some other ideas.  When a ptraced task gets a stop
	> signal, its delivery is controlled by the tracer, right?

Right, but note that the tracer does not fully control the group-stop.
One a thread dequeues SIGSTOP (and please note this thread can be !traced),
all other threads (traced or not) should participate.

As for SIGCONT priority, see below.

> Notifying the parent w/o making group stop superior to ptrace sure is
> a possibility.

Could you please reiterate? I think I missed something before, and
now I do not really understand what do you mean.

> > > For example, the problem in this thread is cleanly solved by
> > > really examining the problem and fixing the problem at the source (the
> > > mixup of group and ptrace stop)
> >
> > Yes, but I am worried that this change (in its current form) makes
> > impossible to create a TASK_STOPPED tracee, but you already know this.
>
> Why is that a problem?

See above. Because I think ptrace should not "hide" jctl stops (at
least by default), and SIGCONT should work in this case.

> A ptraced task stops in TASK_TRACED.

Unless it reacts to SIGSTOP/group_stop_count.

> > OK. But what I can't understand is why the alternative change is
> > not better. Once again:
> >
> > 	- the stopping thread always notifies the debugger
> >
> > 	- the last thread notifies both: debugger and real_parent
> >
> > 	- do_wait() is modified so that WSTOPPED always works
> > 	  for real_parent, even if its child is ptraced.
>
> I think the disconnection comes from the scope of the problem.  If we
> restrict our attention to group stop notification.

Of course, we shouldn't restrict.

> I agree that what
> you're describing seems like a good compromise.  What I was objecting
> to was putting group stop mechanism in general on top of ptrace.  I
> can't see how that would work.

And I still can't understand why this can't work ;)

And I don't really understand "putting group stop mechanism in general
on top of ptrace". It is very possible I am wrong, but I see this from
the different angle: stop/ptrace should be "parallel".

> Also, for a ptraced task, what would you consider to be participating
> in a group stop?

Yes, this is the question.

> I think it should only include the case where the
> tracee actually stops for group stop excluding all other trapping
> points.

I was thinking about this too and probably this makes sense. But
I think at least initial changes should keep the current behaviour
(assuming this behaviour is fixed).

> But, I don't think this really changes the need for state tracking.
> We would still have to put the tracee into approriate mode on detach.

Sure, but we already have SIGNAL_STOP_STOPPED/group_signal_stop. I meant,
we do not need to remember the state per-thread.


As for SIGCONT. Roland suggests (roughly) to change ptrace_resume()
so that it doesn't wakeup the stopped tracee until the real SIGCONT
comes. And this makes sense to me.

	On 02/03, Tejun Heo wrote:
	>
	> I've been thinking about this and the more I think about it I don't
	> see how we can make this priority flipping without adversely affecting
	> the expect userland behavior.
	>
	> For example, if a gdb traced task is instructed to participate in a
	> group stop and then hits a ptrace trap, it would have to participate
	> in the group stop as it enters ptrace trap, right?  gdb's wait(2)
	> would complete indicating ptrace trap.  After the user tells the task
	> to continue, the task shouldn't resume until SIGCONT is received;

Yes. But to me, this looks correct! The tracee shouldn't resume exactly
because it is stopped.

	> however, at this point, there's no way for gdb to tell what's going on
	> with the tracee.

Yes. I think this should be improved somehow, currently gdb can only
look in /proc/tid/status to detect this case.

	> If ptrace behaved like that from the beginning, gdb would have behaved
	> differently and worked around those cases but that hasn't been the
	> case

Cough... I thought we agreed it is better to break some corner cases
but make ptrace more consistent ;)

But yes, I see your point. And while I think that Roland's suggestion is
fine, I also have another proposal

	- never send CLD_CONTINUED to the tracer, always send it to parent.

	  Firstly, this is completely pointless: ptrace is per-thread, while
	  this notification is per-process

	- change do_wait() so that WCONTINUED for real_parent

	- change ptrace_resume() to check SIGNAL_STOP_STOPPED case. It should
	  act as SIGCONT in this case. Yes: "act as SIGCONT" needs more
	  discussion.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/