linux-kernel - Re: ptrace() hangs on attempt to seize/attach stopped & frozen task

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <564DFDAF.3000402@redhat.com>
Date:	Thu, 19 Nov 2015 16:49:51 +0000
From:	Pedro Alves <palves@...hat.com>
To:	Oleg Nesterov <oleg@...hat.com>, Tejun Heo <tj@...nel.org>
CC:	Jan Kratochvil <jan.kratochvil@...hat.com>,
	Andrey Ryabinin <aryabinin@...tuozzo.com>,
	Roland McGrath <roland@...k.frob.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: ptrace() hangs on attempt to seize/attach stopped & frozen task

On 11/17/2015 07:34 PM, Oleg Nesterov wrote:
> On 11/16, Tejun Heo wrote:

>>> And perhaps we can simply remove this logic? I forgot why do we hide this
>>> STOPPED -> RUNNING -> TRACED transition from the attaching thread. But the
>>> vague feeling tells me that we discussed this before and perhaps it was me
>>> who suggested to avoid the user-visible change when you introduced this
>>> transition...
>>
>> Heh, it was too long ago for me to remember much. :)
> 
> Same here...
> 
>>> Anyway, now I do not understand why do we want to hide it. Lets consider
>>> the following "test-case",
>>>
>>> 	void test(int pid)
>>> 	{
>>> 		kill(pid, SIGSTOP);
>>> 		waitpid(pid, NULL, WSTOPPED);
>>>
>>> 		ptrace(PTRACE_ATTACH-OR-PTRACE_SEIZE, pid, 0,0);
>>>
>>> 		assert(ptrace(PTRACE_DETACH, pid, 0,0) == 0);
>>> 	}
>>>
>>> Yes, it will fail if we remove JOBCTL_TRAPPING. But it can equally fail
>>> if SIGCONT comes before ATTACH, so perhaps we do not really care?
>>>
>>> Jan, Pedro, do you think the patch below can break gdb somehow? With this
>>> patch you can never assume that waitpid(WNOHANG) or ptrace(WHATEVER) will
>>> succeed right after PTRACE_ATTACH/PTRACE_SEIZE, even if you know that the
>>> tracee was TASK_STOPPED before attach.

Not sure, because I don't think I fully understand that proposed change.

Both GDB and gdbserver have special processing for attaching to already-stopped
processes.  (and neither use PTRACE_SEIZE yet.)

Here's the gdbserver version:

 https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/gdbserver/linux-low.c;h=41ab510fa4ac5654f101f08efb68e26b5bc5dbd7;hb=HEAD#l903

Copied here for convenience:

 907 linux_attach_lwp (ptid_t ptid)
 908 {
 909   struct lwp_info *new_lwp;
 910   int lwpid = ptid_get_lwp (ptid);
 911
 912   if (ptrace (PTRACE_ATTACH, lwpid, (PTRACE_TYPE_ARG3) 0, (PTRACE_TYPE_ARG4) 0)
 913       != 0)
 914     return errno;
 915
 916   new_lwp = add_lwp (ptid);
 917
 918   /* We need to wait for SIGSTOP before being able to make the next
 919      ptrace call on this LWP.  */
 920   new_lwp->must_set_ptrace_flags = 1;
 921
 922   if (linux_proc_pid_is_stopped (lwpid))
 923     {
 924       if (debug_threads)
 925         debug_printf ("Attached to a stopped process\n");
 926
 927       /* The process is definitely stopped.  It is in a job control
 928          stop, unless the kernel predates the TASK_STOPPED /
 929          TASK_TRACED distinction, in which case it might be in a
 930          ptrace stop.  Make sure it is in a ptrace stop; from there we
 931          can kill it, signal it, et cetera.
 932
 933          First make sure there is a pending SIGSTOP.  Since we are
 934          already attached, the process can not transition from stopped
 935          to running without a PTRACE_CONT; so we know this signal will
 936          go into the queue.  The SIGSTOP generated by PTRACE_ATTACH is
 937          probably already in the queue (unless this kernel is old
 938          enough to use TASK_STOPPED for ptrace stops); but since
 939          SIGSTOP is not an RT signal, it can only be queued once.  */
 940       kill_lwp (lwpid, SIGSTOP);
 941
 942       /* Finally, resume the stopped process.  This will deliver the
 943          SIGSTOP (or a higher priority signal, just like normal
 944          PTRACE_ATTACH), which we'll catch later on.  */
 945       ptrace (PTRACE_CONT, lwpid, (PTRACE_TYPE_ARG3) 0, (PTRACE_TYPE_ARG4) 0);
 946     }
 947
 948   /* The next time we wait for this LWP we'll see a SIGSTOP as PTRACE_ATTACH
 949      brings it to a halt.
 950

linux_proc_pid_is_stopped checks whether the state in /proc/pid/status is "T (stopped)".

Here's the equivalent in gdb:

  https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/linux-nat.c;h=841ec3949c37438dfba924d8db6b37ffc416dd29;hb=HEAD#l974

This queuing of a SIGSTOP + PTRACE_CONT was necessary because
otherwise when gdb attaches to a job stopped process, gdb would hang in the waitpid
after PTRACE_ATTACH, waiting for the initial SIGSTOP which would never arrive.

If the proposed change makes it so that a new intermediate state can be observed
right after PTRACE_ATTACH, and so linux_proc_pid_is_stopped can return false,
then there's potential for breakage.  But maybe not, if we're sure that
that when that happens, waitpid returns for the initial
PTRACE_ATTACH-induced SIGSTOP.

Thanks,
Pedro Alves

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/