linux-kernel - Re: Question regarding ptrace work for Linux v3.1


Message-ID: <CALJO4zEH6tLLQQ33qFJ8STTLrRPcbvPqOWKUT8=Qu3-S82Ecng@mail.gmail.com>
Date:	Mon, 21 Mar 2016 15:24:10 -0400
From:	Patrick Donnelly <pdonnel3@...edu>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	Tejun Heo <tj@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: Question regarding ptrace work for Linux v3.1

On Mon, Mar 21, 2016 at 3:07 PM, Oleg Nesterov <oleg@...hat.com> wrote:
> On 03/21, Patrick Donnelly wrote:
>>
>> That seems to be the case but it will only report certain events (not
>> syscalls). I have observed PTRACE_EVENT_EXIT and PTRACE_EVENT_CLONE
>> events... Hmm, now that I think about this, it would be necessary to
>> see the initial SIGSTOP (or PTRACE_EVENT_STOP) in order to initiate
>> syscall tracing via PTRACE_SYSCALL. So that does seem to indicate the
>> problem.
>
> Yes, exactly, you need to see the initial SIGSTOP or another event which
> can be reported before it.
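
[Illustration, not part of the original exchange.] The distinction drawn
above, between a plain SIGSTOP stop, a PTRACE_EVENT_STOP, and other ptrace
events such as PTRACE_EVENT_CLONE or PTRACE_EVENT_EXIT, can be read off the
status returned by waitpid(). The following is a minimal sketch; the enum
and the function name classify_stop are hypothetical names made up for
this example, and the syscall-stop case assumes the tracer has set
PTRACE_O_TRACESYSGOOD.

```c
#include <signal.h>
#include <sys/wait.h>
#include <linux/ptrace.h>   /* PTRACE_EVENT_* constants only */

/* Hypothetical classification of a waitpid() status for a tracee. */
enum stop_kind {
    STOP_NOT_STOPPED,   /* exited, killed, or continued */
    STOP_SIGSTOP,       /* plain SIGSTOP stop (e.g. the one after attach) */
    STOP_EVENT_STOP,    /* PTRACE_EVENT_STOP (PTRACE_SEIZE only) */
    STOP_PTRACE_EVENT,  /* PTRACE_EVENT_CLONE, PTRACE_EVENT_EXIT, ... */
    STOP_SYSCALL,       /* syscall stop, assuming PTRACE_O_TRACESYSGOOD */
    STOP_SIGNAL         /* any other signal-delivery stop */
};

static enum stop_kind classify_stop(int status)
{
    if (!WIFSTOPPED(status))
        return STOP_NOT_STOPPED;

    int sig = WSTOPSIG(status);            /* bits 8..15 of the status */
    int event = (status >> 16) & 0xffff;   /* ptrace event code, if any */

    if (event == PTRACE_EVENT_STOP)
        return STOP_EVENT_STOP;
    if (event != 0)
        return STOP_PTRACE_EVENT;
    if (sig == (SIGTRAP | 0x80))
        return STOP_SYSCALL;
    if (sig == SIGSTOP)
        return STOP_SIGSTOP;
    return STOP_SIGNAL;
}
```

A tracer that never observes STOP_SIGSTOP (or STOP_EVENT_STOP) for a new
tracee has no stop in which to set options and issue PTRACE_SYSCALL, which
matches the symptom described above.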

Assuming a SIGSTOP is being silenced, is there anything we can do to
forcibly start tracing syscalls? (For kernels without PTRACE_SEIZE)

>> > To clarify, the usage of SIGSTOP in ptrace was always buggy by design.
>> > For example, SIGCONT from somewhere can remove the pending (and not yet
>> > reported) SIGSTOP, and this _can_ explain the problem you hit.
>>
>> The processes in the traced tree do not send any signals, but an
>> external process may have.
>
> I am looking into
>
>    https://github.com/cooperative-computing-lab/cctools/blob/5ccb04599ba2ee125730981f53add80d98cf8161/parrot/src/pfs_main.cc
>
> and this code
>
>         case SIGSTOP:
>         /* Black magic to get threads working on old Linux kernels... */
>
>         if(p->nsyscalls == 0) { /* stop before we begin running the process */
>                 debug(D_DEBUG, "suppressing bootstrap SIGSTOP for %d",pid);
>                 signum = 0; /* suppress delivery */
>                 kill(p->pid,SIGCONT);
>         }
>         break;
>
> doesn't look right. Note that kill(pid,SIGCONT) affects the whole thread-
> group. So if this kill() races with another thread doing clone() you can
> hit the problem you described.

You're right, that should be tkill! I will give that a try and report
back on whether it solves the issue for our collaborators...
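
[Illustration, not part of the original exchange.] For reference, a
thread-directed signal can be sent with the tkill system call. glibc does
not provide a wrapper for it, so this sketch goes through syscall(); the
helper name signal_one_thread is hypothetical. It shows only the shape of
the call. Whether it is enough to avoid the race is not settled here:
SIGCONT may still clear pending stop signals across the thread group
however it is sent.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: queue `sig` for the single thread `tid` rather
 * than for the whole thread group, as kill() would.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int signal_one_thread(pid_t tid, int sig)
{
    if (tid <= 0) {
        errno = EINVAL;   /* reject process-group style arguments */
        return -1;
    }
    return (int)syscall(SYS_tkill, tid, sig);
}
```

In the quoted snippet this would take the place of kill(p->pid,SIGCONT),
for example signal_one_thread(p->pid, SIGCONT), assuming p->pid holds the
thread ID of the tracee that reported the stop.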

>> > But unless you use PTRACE_SEIZE the same can happen on v3.1 so it seems
>> > there is something else.
>>
>> Okay, it might be that PTRACE_SEIZE fixes it.
>
> Yes, but iiuc you do not see this problem on v3.1 even with PTRACE_ATTACH?

I have not tested on >=v3.1 with PTRACE_ATTACH. As you know, v3.1 was
when the PTRACE_SEIZE code was merged along with many other changes.
[I actually thought the merge occurred in 3.4 because of the ptrace
man page. I have submitted a bug report to get that fixed.] I have not
had any reports of the problem on v3.1 or later.

Again, I will see if the kill system call was the cause and report
back if so. Thanks for taking the time to look at the code!

-- 
Patrick Donnelly