linux-kernel - Re: "run seccomp after ptrace" changes expose "missing PTRACE_EVENT_EXIT" bug

Message-ID: <CAOp6jLZsbjFmYjF=uqDtq41RKybvV20vEqkDZqt3f1NgvjFr6Q@mail.gmail.com>
Date:	Thu, 4 Aug 2016 11:55:55 +1200
From:	"Robert O'Callahan" <robert@...llahan.org>
To:	Kees Cook <keescook@...omium.org>
Cc:	linux-kernel@...r.kernel.org, Andy Lutomirski <luto@...nel.org>
Subject: Re: "run seccomp after ptrace" changes expose "missing
 PTRACE_EVENT_EXIT" bug

I work on rr (http://rr-project.org/), a record-and-replay
reverse-execution debugger which is a heavy user of ptrace and
seccomp. The recent change to perform syscall-entry PTRACE_SYSCALL
stops before PTRACE_EVENT_SECCOMP stops broke rr, which is fine
because I'm fixing rr and this change actually makes rr faster
(thanks!). However, it exposed an existing kernel bug which creates a
problem for us, and which I'm not sure how to fix.

The problem is that if a tracee task is in a PTRACE_EVENT_SECCOMP
trap, or has been resumed after such a trap but not yet been
scheduled, and another task in the thread-group calls exit_group(),
then the tracee task exits without the ptracer receiving a
PTRACE_EVENT_EXIT notification. Small-ish testcase here:
https://gist.github.com/rocallahan/1344f7d01183c233d08a2c6b93413068.
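A minimal sketch of the tracer-side expectation that this bug breaks. It assumes the tracer set PTRACE_O_TRACESYSGOOD, PTRACE_O_TRACEEXIT and PTRACE_O_TRACESECCOMP. The sketch decodes each waitpid() status and flags a tracee whose exit is reported without a preceding PTRACE_EVENT_EXIT stop. The names classify_status(), record_status() and struct tracee_state are hypothetical. This is not rr's code and not the testcase in the gist. The status layout is the standard Linux one: for an event stop, status >> 8 == (SIGTRAP | PTRACE_EVENT_foo << 8).

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Kinds of waitpid() report this sketch distinguishes. */
enum report_kind {
    REPORT_EXITED,        /* WIFEXITED */
    REPORT_KILLED,        /* WIFSIGNALED */
    REPORT_EVENT_EXIT,    /* PTRACE_EVENT_EXIT stop */
    REPORT_EVENT_SECCOMP, /* PTRACE_EVENT_SECCOMP stop */
    REPORT_SYSCALL_STOP,  /* syscall stop (needs PTRACE_O_TRACESYSGOOD) */
    REPORT_OTHER_STOP,    /* signal-delivery stop or another ptrace event */
    REPORT_UNKNOWN        /* e.g. WIFCONTINUED */
};

enum report_kind classify_status(int status)
{
    if (WIFEXITED(status))
        return REPORT_EXITED;
    if (WIFSIGNALED(status))
        return REPORT_KILLED;
    if (!WIFSTOPPED(status))
        return REPORT_UNKNOWN;

    /* With PTRACE_O_TRACESYSGOOD, syscall stops report SIGTRAP | 0x80. */
    if (WSTOPSIG(status) == (SIGTRAP | 0x80))
        return REPORT_SYSCALL_STOP;

    /* For PTRACE_EVENT_* stops the event number sits at status >> 16. */
    if (WSTOPSIG(status) == SIGTRAP) {
        switch ((unsigned)status >> 16) {
        case PTRACE_EVENT_EXIT:
            return REPORT_EVENT_EXIT;
        case PTRACE_EVENT_SECCOMP:
            return REPORT_EVENT_SECCOMP;
        default:
            break;
        }
    }
    return REPORT_OTHER_STOP;
}

/* Hypothetical per-tracee bookkeeping. */
struct tracee_state {
    bool saw_event_exit;
};

/* Feed every waitpid() status for one tracee. Returns true when the
 * tracee's exit is reported without an earlier PTRACE_EVENT_EXIT stop. */
bool record_status(struct tracee_state *t, int status)
{
    switch (classify_status(status)) {
    case REPORT_EVENT_EXIT:
        t->saw_event_exit = true;
        return false;
    case REPORT_EXITED:
    case REPORT_KILLED:
        return !t->saw_event_exit;
    default:
        return false;
    }
}
```

In the scenario described above, the thread sitting in a PTRACE_EVENT_SECCOMP stop would go straight from REPORT_EVENT_SECCOMP to an exit report. For that thread, record_status() would return true.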

The bug happens because when __seccomp_filter() detects
fatal_signal_pending(), it calls do_exit() without dequeuing the fatal
signal. When do_exit() sends the PTRACE_EVENT_EXIT notification and
that task is descheduled, __schedule() notices that there is a fatal
signal pending and changes its state from TASK_TRACED to TASK_RUNNING.
That prevents the ptracer's waitpid() from returning the ptrace event.
A more detailed analysis is here:
https://github.com/mozilla/rr/issues/1762#issuecomment-237396255.
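The lost stop can be illustrated with a toy userspace model, which is not kernel code. A task entering a ptrace stop is marked traced. The context switch then puts it back to running if a signal_pending_state()-style check says a wakeable signal is pending. The MODEL_* constants, struct model_task and the model_* functions are hypothetical. Only the shape of the signal_pending_state() check follows the kernel: a traced task (TASK_WAKEKILL set, TASK_INTERRUPTIBLE clear) is only woken by a fatal signal. The numeric values are chosen for the model.

```c
#include <stdbool.h>

/* Model state bits, named after TASK_* in include/linux/sched.h;
 * the values are for this model only. */
#define MODEL_TASK_RUNNING       0u
#define MODEL_TASK_INTERRUPTIBLE 1u
#define MODEL_TASK_TRACED_BIT    8u
#define MODEL_TASK_WAKEKILL      128u
#define MODEL_TASK_TRACED        (MODEL_TASK_WAKEKILL | MODEL_TASK_TRACED_BIT)

struct model_task {
    unsigned state;
    bool sigkill_pending;   /* e.g. queued by another thread's exit_group() */
    bool other_sig_pending; /* any non-fatal signal */
};

/* Same shape as the kernel's signal_pending_state(). */
static bool model_signal_pending_state(unsigned state, const struct model_task *t)
{
    if (!(state & (MODEL_TASK_INTERRUPTIBLE | MODEL_TASK_WAKEKILL)))
        return false;
    if (!t->sigkill_pending && !t->other_sig_pending)
        return false;
    return (state & MODEL_TASK_INTERRUPTIBLE) || t->sigkill_pending;
}

/* The relevant step of __schedule(): a task about to sleep with a
 * wakeable signal pending is set back to running instead. */
void model_schedule(struct model_task *t)
{
    if (t->state != MODEL_TASK_RUNNING && model_signal_pending_state(t->state, t))
        t->state = MODEL_TASK_RUNNING;
}

/* The tracer's waitpid() can only report the stop while the task is traced. */
bool model_tracer_sees_stop(const struct model_task *t)
{
    return (t->state & MODEL_TASK_TRACED_BIT) != 0;
}

/* A ptrace event stop followed by the context switch. */
bool model_ptrace_event_stop(struct model_task *t)
{
    t->state = MODEL_TASK_TRACED;
    model_schedule(t);
    return model_tracer_sees_stop(t);
}
```

In this model, a pending SIGKILL makes the stop invisible to the tracer, while a non-fatal pending signal does not. Clearing sigkill_pending before the stop keeps the stop visible. That corresponds roughly to dequeuing the fatal signal in __seccomp_filter(). Whether that is the right fix in the real kernel is the open question in this message.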

This bug has been in the kernel for a while. rr never hit it before
because we trace all threads and mostly run only one tracee thread at
a time. Immediately after each PTRACE_EVENT_SECCOMP notification we'd
issue a PTRACE_SYSCALL to get that task to the syscall-entry
PTRACE_SYSCALL stop. So there was never an opportunity for one tracee
thread to call exit_group() while another tracee thread was in the
problematic part of __seccomp_filter(). Unfortunately, now there is no
way for us to avoid that possibility.

My guess is that __seccomp_filter() should dequeue the fatal signal it
detects before calling do_exit(), to behave more like get_signal(). Is
that correct, and if so, what would be the right way to do that?

Thanks,
Robert O'Callahan