Message-ID: <CAGXu5jJZ=ZsfNANscGw-nG=qyPkK+FiEqbUGQOVDp1tC=zykfA@mail.gmail.com>
Date:	Wed, 3 Aug 2016 22:24:28 -0700
From:	Kees Cook <keescook@...omium.org>
To:	robert@...llahan.org
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Andy Lutomirski <luto@...nel.org>
Subject: Re: "run seccomp after ptrace" changes expose "missing
 PTRACE_EVENT_EXIT" bug

On Wed, Aug 3, 2016 at 4:51 PM, Robert O'Callahan <robert@...llahan.org> wrote:
> I work on rr (http://rr-project.org/), a record-and-replay reverse-execution
> debugger which is a heavy user of ptrace and seccomp. The recent change to
> perform syscall-entry PTRACE_SYSCALL stops before PTRACE_EVENT_SECCOMP stops
> broke rr, which is fine because I'm fixing rr and this change actually makes
> rr faster (thanks!). However, it exposed an existing kernel bug which
> creates a problem for us, and which I'm not sure how to fix.
>
> The problem is that if a tracee task is in a PTRACE_EVENT_SECCOMP trap, or
> has been resumed after such a trap but not yet been scheduled, and another
> task in the thread-group calls exit_group(), then the tracee task exits
> without the ptracer receiving a PTRACE_EVENT_EXIT notification. Small-ish
> testcase here:
> https://gist.github.com/rocallahan/1344f7d01183c233d08a2c6b93413068.
>
> The bug happens because when __seccomp_filter() detects
> fatal_signal_pending(), it calls do_exit() without dequeuing the fatal
> signal. When do_exit() sends the PTRACE_EVENT_EXIT notification and that
> task is descheduled, __schedule() notices that there is a fatal signal
> pending and changes its state from TASK_TRACED to TASK_RUNNING. That
> prevents the ptracer's waitpid() from returning the ptrace event. A more
> detailed analysis is here:
> https://github.com/mozilla/rr/issues/1762#issuecomment-237396255.
>
> This bug has been in the kernel for a while. rr never hit it before because
> we trace all threads and mostly run only one tracee thread at a time.
> Immediately after each PTRACE_EVENT_SECCOMP notification we'd issue a
> PTRACE_SYSCALL to get that task to the syscall-entry PTRACE_SYSCALL stop, so
> there was never an opportunity for one tracee thread to call exit_group
> while another tracee was in the problematic part of __seccomp_filter().
> Unfortunately now there is no way for us to avoid that possibility.
>
> My guess is that __seccomp_filter() should dequeue the fatal signal it
> detects before calling do_exit(), to behave more like get_signal(). Is that
> correct, and if so, what would be the right way to do that?
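
For readers following the thread, here is a minimal sketch of the tracer-side check that the missing notification defeats. It assumes the usual Linux encoding described in ptrace(2), where a ptrace event stop reports SIGTRAP as the stop signal and carries the event code in bits 16 and up of the waitpid() status. The helper names (ptrace_event_of, is_exit_event, is_seccomp_event) are hypothetical and are not taken from rr or the kernel.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: return the PTRACE_EVENT_* code carried in a
 * waitpid() status, or 0 if the status is not a ptrace event stop.
 * Event stops report WSTOPSIG == SIGTRAP with the event code stored
 * in the bits above the low 16.
 */
static int ptrace_event_of(int status)
{
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return 0;
    return (status >> 16) & 0xffff;
}

/* True when the tracee stopped to report that it is about to exit. */
static bool is_exit_event(int status)
{
    return ptrace_event_of(status) == PTRACE_EVENT_EXIT;
}

/* True when the tracee stopped because a seccomp filter returned a trace action. */
static bool is_seccomp_event(int status)
{
    return ptrace_event_of(status) == PTRACE_EVENT_SECCOMP;
}
```

A tracer would call is_exit_event() on each status returned by waitpid(); in the scenario described above that check never fires for the affected task, because the stop is cancelled before waitpid() can report it.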

Thanks for the detailed analysis! I'll take a look at what can be done
here. Off the top of my head, I don't see a problem with what you're
suggesting. Let me see what I can come up with.

-Kees

>
> Thanks,
> Robert O'Callahan
> --
> lbir ye,ea yer.tnietoehr  rdn rdsme,anea lurpr  edna e hnysnenh hhe uresyf
> toD
> selthor  stor  edna  siewaoeodm  or v sstvr  esBa  kbvted,t
> rdsme,aoreseoouoto
> o l euetiuruewFa  kbn e hnystoivateweh uresyf tulsa rehr  rdm  or rnea lurpr
> .a war hsrer holsa rodvted,t  nenh hneireseoouot.tniesiewaoeivatewt sstvr
> esn



-- 
Kees Cook
Brillo & Chrome OS Security