linux-kernel - Re: [PATCHv4] exec: Fix a deadlock in ptrace

Date:   Tue, 3 Mar 2020 10:34:08 +0000
From:   Bernd Edlinger <bernd.edlinger@...mail.de>
To:     Christian Brauner <christian.brauner@...ntu.com>,
        Kees Cook <keescook@...omium.org>
CC:     "Eric W. Biederman" <ebiederm@...ssion.com>,
        Jann Horn <jannh@...gle.com>, Jonathan Corbet <corbet@....net>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Alexey Dobriyan <adobriyan@...il.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Oleg Nesterov <oleg@...hat.com>,
        Frederic Weisbecker <frederic@...nel.org>,
        Andrei Vagin <avagin@...il.com>,
        Ingo Molnar <mingo@...nel.org>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Yuyang Du <duyuyang@...il.com>,
        David Hildenbrand <david@...hat.com>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Anshuman Khandual <anshuman.khandual@....com>,
        David Howells <dhowells@...hat.com>,
        James Morris <jamorris@...ux.microsoft.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Shakeel Butt <shakeelb@...gle.com>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Christian Kellner <christian@...lner.me>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Aleksa Sarai <cyphar@...har.com>,
        "Dmitry V. Levin" <ldv@...linux.org>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: [PATCHv4] exec: Fix a deadlock in ptrace

On 3/3/20 9:58 AM, Christian Brauner wrote:
> On Mon, Mar 02, 2020 at 06:26:47PM -0800, Kees Cook wrote:
>> On Mon, Mar 02, 2020 at 10:18:07PM +0000, Bernd Edlinger wrote:
>>> This fixes a deadlock in the tracer when tracing a multi-threaded
>>> application that calls execve while more than one thread is running.
>>>
>>> I observed that when running strace on the gcc test suite, it always
>>> blocks after a while, when expect calls execve, because other threads
>>> have to be terminated.  They send ptrace events, but the strace is no
>>> longer able to respond, since it is blocked in mm_access.
>>>
>>> The deadlock always happens when strace needs to access the
>>> tracee's process mmap, while another thread in the tracee starts to
>>> execve a child process, but that cannot continue until the
>>> PTRACE_EVENT_EXIT is handled and the WIFEXITED event is received:
>>>
>>> strace          D    0 30614  30584 0x00000000
>>> Call Trace:
>>> __schedule+0x3ce/0x6e0
>>> schedule+0x5c/0xd0
>>> schedule_preempt_disabled+0x15/0x20
>>> __mutex_lock.isra.13+0x1ec/0x520
>>> __mutex_lock_killable_slowpath+0x13/0x20
>>> mutex_lock_killable+0x28/0x30
>>> mm_access+0x27/0xa0
>>> process_vm_rw_core.isra.3+0xff/0x550
>>> process_vm_rw+0xdd/0xf0
>>> __x64_sys_process_vm_readv+0x31/0x40
>>> do_syscall_64+0x64/0x220
>>> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>
>>> expect          D    0 31933  30876 0x80004003
>>> Call Trace:
>>> __schedule+0x3ce/0x6e0
>>> schedule+0x5c/0xd0
>>> flush_old_exec+0xc4/0x770
>>> load_elf_binary+0x35a/0x16c0
>>> search_binary_handler+0x97/0x1d0
>>> __do_execve_file.isra.40+0x5d4/0x8a0
>>> __x64_sys_execve+0x49/0x60
>>> do_syscall_64+0x64/0x220
>>> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>
>>> The proposed solution is to take the cred_guard_mutex only
>>> in a critical section at the beginning, and at the end of the
>>> execve function, and let PTRACE_ATTACH fail with EAGAIN while
>>> execve is not complete, but other functions like mm_access are
>>> allowed to complete normally.
>>
>> Sorry to be a bummer, but I don't think this will work. A few more things
>> during the exec process depend on cred_guard_mutex being held.
>>
>> If I'm reading this patch correctly, this changes the lifetime of the
>> cred_guard_mutex lock to be:
>> 	- during prepare_bprm_creds()
>> 	- from flush_old_exec() through install_exec_creds()
>> Before, cred_guard_mutex was held from prepare_bprm_creds() through
>> install_exec_creds().
>>
>> That means, for example, that check_unsafe_exec()'s documented invariant
>> is violated:
>>     /*
>>      * determine how safe it is to execute the proposed program
>>      * - the caller must hold ->cred_guard_mutex to protect against
>>      *   PTRACE_ATTACH or seccomp thread-sync
>>      */
>>     static void check_unsafe_exec(struct linux_binprm *bprm) ...
>> which is looking at no_new_privs as well as other details, and making
>> decisions about the bprm state from the current state.
>>
>> I think it also means that the potentially multiple invocations
>> of bprm_fill_uid() (via prepare_binprm() via binfmt_script.c and
>> binfmt_misc.c) would be changing bprm->cred details (uid, gid) without
>> a lock (another place where current's no_new_privs is evaluated).
>>
>> Related, it also means that cred_guard_mutex is unheld for every
>> invocation of search_binary_handler() (which can loop via the previously
>> mentioned binfmt_script.c and binfmt_misc.c), if any of them have hidden
>> dependencies on cred_guard_mutex. (Though I only see bprm_fill_uid()
>> currently.)
> 
> So one issue I see with having to reacquire the cred_guard_mutex might
> be that this would allow tasks holding the cred_guard_mutex to block a
> killed exec'ing task from exiting, right?
> 

Yes, maybe, but I think it will be no worse than it is now.
The second time the mutex is acquired, it is taken with
mutex_lock_killable, so at least kill -9 should still terminate the task.
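
To illustrate what "killable" buys here, below is a rough user-space
sketch, not kernel code.  The names toy_mutex, toy_task,
toy_mutex_lock_killable and toy_mutex_unlock are made up for this
example, and the -EAGAIN return is only an artifact of the toy (it
stands in for "would keep sleeping").  As far as I know the real
mutex_lock_killable returns 0 on success and -EINTR when a fatal signal
arrives while waiting, which is the behaviour modelled here:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; these are not kernel types. */
struct toy_mutex { bool locked; };
struct toy_task  { bool fatal_signal_pending; };

/*
 * Toy model of a killable lock attempt:
 *   0        lock was free and is now held by the caller
 *   -EINTR   lock is contended and the waiter has a fatal signal pending,
 *            so it gives up instead of waiting (this is why kill -9 works)
 *   -EAGAIN  lock is contended and nothing interrupts the wait; a real
 *            waiter would sleep here, the toy just reports it (assumption)
 */
static int toy_mutex_lock_killable(struct toy_mutex *m, const struct toy_task *t)
{
	if (!m->locked) {
		/* Uncontended fast path: signals are not even looked at. */
		m->locked = true;
		return 0;
	}
	if (t->fatal_signal_pending)
		return -EINTR;
	return -EAGAIN;
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
	assert(m->locked);	/* unlocking a free mutex is a bug */
	m->locked = false;
}
```

The point is only that a task stuck behind the holder of the mutex can
still be removed by a fatal signal, because the wait returns an error
instead of blocking unconditionally.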


Bernd.