lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87a74zmfc9.fsf@x220.int.ebiederm.org>
Date:   Mon, 02 Mar 2020 00:38:14 -0600
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Bernd Edlinger <bernd.edlinger@...mail.de>
Cc:     Jann Horn <jannh@...gle.com>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Jonathan Corbet <corbet@....net>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Alexey Dobriyan <adobriyan@...il.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Oleg Nesterov <oleg@...hat.com>,
        Frederic Weisbecker <frederic@...nel.org>,
        Andrei Vagin <avagin@...il.com>,
        Ingo Molnar <mingo@...nel.org>,
        "Peter Zijlstra \(Intel\)" <peterz@...radead.org>,
        Yuyang Du <duyuyang@...il.com>,
        David Hildenbrand <david@...hat.com>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Anshuman Khandual <anshuman.khandual@....com>,
        David Howells <dhowells@...hat.com>,
        James Morris <jamorris@...ux.microsoft.com>,
        Kees Cook <keescook@...omium.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Shakeel Butt <shakeelb@...gle.com>,
        Jason Gunthorpe <jgg@...pe.ca>,
        Christian Kellner <christian@...lner.me>,
        Andrea Arcangeli <aarcange@...hat.com>,
        Aleksa Sarai <cyphar@...har.com>,
        "Dmitry V. Levin" <ldv@...linux.org>,
        "linux-doc\@vger.kernel.org" <linux-doc@...r.kernel.org>,
        "linux-kernel\@vger.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-fsdevel\@vger.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "linux-mm\@kvack.org" <linux-mm@...ck.org>,
        "stable\@vger.kernel.org" <stable@...r.kernel.org>
Subject: Re: [PATCHv2] exec: Fix a deadlock in ptrace

Bernd Edlinger <bernd.edlinger@...mail.de> writes:

> This fixes a deadlock in the tracer when tracing a multi-threaded
> application that calls execve while more than one thread are running.
>
> I observed that when running strace on the gcc test suite, it always
> blocks after a while, when expect calls execve, because other threads
> have to be terminated.  They send ptrace events, but the strace is no
> longer able to respond, since it is blocked in vm_access.
>
> The deadlock is always happening when strace needs to access the
> tracees process mmap, while another thread in the tracee starts to
> execve a child process, but that cannot continue until the
> PTRACE_EVENT_EXIT is handled and the WIFEXITED event is received:

I think your patch works, but I don't think to solve your case another
mutex is necessary.  Possibly it is justified, but I hesitate to
introduce yet another concept in the code.

Having read elsewhere in the thread that this does not solve the problem
Oleg has mentioned I am really hesitant to add more complexity to the
situation.


For your case there is a straight forward and local workaround.

When the current task is ptracing the target task don't bother with
cred_gaurd_mutex and ptrace_may_access in access_mm as those tests
have already passed.  Instead just confirm the ptrace status. AKA
the permission check in ptraces_access_vm.

I think something like this is all we need.

diff --git a/kernel/fork.c b/kernel/fork.c
index cee89229606a..b0ab98c84589 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1224,6 +1224,16 @@ struct mm_struct *mm_access(struct task_struct *task, unsigned int mode)
 	struct mm_struct *mm;
 	int err;
 
+	if (task->ptrace && (current == task->parent)) {
+		mm = get_task_mm(task);
+		if ((get_dumpable(mm) != SUID_DUMP_USER) &&
+		    !ptracer_capable(task, mm->user_ns)) {
+			mmput(mm);
+			mm = ERR_PTR(-EACCESS);
+		}
+		return mm;
+	}
+
 	err =  mutex_lock_killable(&task->signal->cred_guard_mutex);
 	if (err)
 		return ERR_PTR(err);

Does this solve your test case?

The patch above is short the approriate locking for the ptrace attached
check.  (tasklist_lock I think).  But is enough to illustrate the idea,
and it is probably a check we want in any event so that if the tracer
starts dropping privileges process_vm_readv and process_vm_writev will
still be usable by the tracer.

Eric


> strace          D    0 30614  30584 0x00000000
> Call Trace:
> __schedule+0x3ce/0x6e0
> schedule+0x5c/0xd0
> schedule_preempt_disabled+0x15/0x20
> __mutex_lock.isra.13+0x1ec/0x520
> __mutex_lock_killable_slowpath+0x13/0x20
> mutex_lock_killable+0x28/0x30
> mm_access+0x27/0xa0
> process_vm_rw_core.isra.3+0xff/0x550
> process_vm_rw+0xdd/0xf0
> __x64_sys_process_vm_readv+0x31/0x40
> do_syscall_64+0x64/0x220
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> expect          D    0 31933  30876 0x80004003
> Call Trace:
> __schedule+0x3ce/0x6e0
> schedule+0x5c/0xd0
> flush_old_exec+0xc4/0x770
> load_elf_binary+0x35a/0x16c0
> search_binary_handler+0x97/0x1d0
> __do_execve_file.isra.40+0x5d4/0x8a0
> __x64_sys_execve+0x49/0x60
> do_syscall_64+0x64/0x220
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> The proposed solution is to have a second mutex that is
> used in mm_access, so it is allowed to continue while the
> dying threads are not yet terminated.
>
> I also took the opportunity to improve the documentation
> of prepare_creds, which is obviously out of sync.
>
> Signed-off-by: Bernd Edlinger <bernd.edlinger@...mail.de>
> ---
>  Documentation/security/credentials.rst    | 18 ++++++------
>  fs/exec.c                                 |  9 ++++++
>  include/linux/binfmts.h                   |  6 +++-
>  include/linux/sched/signal.h              |  1 +
>  init/init_task.c                          |  1 +
>  kernel/cred.c                             |  2 +-
>  kernel/fork.c                             |  5 ++--
>  mm/process_vm_access.c                    |  2 +-
>  tools/testing/selftests/ptrace/Makefile   |  4 +--
>  tools/testing/selftests/ptrace/vmaccess.c | 46 +++++++++++++++++++++++++++++++
>  10 files changed, 79 insertions(+), 15 deletions(-)
>  create mode 100644 tools/testing/selftests/ptrace/vmaccess.c
>
> v2: adds a test case which passes when this patch is applied.
>
>
> diff --git a/Documentation/security/credentials.rst b/Documentation/security/credentials.rst
> index 282e79f..c98e0a8 100644
> --- a/Documentation/security/credentials.rst
> +++ b/Documentation/security/credentials.rst
> @@ -437,9 +437,13 @@ new set of credentials by calling::
>  
>  	struct cred *prepare_creds(void);
>  
> -this locks current->cred_replace_mutex and then allocates and constructs a
> -duplicate of the current process's credentials, returning with the mutex still
> -held if successful.  It returns NULL if not successful (out of memory).
> +this allocates and constructs a duplicate of the current process's credentials.
> +It returns NULL if not successful (out of memory).
> +
> +If called from __do_execve_file, the mutex current->signal->cred_guard_mutex
> +is acquired before this function gets called, and the mutex
> +current->signal->cred_change_mutex is acquired later, while the credentials
> +and the process mmap are actually changed.
>  
>  The mutex prevents ``ptrace()`` from altering the ptrace state of a process
>  while security checks on credentials construction and changing is taking place
> @@ -466,9 +470,8 @@ by calling::
>  
>  This will alter various aspects of the credentials and the process, giving the
>  LSM a chance to do likewise, then it will use ``rcu_assign_pointer()`` to
> -actually commit the new credentials to ``current->cred``, it will release
> -``current->cred_replace_mutex`` to allow ``ptrace()`` to take place, and it
> -will notify the scheduler and others of the changes.
> +actually commit the new credentials to ``current->cred``, and it will notify
> +the scheduler and others of the changes.
>  
>  This function is guaranteed to return 0, so that it can be tail-called at the
>  end of such functions as ``sys_setresuid()``.
> @@ -486,8 +489,7 @@ invoked::
>  
>  	void abort_creds(struct cred *new);
>  
> -This releases the lock on ``current->cred_replace_mutex`` that
> -``prepare_creds()`` got and then releases the new credentials.
> +This releases the new credentials.
>  
>  
>  A typical credentials alteration function would look something like this::
> diff --git a/fs/exec.c b/fs/exec.c
> index 74d88da..a6884e4 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1266,6 +1266,12 @@ int flush_old_exec(struct linux_binprm * bprm)
>  	if (retval)
>  		goto out;
>  
> +	retval = mutex_lock_killable(&current->signal->cred_change_mutex);
> +	if (retval)
> +		goto out;
> +
> +	bprm->called_flush_old_exec = 1;
> +
>  	/*
>  	 * Must be called _before_ exec_mmap() as bprm->mm is
>  	 * not visibile until then. This also enables the update
> @@ -1420,6 +1426,8 @@ static void free_bprm(struct linux_binprm *bprm)
>  {
>  	free_arg_pages(bprm);
>  	if (bprm->cred) {
> +		if (bprm->called_flush_old_exec)
> +			mutex_unlock(&current->signal->cred_change_mutex);
>  		mutex_unlock(&current->signal->cred_guard_mutex);
>  		abort_creds(bprm->cred);
>  	}
> @@ -1469,6 +1477,7 @@ void install_exec_creds(struct linux_binprm *bprm)
>  	 * credentials; any time after this it may be unlocked.
>  	 */
>  	security_bprm_committed_creds(bprm);
> +	mutex_unlock(&current->signal->cred_change_mutex);
>  	mutex_unlock(&current->signal->cred_guard_mutex);
>  }
>  EXPORT_SYMBOL(install_exec_creds);
> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> index b40fc63..2e1318b 100644
> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> @@ -44,7 +44,11 @@ struct linux_binprm {
>  		 * exec has happened. Used to sanitize execution environment
>  		 * and to set AT_SECURE auxv for glibc.
>  		 */
> -		secureexec:1;
> +		secureexec:1,
> +		/*
> +		 * Set by flush_old_exec, when the cred_change_mutex is taken.
> +		 */
> +		called_flush_old_exec:1;
>  #ifdef __alpha__
>  	unsigned int taso:1;
>  #endif
> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index 8805025..37eeabe 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -225,6 +225,7 @@ struct signal_struct {
>  	struct mutex cred_guard_mutex;	/* guard against foreign influences on
>  					 * credential calculations
>  					 * (notably. ptrace) */
> +	struct mutex cred_change_mutex; /* guard against credentials change */
>  } __randomize_layout;
>  
>  /*
> diff --git a/init/init_task.c b/init/init_task.c
> index 9e5cbe5..6cd9a0f 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -26,6 +26,7 @@
>  	.multiprocess	= HLIST_HEAD_INIT,
>  	.rlim		= INIT_RLIMITS,
>  	.cred_guard_mutex = __MUTEX_INITIALIZER(init_signals.cred_guard_mutex),
> +	.cred_change_mutex = __MUTEX_INITIALIZER(init_signals.cred_change_mutex),
>  #ifdef CONFIG_POSIX_TIMERS
>  	.posix_timers = LIST_HEAD_INIT(init_signals.posix_timers),
>  	.cputimer	= {
> diff --git a/kernel/cred.c b/kernel/cred.c
> index 809a985..e4c78de 100644
> --- a/kernel/cred.c
> +++ b/kernel/cred.c
> @@ -676,7 +676,7 @@ void __init cred_init(void)
>   *
>   * Returns the new credentials or NULL if out of memory.
>   *
> - * Does not take, and does not return holding current->cred_replace_mutex.
> + * Does not take, and does not return holding ->cred_guard_mutex.
>   */
>  struct cred *prepare_kernel_cred(struct task_struct *daemon)
>  {
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 0808095..0395154 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1224,7 +1224,7 @@ struct mm_struct *mm_access(struct task_struct *task, unsigned int mode)
>  	struct mm_struct *mm;
>  	int err;
>  
> -	err =  mutex_lock_killable(&task->signal->cred_guard_mutex);
> +	err =  mutex_lock_killable(&task->signal->cred_change_mutex);
>  	if (err)
>  		return ERR_PTR(err);
>  
> @@ -1234,7 +1234,7 @@ struct mm_struct *mm_access(struct task_struct *task, unsigned int mode)
>  		mmput(mm);
>  		mm = ERR_PTR(-EACCES);
>  	}
> -	mutex_unlock(&task->signal->cred_guard_mutex);
> +	mutex_unlock(&task->signal->cred_change_mutex);
>  
>  	return mm;
>  }
> @@ -1594,6 +1594,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
>  	sig->oom_score_adj_min = current->signal->oom_score_adj_min;
>  
>  	mutex_init(&sig->cred_guard_mutex);
> +	mutex_init(&sig->cred_change_mutex);
>  
>  	return 0;
>  }
> diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
> index 357aa7b..b3e6eb5 100644
> --- a/mm/process_vm_access.c
> +++ b/mm/process_vm_access.c
> @@ -204,7 +204,7 @@ static ssize_t process_vm_rw_core(pid_t pid, struct iov_iter *iter,
>  	if (!mm || IS_ERR(mm)) {
>  		rc = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
>  		/*
> -		 * Explicitly map EACCES to EPERM as EPERM is a more a
> +		 * Explicitly map EACCES to EPERM as EPERM is a more
>  		 * appropriate error code for process_vw_readv/writev
>  		 */
>  		if (rc == -EACCES)
> diff --git a/tools/testing/selftests/ptrace/Makefile b/tools/testing/selftests/ptrace/Makefile
> index c0b7f89..2f1f532 100644
> --- a/tools/testing/selftests/ptrace/Makefile
> +++ b/tools/testing/selftests/ptrace/Makefile
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -CFLAGS += -iquote../../../../include/uapi -Wall
> +CFLAGS += -std=c99 -pthread -iquote../../../../include/uapi -Wall
>  
> -TEST_GEN_PROGS := get_syscall_info peeksiginfo
> +TEST_GEN_PROGS := get_syscall_info peeksiginfo vmaccess
>  
>  include ../lib.mk
> diff --git a/tools/testing/selftests/ptrace/vmaccess.c b/tools/testing/selftests/ptrace/vmaccess.c
> new file mode 100644
> index 0000000..ef08c9f
> --- /dev/null
> +++ b/tools/testing/selftests/ptrace/vmaccess.c
> @@ -0,0 +1,46 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Copyright (c) 2020 Bernd Edlinger <bernd.edlinger@...mail.de>
> + * All rights reserved.
> + *
> + * Check whether /proc/$pid/mem can be accessed without causing deadlocks
> + * when de_thread is blocked with ->cred_guard_mutex held.
> + */
> +
> +#include "../kselftest_harness.h"
> +#include <stdio.h>
> +#include <fcntl.h>
> +#include <pthread.h>
> +#include <signal.h>
> +#include <unistd.h>
> +#include <sys/ptrace.h>
> +
> +static void *thread(void *arg)
> +{
> +	ptrace(PTRACE_TRACEME, 0, 0, 0);
> +	return NULL;
> +}
> +
> +TEST(vmaccess)
> +{
> +	int f, pid = fork();
> +	char mm[64];
> +
> +	if (!pid) {
> +		pthread_t pt;
> +		pthread_create(&pt, NULL, thread, NULL);
> +		pthread_join(pt, NULL);
> +		execlp("true", "true", NULL);
> +	}
> +
> +	sleep(1);
> +	sprintf(mm, "/proc/%d/mem", pid);
> +	f = open(mm, O_RDONLY);
> +	ASSERT_LE(0, f)
> +		close(f);
> +	/* this is not fixed! ptrace(PTRACE_ATTACH, pid, 0,0); */
> +	f = kill(pid, SIGCONT);
> +	ASSERT_EQ(0, f);
> +}
> +
> +TEST_HARNESS_MAIN

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ