Date:	Mon, 09 Apr 2012 20:25:22 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Cyrill Gorcunov <gorcunov@...nvz.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Pavel Emelyanov <xemul@...allels.com>,
	Andrey Vagin <avagin@...nvz.org>,
	KOSAKI Motohiro <kosaki.motohiro@...il.com>,
	Ingo Molnar <mingo@...e.hu>, "H. Peter Anvin" <hpa@...or.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Glauber Costa <glommer@...allels.com>,
	Andi Kleen <andi@...stfloor.org>, Tejun Heo <tj@...nel.org>,
	Matt Helsley <matthltc@...ibm.com>,
	Pekka Enberg <penberg@...nel.org>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Vasiliy Kulikov <segoon@...nwall.com>,
	Alexey Dobriyan <adobriyan@...il.com>, Valdis.Kletnieks@...edu,
	Michal Marek <mmarek@...e.cz>,
	Frederic Weisbecker <fweisbec@...il.com>,
	linux-kernel@...r.kernel.org, Jonathan Corbet <corbet@....net>
Subject: Re: + syscalls-x86-add-__nr_kcmp-syscall-v8.patch added to -mm tree

Andrew Morton <akpm@...ux-foundation.org> writes:

> Back on to kcmp.
>
> On Wed, 15 Feb 2012 20:27:52 +0400
> Cyrill Gorcunov <gorcunov@...nvz.org> wrote:
>
>> On Wed, Feb 15, 2012 at 05:06:52PM +0100, Oleg Nesterov wrote:
>> > Not a comment, but the question. I am just curious...
>> > 
>> > > +/*
>> > > + * We don't expose real in-memory order of objects for security
>> > > + * reasons, still the comparison results should be suitable for
>> > > + * sorting. Thus, we obfuscate kernel pointers values and compare
>> > > + * the production instead.
>> > > + */
>> > > +static unsigned long cookies[KCMP_TYPES][2] __read_mostly;
>> > > +
>> > > +static long kptr_obfuscate(long v, int type)
>> > > +{
>> > > +       return (v ^ cookies[type][0]) * cookies[type][1];
>> > > +}
>> > 
>> > OK, but why do we need this per type? Just to add more obfuscation
>> > or there is another reason?
>> 
>> Just to add more obfuscation.
>
> Having re-read most of the (enormous) email discussion on the kcmp()
> syscall patch, I'm thinking:
>
> - Nobody seems to understand the obfuscation logic.  Jon sounded
>   confused, Oleg sounds confused and it's rather unclear what it does,
>   how it does it and why it does it.

Peter explained it fairly well earlier.  The xor trivially makes sense
to me.  I don't recall what the multiplication does.

It would be nice if someone would get Peter's explanation of why we
multiply into a code comment.  But obscuring things makes sense.
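
For anyone following the obfuscation discussion, here is a minimal user-space sketch of the xor-then-multiply step. It only shows one property that is easy to check: because the multiplier cookie is forced to be odd, the transform is invertible modulo 2^64, so two different pointers can never obfuscate to the same value. All sketch_* names and the fixed cookie constants are made up for illustration (the patch fills the cookies from get_random_bytes()), and the comparison here is done directly on the obfuscated values rather than via the sign of their difference as kcmp_ptr() does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed cookies; the patch fills cookies[][] from get_random_bytes(). */
#define SKETCH_COOKIE_XOR 0x9e3779b97f4a7c15ULL
/* Force the top and bottom bits on, as kcmp_cookies_init() does for cookies[i][1]. */
#define SKETCH_COOKIE_MUL (0x2545f4914f6cdd1dULL | (~(~0ULL >> 1) | 1ULL))

/* Same shape as kptr_obfuscate(): xor with one cookie, multiply by the other. */
static uint64_t sketch_obfuscate(uint64_t v)
{
	return (v ^ SKETCH_COOKIE_XOR) * SKETCH_COOKIE_MUL;
}

/* Multiplicative inverse of an odd number modulo 2^64 (Newton iteration). */
static uint64_t sketch_inverse_odd(uint64_t m)
{
	uint64_t inv = m;	/* correct to 3 bits for any odd m */
	int i;

	for (i = 0; i < 5; i++)
		inv *= 2 - m * inv;	/* each round doubles the correct bits */
	return inv;
}

/* Undo sketch_obfuscate(); only possible because the multiplier is odd. */
static uint64_t sketch_deobfuscate(uint64_t o)
{
	return (o * sketch_inverse_odd(SKETCH_COOKIE_MUL)) ^ SKETCH_COOKIE_XOR;
}

/* 0 - equal, 1 - less than, 2 - greater than (in obfuscated order). */
static int sketch_cmp(uint64_t v1, uint64_t v2)
{
	uint64_t a = sketch_obfuscate(v1);
	uint64_t b = sketch_obfuscate(v2);

	return (a < b) | ((a > b) << 1);
}

/* Round-trip check on an arbitrary, made-up pointer-like value. */
static void sketch_selfcheck(void)
{
	uint64_t p = 0xffff880012345678ULL;

	assert(sketch_deobfuscate(sketch_obfuscate(p)) == p);
}
```

The round trip in sketch_selfcheck() is the whole point of the sketch: an even multiplier would lose the top bit and the inverse would not exist.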

> - Lots of people have looked at the code and made comments and there
>   have been lots of changes.  But we presently have zero Acked-by's and
>   Reviewed-by's.
>
> I guess this means that at present nobody is aware of any issues with
> the proposal, but nobody is terribly excited about it either?

Having just read through it again the only possible issue I can see is
that we compare file descriptors after dropping all of the locks.

Since rcu_read_lock doesn't participate in ABBA deadlocks, my gut feel
is that we should hold rcu_read_lock across the whole file pointer
comparison to remove the possibility of races as file descriptor
pointers go away.

Still in practice I don't think it matters.  At worst there is the
slightest possibility of returning a value instead of -EBADF.  The
expectation is for all of the tasks we are operating on to be frozen,
and even if the tasks are not frozen it is a very tiny window for a race
to be in.
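
For context on the ABBA remark: the patch's kcmp_lock() sidesteps that class of deadlock for the two cred_guard_mutex locks by always taking them in address order. A rough user-space sketch of the same idea follows, with pthread mutexes standing in for kernel mutexes. The sketch_* names are hypothetical, and there is no equivalent here of the killable or nested-lockdep variants the kernel code uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/*
 * Lock two mutexes in a fixed (address) order so that two callers passing
 * the same pair in opposite order cannot deadlock against each other.
 * If both arguments are the same mutex it is locked only once.
 */
static int sketch_lock_pair(pthread_mutex_t *m1, pthread_mutex_t *m2)
{
	int err;

	assert(m1 && m2);

	/* Higher address first, mirroring the swap() in kcmp_lock(). */
	if ((uintptr_t)m2 > (uintptr_t)m1) {
		pthread_mutex_t *tmp = m1;

		m1 = m2;
		m2 = tmp;
	}

	err = pthread_mutex_lock(m1);
	if (!err && m1 != m2) {
		err = pthread_mutex_lock(m2);
		if (err)
			pthread_mutex_unlock(m1);
	}

	return err;
}

/* Release whatever sketch_lock_pair() took; unlock order does not matter. */
static void sketch_unlock_pair(pthread_mutex_t *m1, pthread_mutex_t *m2)
{
	assert(m1 && m2);

	if (m2 != m1)
		pthread_mutex_unlock(m2);
	pthread_mutex_unlock(m1);
}
```

rcu_read_lock() needs no such ordering, which is why holding it across the comparison would not add a deadlock risk.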

> So what do people think?  Any issues?  Any nacks?  Should I sneak it
> into Linus this week or do we need to go another round with it all?

Acked-by: "Eric W. Biederman" <ebiederm@...ssion.com>

> I'd like to at least have a fighting chance of understanding what's
> going on with that obfuscation code.

My gut feel is that this code is good enough.  Any and all security
issues have been addressed.  Having the system call would seem
to add momentum to the people working on checkpoint/restart.
So I don't see why this patch should not go forward, unless someone
can point out an outright bug.

Eric

> From: Cyrill Gorcunov <gorcunov@...nvz.org>
> Subject: syscalls, x86: add __NR_kcmp syscall
>
> When doing checkpoint-restore in user space, one needs to determine
> whether various kernel objects (like mm_struct-s or files_struct-s) are
> shared between tasks and restore this state.
>
> The 2nd step can be solved by using appropriate CLONE_ flags and the
> unshare syscall, while there's currently no way of solving the 1st one.
>
> One of the ways of checking whether two tasks share e.g. an mm_struct is
> to provide some mm_struct ID of a task in its proc file, but showing such
> info is considered to be not that good for security reasons.
>
> Thus after some debate we came to the conclusion that a so-called
> 'comparison' syscall might be the best candidate.  So here it is --
> __NR_kcmp.
>
> It takes up to 5 arguments - the pids of the two tasks (whose
> characteristics should be compared), the comparison type and (in case of
> comparison of files) two file descriptors.
>
> Lookups for pids are done in the caller's PID namespace only.
>
> At the moment only x86 is supported and tested.
>
> [akpm@...ux-foundation.org: fix up selftests, warnings]
> [akpm@...ux-foundation.org: include errno.h]
> Signed-off-by: Cyrill Gorcunov <gorcunov@...nvz.org>
> Cc: "Eric W. Biederman" <ebiederm@...ssion.com>
> Cc: Pavel Emelyanov <xemul@...allels.com>
> Cc: Andrey Vagin <avagin@...nvz.org>
> Cc: KOSAKI Motohiro <kosaki.motohiro@...il.com>
> Cc: Ingo Molnar <mingo@...e.hu>
> Cc: H. Peter Anvin <hpa@...or.com>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Glauber Costa <glommer@...allels.com>
> Cc: Andi Kleen <andi@...stfloor.org>
> Cc: Tejun Heo <tj@...nel.org>
> Cc: Matt Helsley <matthltc@...ibm.com>
> Cc: Pekka Enberg <penberg@...nel.org>
> Cc: Eric Dumazet <eric.dumazet@...il.com>
> Cc: Vasiliy Kulikov <segoon@...nwall.com>
> Cc: Alexey Dobriyan <adobriyan@...il.com>
> Cc: Valdis.Kletnieks@...edu
> Cc: Michal Marek <mmarek@...e.cz>
> Cc: Frederic Weisbecker <fweisbec@...il.com>
> Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>
> ---
>
>  arch/x86/syscalls/syscall_32.tbl         |    1 
>  arch/x86/syscalls/syscall_64.tbl         |    1 
>  include/linux/kcmp.h                     |   17 +
>  include/linux/syscalls.h                 |    2 
>  kernel/Makefile                          |    3 
>  kernel/kcmp.c                            |  187 +++++++++++++++++++++
>  kernel/sys_ni.c                          |    3 
>  tools/testing/selftests/Makefile         |    2 
>  tools/testing/selftests/kcmp/Makefile    |   29 +++
>  tools/testing/selftests/kcmp/kcmp_test.c |   94 ++++++++++
>  10 files changed, 338 insertions(+), 1 deletion(-)
>
> diff -puN arch/x86/syscalls/syscall_32.tbl~syscalls-x86-add-__nr_kcmp-syscall-v8 arch/x86/syscalls/syscall_32.tbl
> --- a/arch/x86/syscalls/syscall_32.tbl~syscalls-x86-add-__nr_kcmp-syscall-v8
> +++ a/arch/x86/syscalls/syscall_32.tbl
> @@ -355,3 +355,4 @@
>  346	i386	setns			sys_setns
>  347	i386	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
>  348	i386	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
> +349	i386	kcmp			sys_kcmp
> diff -puN arch/x86/syscalls/syscall_64.tbl~syscalls-x86-add-__nr_kcmp-syscall-v8 arch/x86/syscalls/syscall_64.tbl
> --- a/arch/x86/syscalls/syscall_64.tbl~syscalls-x86-add-__nr_kcmp-syscall-v8
> +++ a/arch/x86/syscalls/syscall_64.tbl
> @@ -318,6 +318,7 @@
>  309	common	getcpu			sys_getcpu
>  310	64	process_vm_readv	sys_process_vm_readv
>  311	64	process_vm_writev	sys_process_vm_writev
> +312	64	kcmp			sys_kcmp
>  #
>  # x32-specific system call numbers start at 512 to avoid cache impact
>  # for native 64-bit operation.
> diff -puN /dev/null include/linux/kcmp.h
> --- /dev/null
> +++ a/include/linux/kcmp.h
> @@ -0,0 +1,17 @@
> +#ifndef _LINUX_KCMP_H
> +#define _LINUX_KCMP_H
> +
> +/* Comparison type */
> +enum kcmp_type {
> +	KCMP_FILE,
> +	KCMP_VM,
> +	KCMP_FILES,
> +	KCMP_FS,
> +	KCMP_SIGHAND,
> +	KCMP_IO,
> +	KCMP_SYSVSEM,
> +
> +	KCMP_TYPES,
> +};
> +
> +#endif /* _LINUX_KCMP_H */
> diff -puN include/linux/syscalls.h~syscalls-x86-add-__nr_kcmp-syscall-v8 include/linux/syscalls.h
> --- a/include/linux/syscalls.h~syscalls-x86-add-__nr_kcmp-syscall-v8
> +++ a/include/linux/syscalls.h
> @@ -858,4 +858,6 @@ asmlinkage long sys_process_vm_writev(pi
>  				      unsigned long riovcnt,
>  				      unsigned long flags);
>  
> +asmlinkage long sys_kcmp(pid_t pid1, pid_t pid2, int type,
> +			 unsigned long idx1, unsigned long idx2);
>  #endif
> diff -puN kernel/Makefile~syscalls-x86-add-__nr_kcmp-syscall-v8 kernel/Makefile
> --- a/kernel/Makefile~syscalls-x86-add-__nr_kcmp-syscall-v8
> +++ a/kernel/Makefile
> @@ -25,6 +25,9 @@ endif
>  obj-y += sched/
>  obj-y += power/
>  
> +ifeq ($(CONFIG_CHECKPOINT_RESTORE),y)
> +obj-$(CONFIG_X86) += kcmp.o
> +endif
>  obj-$(CONFIG_FREEZER) += freezer.o
>  obj-$(CONFIG_PROFILING) += profile.o
>  obj-$(CONFIG_STACKTRACE) += stacktrace.o
> diff -puN /dev/null kernel/kcmp.c
> --- /dev/null
> +++ a/kernel/kcmp.c
> @@ -0,0 +1,187 @@
> +#include <linux/kernel.h>
> +#include <linux/syscalls.h>
> +#include <linux/fdtable.h>
> +#include <linux/string.h>
> +#include <linux/random.h>
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/errno.h>
> +#include <linux/cache.h>
> +#include <linux/bug.h>
> +#include <linux/err.h>
> +#include <linux/kcmp.h>
> +
> +#include <asm/unistd.h>
> +
> +/*
> + * We don't expose real in-memory order of objects for security
> + * reasons, still the comparison results should be suitable for
> + * sorting. Thus, we obfuscate kernel pointers values and compare
> + * the production instead.
> + */
> +static unsigned long cookies[KCMP_TYPES][2] __read_mostly;
> +
> +static long kptr_obfuscate(long v, int type)
> +{
> +	return (v ^ cookies[type][0]) * cookies[type][1];
> +}
> +
> +/*
> + * 0 - equal, i.e. v1 = v2
> + * 1 - less than, i.e. v1 < v2
> + * 2 - greater than, i.e. v1 > v2
> + * 3 - not equal but ordering unavailable (reserved for future)
> + */
> +static int kcmp_ptr(void *v1, void *v2, enum kcmp_type type)
> +{
> +	long ret;
> +
> +	ret = kptr_obfuscate((long)v1, type) - kptr_obfuscate((long)v2, type);
> +
> +	return (ret < 0) | ((ret > 0) << 1);
> +}
> +
> +/* The caller must have pinned the task */
> +static struct file *
> +get_file_raw_ptr(struct task_struct *task, unsigned int idx)
> +{
> +	struct file *file = NULL;
> +
> +	task_lock(task);
> +	rcu_read_lock();
> +
> +	if (task->files)
> +		file = fcheck_files(task->files, idx);
> +
> +	rcu_read_unlock();
> +	task_unlock(task);
> +
> +	return file;
> +}
> +
> +static void kcmp_unlock(struct mutex *m1, struct mutex *m2)
> +{
> +	if (likely(m2 != m1))
> +		mutex_unlock(m2);
> +	mutex_unlock(m1);
> +}
> +
> +static int kcmp_lock(struct mutex *m1, struct mutex *m2)
> +{
> +	int err;
> +
> +	if (m2 > m1)
> +		swap(m1, m2);
> +
> +	err = mutex_lock_killable(m1);
> +	if (!err && likely(m1 != m2)) {
> +		err = mutex_lock_killable_nested(m2, SINGLE_DEPTH_NESTING);
> +		if (err)
> +			mutex_unlock(m1);
> +	}
> +
> +	return err;
> +}
> +
> +SYSCALL_DEFINE5(kcmp, pid_t, pid1, pid_t, pid2, int, type,
> +		unsigned long, idx1, unsigned long, idx2)
> +{
> +	struct task_struct *task1, *task2;
> +	int ret;
> +
> +	rcu_read_lock();
> +
> +	/*
> +	 * Tasks are looked up in caller's PID namespace only.
> +	 */
> +	task1 = find_task_by_vpid(pid1);
> +	task2 = find_task_by_vpid(pid2);
> +	if (!task1 || !task2)
> +		goto err_no_task;
> +
> +	get_task_struct(task1);
> +	get_task_struct(task2);
> +
> +	rcu_read_unlock();
> +
> +	/*
> +	 * One should have enough rights to inspect task details.
> +	 */
> +	ret = kcmp_lock(&task1->signal->cred_guard_mutex,
> +			&task2->signal->cred_guard_mutex);
> +	if (ret)
> +		goto err;
> +	if (!ptrace_may_access(task1, PTRACE_MODE_READ) ||
> +	    !ptrace_may_access(task2, PTRACE_MODE_READ)) {
> +		ret = -EPERM;
> +		goto err_unlock;
> +	}
> +
> +	switch (type) {
> +	case KCMP_FILE: {
> +		struct file *filp1, *filp2;
> +
> +		filp1 = get_file_raw_ptr(task1, idx1);
> +		filp2 = get_file_raw_ptr(task2, idx2);
> +
> +		if (filp1 && filp2)
> +			ret = kcmp_ptr(filp1, filp2, KCMP_FILE);
> +		else
> +			ret = -EBADF;
> +		break;
> +	}
> +	case KCMP_VM:
> +		ret = kcmp_ptr(task1->mm, task2->mm, KCMP_VM);
> +		break;
> +	case KCMP_FILES:
> +		ret = kcmp_ptr(task1->files, task2->files, KCMP_FILES);
> +		break;
> +	case KCMP_FS:
> +		ret = kcmp_ptr(task1->fs, task2->fs, KCMP_FS);
> +		break;
> +	case KCMP_SIGHAND:
> +		ret = kcmp_ptr(task1->sighand, task2->sighand, KCMP_SIGHAND);
> +		break;
> +	case KCMP_IO:
> +		ret = kcmp_ptr(task1->io_context, task2->io_context, KCMP_IO);
> +		break;
> +	case KCMP_SYSVSEM:
> +#ifdef CONFIG_SYSVIPC
> +		ret = kcmp_ptr(task1->sysvsem.undo_list,
> +			       task2->sysvsem.undo_list,
> +			       KCMP_SYSVSEM);
> +#else
> +		ret = -EOPNOTSUPP;
> +#endif
> +		break;
> +	default:
> +		ret = -EINVAL;
> +		break;
> +	}
> +
> +err_unlock:
> +	kcmp_unlock(&task1->signal->cred_guard_mutex,
> +		    &task2->signal->cred_guard_mutex);
> +err:
> +	put_task_struct(task1);
> +	put_task_struct(task2);
> +
> +	return ret;
> +
> +err_no_task:
> +	rcu_read_unlock();
> +	return -ESRCH;
> +}
> +
> +static __init int kcmp_cookies_init(void)
> +{
> +	int i;
> +
> +	get_random_bytes(cookies, sizeof(cookies));
> +
> +	for (i = 0; i < KCMP_TYPES; i++)
> +		cookies[i][1] |= (~(~0UL >>  1) | 1);
> +
> +	return 0;
> +}
> +arch_initcall(kcmp_cookies_init);
> diff -puN kernel/sys_ni.c~syscalls-x86-add-__nr_kcmp-syscall-v8 kernel/sys_ni.c
> --- a/kernel/sys_ni.c~syscalls-x86-add-__nr_kcmp-syscall-v8
> +++ a/kernel/sys_ni.c
> @@ -203,3 +203,6 @@ cond_syscall(sys_fanotify_mark);
>  cond_syscall(sys_name_to_handle_at);
>  cond_syscall(sys_open_by_handle_at);
>  cond_syscall(compat_sys_open_by_handle_at);
> +
> +/* compare kernel pointers */
> +cond_syscall(sys_kcmp);
> diff -puN tools/testing/selftests/Makefile~syscalls-x86-add-__nr_kcmp-syscall-v8 tools/testing/selftests/Makefile
> --- a/tools/testing/selftests/Makefile~syscalls-x86-add-__nr_kcmp-syscall-v8
> +++ a/tools/testing/selftests/Makefile
> @@ -1,4 +1,4 @@
> -TARGETS = breakpoints vm
> +TARGETS = breakpoints vm kcmp
>  
>  all:
>  	for TARGET in $(TARGETS); do \
> diff -puN /dev/null tools/testing/selftests/kcmp/Makefile
> --- /dev/null
> +++ a/tools/testing/selftests/kcmp/Makefile
> @@ -0,0 +1,29 @@
> +uname_M := $(shell uname -m 2>/dev/null || echo not)
> +ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/)
> +ifeq ($(ARCH),i386)
> +        ARCH := X86
> +	CFLAGS := -DCONFIG_X86_32 -D__i386__
> +endif
> +ifeq ($(ARCH),x86_64)
> +	ARCH := X86
> +	CFLAGS := -DCONFIG_X86_64 -D__x86_64__
> +endif
> +
> +CFLAGS += -I../../../../arch/x86/include/generated/
> +CFLAGS += -I../../../../include/
> +CFLAGS += -I../../../../usr/include/
> +CFLAGS += -I../../../../arch/x86/include/
> +
> +all:
> +ifeq ($(ARCH),X86)
> +	gcc $(CFLAGS) kcmp_test.c -o run_test
> +else
> +	echo "Not an x86 target, can't build kcmp selftest"
> +endif
> +
> +run-tests: all
> +	./run_test
> +
> +clean:
> +	rm -fr ./run_test
> +	rm -fr ./kcmp-test-file
> diff -puN /dev/null tools/testing/selftests/kcmp/kcmp_test.c
> --- /dev/null
> +++ a/tools/testing/selftests/kcmp/kcmp_test.c
> @@ -0,0 +1,94 @@
> +#define _GNU_SOURCE
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <signal.h>
> +#include <limits.h>
> +#include <unistd.h>
> +#include <errno.h>
> +#include <string.h>
> +#include <fcntl.h>
> +
> +#include <linux/unistd.h>
> +#include <linux/kcmp.h>
> +
> +#include <sys/syscall.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <sys/wait.h>
> +
> +static long sys_kcmp(int pid1, int pid2, int type, int fd1, int fd2)
> +{
> +	return syscall(__NR_kcmp, pid1, pid2, type, fd1, fd2);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	const char kpath[] = "kcmp-test-file";
> +	int pid1, pid2;
> +	int fd1, fd2;
> +	int status;
> +
> +	fd1 = open(kpath, O_RDWR | O_CREAT | O_TRUNC, 0644);
> +	pid1 = getpid();
> +
> +	if (fd1 < 0) {
> +		perror("Can't create file");
> +		exit(1);
> +	}
> +
> +	pid2 = fork();
> +	if (pid2 < 0) {
> +		perror("fork failed");
> +		exit(1);
> +	}
> +
> +	if (!pid2) {
> +		int pid2 = getpid();
> +		int ret;
> +
> +		fd2 = open(kpath, O_RDWR, 0644);
> +		if (fd2 < 0) {
> +			perror("Can't open file");
> +			exit(1);
> +		}
> +
> +		/* An example of output and arguments */
> +		printf("pid1: %6d pid2: %6d FD: %2ld FILES: %2ld VM: %2ld "
> +		       "FS: %2ld SIGHAND: %2ld IO: %2ld SYSVSEM: %2ld "
> +		       "INV: %2ld\n",
> +		       pid1, pid2,
> +		       sys_kcmp(pid1, pid2, KCMP_FILE,		fd1, fd2),
> +		       sys_kcmp(pid1, pid2, KCMP_FILES,		0, 0),
> +		       sys_kcmp(pid1, pid2, KCMP_VM,		0, 0),
> +		       sys_kcmp(pid1, pid2, KCMP_FS,		0, 0),
> +		       sys_kcmp(pid1, pid2, KCMP_SIGHAND,	0, 0),
> +		       sys_kcmp(pid1, pid2, KCMP_IO,		0, 0),
> +		       sys_kcmp(pid1, pid2, KCMP_SYSVSEM,	0, 0),
> +
> +			/* This one should fail */
> +		       sys_kcmp(pid1, pid2, KCMP_TYPES + 1,	0, 0));
> +
> +		/* This one should return same fd */
> +		ret = sys_kcmp(pid1, pid2, KCMP_FILE, fd1, fd1);
> +		if (ret) {
> +			printf("FAIL: 0 expected but %d returned\n", ret);
> +			ret = -1;
> +		} else
> +			printf("PASS: 0 returned as expected\n");
> +
> +		/* Compare with self */
> +		ret = sys_kcmp(pid1, pid1, KCMP_VM, 0, 0);
> +		if (ret) {
> +			printf("FAIL: 0 expected but %d returned\n", ret);
> +			ret = -1;
> +		} else
> +			printf("PASS: 0 returned as expected\n");
> +
> +		exit(ret);
> +	}
> +
> +	waitpid(pid2, &status, 0);
> +
> +	return 0;
> +}
> _
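
As a rough illustration of the interface described in the patch above, here is a small user-space sketch that wraps the syscall and interprets the result codes from the kcmp_ptr() comment (0 equal, 1 less than, 2 greater than, 3 reserved). It goes through syscall(2) the way the selftest does. The sketch_* helpers are hypothetical names, and the sketch assumes the installed headers provide SYS_kcmp and <linux/kcmp.h>; on a kernel without the syscall, or without permission to inspect the target tasks, the call simply fails and errno is set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/kcmp.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thin wrapper around the raw syscall, as in the selftest. */
static long sketch_kcmp(pid_t pid1, pid_t pid2, int type,
			unsigned long idx1, unsigned long idx2)
{
	return syscall(SYS_kcmp, pid1, pid2, type, idx1, idx2);
}

/* Map a sketch_kcmp() return value to a label, per the kcmp_ptr() comment. */
static const char *sketch_kcmp_label(long ret)
{
	switch (ret) {
	case 0:
		return "equal";
	case 1:
		return "less than";
	case 2:
		return "greater than";
	case 3:
		return "not equal, ordering unavailable";
	default:
		return "error (see errno)";
	}
}

/* Do two tasks share an address space?  1 yes, 0 no, -1 on error. */
static int sketch_shares_vm(pid_t pid1, pid_t pid2)
{
	long ret = sketch_kcmp(pid1, pid2, KCMP_VM, 0, 0);

	if (ret < 0)
		return -1;	/* errno holds the reason, e.g. ESRCH or EPERM */

	assert(ret <= 3);	/* only 0..3 are described as valid results */
	return ret == 0;
}
```

A checkpoint tool would call something like sketch_shares_vm(pid_a, pid_b) for each pair of tasks it dumps, and use the 1/2 ordering results to sort objects rather than compare every pair.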