linux-kernel - Re: [patch 07/10] unprivileged mounts: add sysctl tunable for "safe" property

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080206202110.GA20528@sergelap.ibm.com>
Date:	Wed, 6 Feb 2008 14:21:10 -0600
From:	"Serge E. Hallyn" <serue@...ibm.com>
To:	Miklos Szeredi <miklos@...redi.hu>
Cc:	akpm@...ux-foundation.org, hch@...radead.org, serue@...ibm.com,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [patch 07/10] unprivileged mounts: add sysctl tunable for
	"safe" property

Quoting Miklos Szeredi (miklos@...redi.hu):
> From: Miklos Szeredi <mszeredi@...e.cz>
> 
> Add the following:
> 
>   /proc/sys/fs/types/${FS_TYPE}/usermount_safe
> 
> Signed-off-by: Miklos Szeredi <mszeredi@...e.cz>

Thanks, Miklos, good explanations in the docs.

Acked-by: Serge Hallyn <serue@...ibm.com>

One comment inline, but not imo your problem :)

> ---
> 
> Index: linux/fs/filesystems.c
> ===================================================================
> --- linux.orig/fs/filesystems.c	2008-02-04 23:47:46.000000000 +0100
> +++ linux/fs/filesystems.c	2008-02-04 23:48:04.000000000 +0100
> @@ -12,6 +12,7 @@
>  #include <linux/kmod.h>
>  #include <linux/init.h>
>  #include <linux/module.h>
> +#include <linux/sysctl.h>
>  #include <asm/uaccess.h>
> 
>  /*
> @@ -51,6 +52,57 @@ static struct file_system_type **find_fi
>  	return p;
>  }
> 
> +#define MAX_FILESYSTEM_VARS 1
> +
> +struct filesystem_sysctl_table {
> +	struct ctl_table_header *header;
> +	struct ctl_table table[MAX_FILESYSTEM_VARS + 1];
> +};
> +
> +/*
> + * Create /sys/fs/types/${FSNAME} directory with per fs-type tunables.
> + */
> +static int filesystem_sysctl_register(struct file_system_type *fs)
> +{
> +	struct filesystem_sysctl_table *t;
> +	struct ctl_path path[] = {
> +		{ .procname = "fs", .ctl_name = CTL_FS },
> +		{ .procname = "types", .ctl_name = CTL_UNNUMBERED },
> +		{ .procname = fs->name, .ctl_name = CTL_UNNUMBERED },
> +		{ }
> +	};
> +
> +	t = kzalloc(sizeof(*t), GFP_KERNEL);
> +	if (!t)
> +		return -ENOMEM;
> +
> +
> +	t->table[0].ctl_name = CTL_UNNUMBERED;
> +	t->table[0].procname = "usermount_safe";
> +	t->table[0].maxlen = sizeof(int);
> +	t->table[0].data = &fs->fs_safe;
> +	t->table[0].mode = 0644;

Yikes, this could be a problem for containers, as it's simply tied to
uid 0, whereas tying it to a capability would let us solve it with
capability bounds.

This might mean more urgency to get user namespaces working at least
with sysfs, else this is a quick way around having CAP_SYS_ADMIN taken
out of a container's capability bounding set.

> +	t->table[0].proc_handler = &proc_dointvec;
> +
> +	t->header = register_sysctl_paths(path, t->table);
> +	if (!t->header) {
> +		kfree(t);
> +		return -ENOMEM;
> +	}
> +
> +	fs->sysctl_table = t;
> +
> +	return 0;
> +}
> +
> +static void filesystem_sysctl_unregister(struct file_system_type *fs)
> +{
> +	struct filesystem_sysctl_table *t = fs->sysctl_table;
> +
> +	unregister_sysctl_table(t->header);
> +	kfree(t);
> +}
> +
>  /**
>   *	register_filesystem - register a new filesystem
>   *	@fs: the file system structure
> @@ -80,6 +132,13 @@ int register_filesystem(struct file_syst
>  	else
>  		*p = fs;
>  	write_unlock(&file_systems_lock);
> +
> +	if (res == 0) {
> +		res = filesystem_sysctl_register(fs);
> +		if (res != 0)
> +			unregister_filesystem(fs);
> +	}
> +
>  	return res;
>  }
> 
> @@ -108,6 +167,7 @@ int unregister_filesystem(struct file_sy
>  			*tmp = fs->next;
>  			fs->next = NULL;
>  			write_unlock(&file_systems_lock);
> +			filesystem_sysctl_unregister(fs);
>  			return 0;
>  		}
>  		tmp = &(*tmp)->next;
> Index: linux/include/linux/fs.h
> ===================================================================
> --- linux.orig/include/linux/fs.h	2008-02-04 23:48:02.000000000 +0100
> +++ linux/include/linux/fs.h	2008-02-04 23:48:04.000000000 +0100
> @@ -1444,6 +1444,7 @@ struct file_system_type {
>  	struct module *owner;
>  	struct file_system_type * next;
>  	struct list_head fs_supers;
> +	struct filesystem_sysctl_table *sysctl_table;
> 
>  	struct lock_class_key s_lock_key;
>  	struct lock_class_key s_umount_key;
> Index: linux/Documentation/filesystems/proc.txt
> ===================================================================
> --- linux.orig/Documentation/filesystems/proc.txt	2008-02-04 23:47:58.000000000 +0100
> +++ linux/Documentation/filesystems/proc.txt	2008-02-04 23:48:04.000000000 +0100
> @@ -44,6 +44,7 @@ Table of Contents
>    2.14	/proc/<pid>/io - Display the IO accounting fields
>    2.15	/proc/<pid>/coredump_filter - Core dump filtering settings
>    2.16	/proc/<pid>/mountinfo - Information about mounts
> +  2.17	/proc/sys/fs/types - File system type specific parameters
> 
>  ------------------------------------------------------------------------------
>  Preface
> @@ -2392,4 +2393,34 @@ For more information see:
>    Documentation/filesystems/sharedsubtree.txt
> 
> 
> +2.17 /proc/sys/fs/types/ - File system type specific parameters
> +----------------------------------------------------------------
> +
> +There's a separate directory /proc/sys/fs/types/<type>/ for each
> +filesystem type, containing the following files:
> +
> +usermount_safe
> +--------------
> +
> +Setting this to non-zero will allow filesystems of this type to be
> +mounted by unprivileged users (note, that there are other
> +prerequisites as well).
> +
> +Fuse has been designed to be as safe as possible, and some
> +distributions already ship with unprivileged fuse mounts enabled by
> +default.  There are still some situations (multi-user systems with
> +untrusted users in particular), where enabling this for fuse might not
> +be appropriate.  For more details, see Documentation/filesystems/fuse.txt
> +
> +Procfs is also safe, but unprivileged mounting of it is not usually
> +necessary (bind mounting is equivalent).
> +
> +Most other filesystems are unsafe.  Here are just some of the issues,
> +that must be resolved before a filesystem can be declared safe:
> +
> + - no strict input checking (buffer overruns, directory loops, etc)
> + - network filesystem deadlocks when mounting from localhost
> + - no permission checking when opening the device
> + - changing mount options when mounting a new instance of a filesystem
> +
>  ------------------------------------------------------------------------------
> 
> --
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/