linux-kernel - Re: [PATCH] binderfs: implement "max" mount option

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20181223112944.GC27818@kroah.com>
Date:   Sun, 23 Dec 2018 12:29:44 +0100
From:   Greg KH <gregkh@...uxfoundation.org>
To:     Christian Brauner <christian@...uner.io>
Cc:     tkjos@...roid.com, devel@...verdev.osuosl.org,
        linux-kernel@...r.kernel.org, joel@...lfernandes.org,
        arve@...roid.com, maco@...roid.com, Todd Kjos <tkjos@...gle.com>
Subject: Re: [PATCH] binderfs: implement "max" mount option

On Sat, Dec 22, 2018 at 10:18:06PM +0100, Christian Brauner wrote:
> Since binderfs can be mounted by userns root in non-initial user namespaces
> some precautions are in order. First, a way to set a maximum on the number
> of binder devices that can be allocated per binderfs instance and second, a
> way to reserve a reasonable chunk of binderfs devices for the initial ipc
> namespace.
> A first approach as seen in [1] used sysctls similiar to devpts but was
> shown to be flawed (cf. [2] and [3]) since some aspects were unneeded. This
> is an alternative approach which avoids sysctls completely and instead
> switches to a single mount option.
> 
> Starting with this commit binderfs instances can be mounted with a limit on
> the number of binder devices that can be allocated. The max=<count> mount
> option serves as a per-instance limit. If max=<count> is set then only
> <count> number of binder devices can be allocated in this binderfs
> instance.

Ok, this is fine, but why such a big default?  You only need 4 to run a
modern android system, and anyone using binder outside of android is
really too crazy to ever be using it in a container :)

> Additionally, the binderfs instance in the initial ipc namespace will
> always have a reserve of at least 1024 binder devices unless explicitly
> capped via max=<count>.

Again, why so many?  And why wouldn't that initial ipc namespace already
have their device nodes created _before_ anything else is mounted?

Some comments on the patch below:

> +/*
> + * Ensure that the initial ipc namespace always has a good chunk of devices
> + * available.
> + */
> +#define BINDERFS_MAX_MINOR_CAPPED (BINDERFS_MAX_MINOR - 1024)

Again that seems crazy big, how about splitting this into two different
patches, one for the max= stuff, and one for this "reserve some minors"
thing, so we can review them separately.

>  
>  static struct vfsmount *binderfs_mnt;
>  
> @@ -46,6 +52,24 @@ static dev_t binderfs_dev;
>  static DEFINE_MUTEX(binderfs_minors_mutex);
>  static DEFINE_IDA(binderfs_minors);
>  
> +/**
> + * binderfs_mount_opts - mount options for binderfs
> + * @max: maximum number of allocatable binderfs binder devices
> + */
> +struct binderfs_mount_opts {
> +	int max;
> +};
> +
> +enum {
> +	Opt_max,
> +	Opt_err
> +};
> +
> +static const match_table_t tokens = {
> +	{ Opt_max, "max=%d" },
> +	{ Opt_err, NULL     }
> +};
> +
>  /**
>   * binderfs_info - information about a binderfs mount
>   * @ipc_ns:         The ipc namespace the binderfs mount belongs to.
> @@ -55,13 +79,16 @@ static DEFINE_IDA(binderfs_minors);
>   *                  created.
>   * @root_gid:       gid that needs to be used when a new binder device is
>   *                  created.
> + * @mount_opts:     The mount options in use.
> + * @device_count:   The current number of allocated binder devices.
>   */
>  struct binderfs_info {
>  	struct ipc_namespace *ipc_ns;
>  	struct dentry *control_dentry;
>  	kuid_t root_uid;
>  	kgid_t root_gid;
> -
> +	struct binderfs_mount_opts mount_opts;
> +	atomic_t device_count;

Why atomic?

You should already have the lock held every time this is accessed,
so no need to use an atomic value, just use an int.

>  	/* Reserve new minor number for the new device. */
>  	mutex_lock(&binderfs_minors_mutex);
> -	minor = ida_alloc_max(&binderfs_minors, BINDERFS_MAX_MINOR, GFP_KERNEL);
> +	if (atomic_inc_return(&info->device_count) < info->mount_opts.max)

No need for atomic, see, your lock is held :)

thanks,

greg k-h