linux-kernel - Re: [PATCH] binderfs: implement "max" mount option

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20181223130339.iyadarp53jhu4byi@brauner.io>
Date:   Sun, 23 Dec 2018 14:03:40 +0100
From:   Christian Brauner <christian@...uner.io>
To:     Greg KH <gregkh@...uxfoundation.org>
Cc:     tkjos@...roid.com, devel@...verdev.osuosl.org,
        linux-kernel@...r.kernel.org, joel@...lfernandes.org,
        arve@...roid.com, maco@...roid.com, Todd Kjos <tkjos@...gle.com>
Subject: Re: [PATCH] binderfs: implement "max" mount option

On Sun, Dec 23, 2018 at 12:29:44PM +0100, Greg KH wrote:
> On Sat, Dec 22, 2018 at 10:18:06PM +0100, Christian Brauner wrote:
> > Since binderfs can be mounted by userns root in non-initial user namespaces
> > some precautions are in order. First, a way to set a maximum on the number
> > of binder devices that can be allocated per binderfs instance and second, a
> > way to reserve a reasonable chunk of binderfs devices for the initial ipc
> > namespace.
> > A first approach as seen in [1] used sysctls similiar to devpts but was
> > shown to be flawed (cf. [2] and [3]) since some aspects were unneeded. This
> > is an alternative approach which avoids sysctls completely and instead
> > switches to a single mount option.
> > 
> > Starting with this commit binderfs instances can be mounted with a limit on
> > the number of binder devices that can be allocated. The max=<count> mount
> > option serves as a per-instance limit. If max=<count> is set then only
> > <count> number of binder devices can be allocated in this binderfs
> > instance.
> 
> Ok, this is fine, but why such a big default?  You only need 4 to run a
> modern android system, and anyone using binder outside of android is
> really too crazy to ever be using it in a container :)
> 
> > Additionally, the binderfs instance in the initial ipc namespace will
> > always have a reserve of at least 1024 binder devices unless explicitly
> > capped via max=<count>.
> 
> Again, why so many?  And why wouldn't that initial ipc namespace already
> have their device nodes created _before_ anything else is mounted?

Right, my issue is with re-creating devices, like if binderfs gets
unmounted or if devices get removed via rm. But we can lower the number
to 4 (see below).

> 
> Some comments on the patch below:

Thanks!

> 
> > +/*
> > + * Ensure that the initial ipc namespace always has a good chunk of devices
> > + * available.
> > + */
> > +#define BINDERFS_MAX_MINOR_CAPPED (BINDERFS_MAX_MINOR - 1024)
> 
> Again that seems crazy big, how about splitting this into two different
> patches, one for the max= stuff, and one for this "reserve some minors"
> thing, so we can review them separately.

Yes, let's do that. I will also lower this to 4 reserved devices.

> 
> >  
> >  static struct vfsmount *binderfs_mnt;
> >  
> > @@ -46,6 +52,24 @@ static dev_t binderfs_dev;
> >  static DEFINE_MUTEX(binderfs_minors_mutex);
> >  static DEFINE_IDA(binderfs_minors);
> >  
> > +/**
> > + * binderfs_mount_opts - mount options for binderfs
> > + * @max: maximum number of allocatable binderfs binder devices
> > + */
> > +struct binderfs_mount_opts {
> > +	int max;
> > +};
> > +
> > +enum {
> > +	Opt_max,
> > +	Opt_err
> > +};
> > +
> > +static const match_table_t tokens = {
> > +	{ Opt_max, "max=%d" },
> > +	{ Opt_err, NULL     }
> > +};
> > +
> >  /**
> >   * binderfs_info - information about a binderfs mount
> >   * @ipc_ns:         The ipc namespace the binderfs mount belongs to.
> > @@ -55,13 +79,16 @@ static DEFINE_IDA(binderfs_minors);
> >   *                  created.
> >   * @root_gid:       gid that needs to be used when a new binder device is
> >   *                  created.
> > + * @mount_opts:     The mount options in use.
> > + * @device_count:   The current number of allocated binder devices.
> >   */
> >  struct binderfs_info {
> >  	struct ipc_namespace *ipc_ns;
> >  	struct dentry *control_dentry;
> >  	kuid_t root_uid;
> >  	kgid_t root_gid;
> > -
> > +	struct binderfs_mount_opts mount_opts;
> > +	atomic_t device_count;
> 
> Why atomic?
> 
> You should already have the lock held every time this is accessed,
> so no need to use an atomic value, just use an int.
> 
> >  	/* Reserve new minor number for the new device. */
> >  	mutex_lock(&binderfs_minors_mutex);
> > -	minor = ida_alloc_max(&binderfs_minors, BINDERFS_MAX_MINOR, GFP_KERNEL);
> > +	if (atomic_inc_return(&info->device_count) < info->mount_opts.max)
> 
> No need for atomic, see, your lock is held :)

Habit, to be honest.

Thanks, fixed version to follow in a bit.
Christian