Message-ID: <20181205214203.cwcvsqe6ndu4e3vv@brauner.io>
Date:   Wed, 5 Dec 2018 22:42:06 +0100
From:   Christian Brauner <christian@...uner.io>
To:     Greg KH <gregkh@...uxfoundation.org>
Cc:     tkjos@...roid.com, maco@...roid.com, linux-kernel@...r.kernel.org,
        devel@...verdev.osuosl.org, kilobyte@...band.pl,
        darrick.wong@...cle.com, chouryzhou@...cent.com,
        david@...morbit.com, arve@...roid.com, joel@...lfernandes.org,
        Todd Kjos <tkjos@...gle.com>
Subject: Re: [PATCH] binder: implement binderfs

On Wed, Dec 05, 2018 at 09:01:45PM +0100, Greg KH wrote:
> On Tue, Dec 04, 2018 at 02:12:39PM +0100, Christian Brauner wrote:
> > As discussed at Linux Plumbers Conference 2018 in Vancouver [1] this is the
> > implementation of binderfs. If you want to skip reading and just see how it
> > works, please go to [2].
> 
> First off, thanks for doing this so quickly.  I think the overall idea
> and implementation is great, I just have some minor issues with the user
> api:

Thanks! :)

> 
> > /* binder-control */
> > Each new binderfs instance comes with a binder-control device. No other
> > devices will be present at first. The binder-control device can be used to
> > dynamically allocate binder devices. All requests operate on the binderfs
> > mount the binder-control device resides in:
> > - BINDER_CTL_ADD
> >   Allocate a new binder device.
> > Assuming a new instance of binderfs has been mounted at /dev/binderfs
> > via mount -t binderfs binderfs /dev/binderfs, a request to create a new
> > binder device can be made via:
> > 
> > struct binderfs_device device = {0};
> > int fd = open("/dev/binderfs/binder-control", O_RDWR);
> > ioctl(fd, BINDER_CTL_ADD, &device);
> > 
> > The struct binderfs_device will be used to return the major and minor
> > numbers, as well as the index used to construct the new device name.
> > Binderfs devices can simply be removed via unlink().
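
(And just to spell out the removal side: since the devices show up as
plain nodes in the binderfs mount, tearing one down is nothing more
than e.g.

unlink("/dev/binderfs/binder0");

for whatever name the allocation handed back.)
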
> 
> I think you should provide a name in the BINDER_CTL_ADD command.  That
> way you can easily emulate the existing binder queues, and it saves you
> a create/rename sequence that you will be forced to do otherwise.  Why
> not just do it in a single command?

Sounds reasonable. How do you feel about capping the name length at 255
bytes, i.e. the standard Linux filename length (as used by e.g. xfs,
ext4, etc.)?

#define BINDERFS_NAME_MAX 255

struct binderfs_device {
        char name[BINDERFS_NAME_MAX + 1];
        __u32 major;
        __u32 minor;
};
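
Userspace would then do something roughly like this (untested sketch;
struct binderfs_device and BINDER_CTL_ADD as proposed above, the device
name is made up):

static int binderfs_add_device(const char *name,
                               struct binderfs_device *device)
{
        int fd, ret;

        memset(device, 0, sizeof(*device));
        /* Pick the final name up front instead of a create + rename dance. */
        strncpy(device->name, name, BINDERFS_NAME_MAX);

        fd = open("/dev/binderfs/binder-control", O_RDWR | O_CLOEXEC);
        if (fd < 0)
                return -1;

        ret = ioctl(fd, BINDER_CTL_ADD, device);
        close(fd);

        /* On success device->major and device->minor are filled in. */
        return ret;
}

That keeps create-with-name in a single ioctl() and the struct can stay
the return channel for major/minor.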

> 
> That way also you don't need to care about the major/minor number at
> all.  Userspace should never need to worry about that, use a name,
> that's the best thing.  Also, it allows you to drop the use of the idr,
> making the kernel code simpler overall.
> 
> > /* Implementation details */
> > - When binderfs is registered as a new filesystem it will dynamically
> >   allocate a new major number. The allocated major number will be returned
> >   in struct binderfs_device when a new binder device is allocated.
> 
> Why does userspace care about major/minor numbers at all?  You should

Userspace cares for the sake of the devices cgroup, which operates on
device numbers to restrict access to devices. Since binderfs doesn't
have a static major number, returning that information is helpful.
One example is a managing process sharing a single binderfs mount
between multiple ipc+mount namespaces via bind-mounts to avoid having
separate binderfs mounts. If the managing process only wants to grant
access to binder0 and no other device for a given ipc+mnt namespace
combination, then it can use the devices cgroup, but for that it needs
to know the major and minor number.
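
Concretely, the managing process would end up doing something like the
following with the struct binderfs_device that BINDER_CTL_ADD filled in
(sketch against the cgroup v1 devices controller; the cgroup path is
made up and error handling is omitted):

/* Deny everything by default, then whitelist only binder0. */
FILE *deny = fopen("/sys/fs/cgroup/devices/container1/devices.deny", "w");
fprintf(deny, "a\n");
fclose(deny);

FILE *allow = fopen("/sys/fs/cgroup/devices/container1/devices.allow", "w");
fprintf(allow, "c %u:%u rwm\n", device.major, device.minor);
fclose(allow);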

> just be able to deal with the binder "names", that's all that userspace
> uses normally as you are not calling mknod() yourself.
> 
> >   Minor numbers that have been given out are tracked in a global idr struct
> >   that is capped at BINDERFS_MAX_MINOR. The minor number tracker is
> >   protected by a global mutex. This is the only point of contention between
> >   binderfs mounts.
> 
> I doubt this will cause any real contention given that setting up / tearing
> down binder mounts is going to be rare, right?  Well, hopefully, who
> knows with some container systems...

Yeah, it's very unlikely. Device allocation is a rare event that is
basically done once for most interesting use-cases. If one really cares
about performance, bind-mounts + device cgroup restrictions are a
possible solution to this problem.
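
For reference, the minor tracking really just has this shape
(hand-waved from the description above, not the literal patch code):

static DEFINE_MUTEX(binderfs_minors_mutex);
static DEFINE_IDR(binderfs_minors);

static int binderfs_alloc_minor(void *device)
{
        int minor;

        /* Global mutex: the only point of contention between mounts. */
        mutex_lock(&binderfs_minors_mutex);
        minor = idr_alloc(&binderfs_minors, device, 0, BINDERFS_MAX_MINOR,
                          GFP_KERNEL);
        mutex_unlock(&binderfs_minors_mutex);

        return minor; /* >= 0 on success, negative errno otherwise */
}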

> 
> > - The naming scheme for binder devices is binder%d. Each binderfs mount
> >   numbers new binder devices starting at 0 up to n. The indices used in
> >   constructing the name are tracked in a struct idr that is per-binderfs
> >   superblock.
> 
> Again, let userspace pick the name, as you will have to rename it anyway
> to get userspace to work properly with it.
> 
> I'll stop repeating myself now :)

Thanks for the feedback! :)

Christian
