linux-kernel - Re: [RFC PATCH 0/2] Loop device psuedo filesystem

Message-ID: <CALCETrUZO42qk7GFcNOT8+aMRXvPLiAUOv6FH33Fx6o1XrNVxg@mail.gmail.com>
Date:	Tue, 27 May 2014 15:19:15 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Seth Forshee <seth.forshee@...onical.com>
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	LXC development mailing-list 
	<lxc-devel@...ts.linuxcontainers.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	James Bottomley <James.Bottomley@...senpartnership.com>,
	Serge Hallyn <serge.hallyn@...ntu.com>,
	"Michael H. Warfield" <mhw@...tsend.com>,
	Marian Marinov <mm@...com>,
	Eric Biederman <ebiederm@...ssion.com>,
	Richard Weinberger <richard.weinberger@...il.com>,
	Michael J Coss <michael.coss@...atel-lucent.com>
Subject: Re: [RFC PATCH 0/2] Loop device psuedo filesystem

On Tue, May 27, 2014 at 2:58 PM, Seth Forshee
<seth.forshee@...onical.com> wrote:
> I'm posting these patches in response to the ongoing discussion of loop
> devices in containers at [1].
>
> The patches implement a pseudo filesystem for loop devices, which will
> allow use of loop devices in containers using standard utilities. Under
> normal use a loopfs mount will initially contain a single device node
> for loop-control which can be used to request and release loop devices.
> Any devices allocated via this node will automatically appear in that
> loopfs mount (and in devtmpfs) but not in any other loopfs mounts.
> CAP_SYS_ADMIN in the userns of the process which performed the mount is
> allowed to perform privileged loop ioctls on these devices.
>
> Alternatively, loopfs can be mounted with the hostmount option, intended
> for mounting /dev/loop in the host. This is the default mount for any
> devices not created via loop-control in a loopfs mount (e.g. devices
> created during driver init, devices created via /dev/loop-control, etc).
> This is only available to system-wide CAP_SYS_ADMIN.
>
> I still have some testing to do on these patches, but they work at
> minimum for simple use cases. It's possible to use an unmodified losetup
> if it's new enough to know about loop-control, with a couple of caveats:
>
>  * /dev/loop-control must be symlinked to /dev/loop/loop-control
>  * In some cases losetup attempts to use /dev/loopN when the device node
>    is at /dev/loop/N. For example, 'losetup -f disk.img' fails.
>
> Device nodes for loop partitions are not created in loopfs. These
> devices are created by the generic block layer, and the loop driver has
> no way of knowing when they are created, so some kind of hook into the
> driver will be needed to support this.
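For readers less familiar with loop-control: the quoted description maps onto the existing loop-control ioctls from `<linux/loop.h>`. The sketch below is only an illustration, not code from the patches. It assumes a loopfs mount at `/dev/loop` (so the control node is `/dev/loop/loop-control`, matching the symlink caveat above), and it assumes the RFC's `loopfs` type and `hostmount` option would be passed to mount(2) as shown; neither exists in mainline kernels. All helper names are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mount.h>
#include <unistd.h>
#include <linux/loop.h>

/* Hypothetical helper: mount the RFC's loopfs at target. The "loopfs" type
 * and "hostmount" option come from the RFC, not from mainline. Returns 0 on
 * success, -1 with errno set otherwise. */
static int mount_loopfs(const char *target, int hostmount)
{
    return mount("loopfs", target, "loopfs", 0,
                 hostmount ? "hostmount" : NULL);
}

/* Ask the loop-control node at ctl_path for a free loop device.
 * LOOP_CTL_GET_FREE returns the index of an unbound device, allocating a
 * new one if necessary. Returns the index, or -1 on error. */
static int loop_request(const char *ctl_path)
{
    int fd = open(ctl_path, O_RDWR);
    if (fd < 0)
        return -1;
    int idx = ioctl(fd, LOOP_CTL_GET_FREE);
    close(fd);
    return idx < 0 ? -1 : idx;
}

/* Release loop device idx through loop-control with LOOP_CTL_REMOVE.
 * Fails (e.g. EBUSY) if the device is still bound to a backing file. */
static int loop_release(const char *ctl_path, int idx)
{
    int fd = open(ctl_path, O_RDWR);
    if (fd < 0)
        return -1;
    int ret = ioctl(fd, LOOP_CTL_REMOVE, idx);
    close(fd);
    return ret < 0 ? -1 : 0;
}

/* Format the node path for loop device idx: /dev/loopN in the classic
 * layout, /dev/loop/N when loopfs is mounted at /dev/loop. The second form
 * is the one an unmodified losetup may not expect. Returns 0, or -1 if buf
 * is too small. */
static int loop_node_path(char *buf, size_t len, int idx, int loopfs_layout)
{
    int n = snprintf(buf, len, loopfs_layout ? "/dev/loop/%d" : "/dev/loop%d",
                     idx);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A container would call `loop_request("/dev/loop/loop-control")`, bind a file to the returned device with the usual `LOOP_SET_FD` ioctl, and later hand the index back with `loop_release()`.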

This is entertaining and a bit terrifying :)

ISTM that what you've done is to create a way for per-userns devices
to live in a special filesystem and for userns containers to
instantiate those devices by offloading all the hard work to the
kernel.

What if we generalized this?

For example, we could add a concept of ephemeral devices.  An
ephemeral device is a device that can be referenced by an inode with a
guarantee that the inode will *never* accidentally point to a
different device [1].  Then we add a concept of the userns that owns a
struct device.
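To make the first option in footnote [1] concrete, here is a toy model in C. All names are hypothetical and none of this is kernel code. A conventional allocator hands out the lowest free number, so after a device is removed its number can go to a new device, and a stale inode that recorded the old number would then refer to the wrong device. An allocator that only counts upward never does this, at the cost of eventually exhausting its range until reboot.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_NUMBERS 16  /* hypothetical size of the reserved number range */

/* Conventional allocator: lowest free slot, so numbers are recycled. */
struct reuse_alloc {
    bool used[NR_NUMBERS];
};

static int reuse_get(struct reuse_alloc *a)
{
    for (int i = 0; i < NR_NUMBERS; i++) {
        if (!a->used[i]) {
            a->used[i] = true;
            return i;
        }
    }
    return -1;  /* range exhausted */
}

static void reuse_put(struct reuse_alloc *a, int nr)
{
    a->used[nr] = false;  /* nr may be handed to a different device later */
}

/* Ephemeral allocator: a counter that only moves forward, so a number is
 * never handed out twice until the state is reset (i.e. reboot). */
struct ephemeral_alloc {
    int next;
};

static int ephemeral_get(struct ephemeral_alloc *a)
{
    return a->next < NR_NUMBERS ? a->next++ : -1;
}

/* Freeing is a no-op: the number is retired rather than recycled. */
static void ephemeral_put(struct ephemeral_alloc *a, int nr)
{
    (void)a;
    (void)nr;
}
```

With `reuse_get()`, getting a number, putting it back, and getting again returns the same number; with `ephemeral_get()` it never does. The footnote's alternative, having the inode hold a reference to the struct device, would avoid exhausting the range instead.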

To make this safe, we'll need to make sure that old host udev will not
see non-init-userns devices, ever.  This is easy enough to do, but
doing it elegantly might take some design work.
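One simple way to picture the udev requirement, assuming each device records its owning user namespace as proposed above: device announcements reach the host only when the owner is the initial namespace. This is a hypothetical model with invented names, not the kernel's uevent code, and it deliberately ignores the "elegant" part of the design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user namespace: only the parent link matters here. */
struct userns {
    const struct userns *parent;  /* NULL for the initial namespace */
};

/* Hypothetical stand-in for the host's initial user namespace. */
static const struct userns init_userns = { NULL };

/* Toy device carrying the owning userns proposed in the message. */
struct toy_device {
    const struct userns *owner;
};

/* Per the message, host udev must never see devices owned by a non-init
 * userns, so only init-owned devices are announced to it. */
static bool visible_to_host_udev(const struct toy_device *dev)
{
    return dev->owner == &init_userns;
}
```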

To make this useful, we'll need a way for things inside user
namespaces to create the device nodes.  I can imagine at least three
ways to make this work.

a) Allow mknod on a tmpfs created by a particular userns to succeed if
the targeted struct device is owned by that userns or a child and if
the caller is ns_capable(CAP_MKNOD).
b) Create a new filesystem that has some special ioctl or whatever to do it.
c) Have real per-user-ns devtmpfs.
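Option (a) boils down to a two-part check. The toy model below uses hypothetical names and makes two assumptions the message leaves open: "a child" is read as any descendant namespace, and the capability check is made against the namespace that created the tmpfs. The boolean parameter stands in for ns_capable(fs_ns, CAP_MKNOD), which the model does not implement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user namespace: only the parent link matters here. */
struct userns {
    const struct userns *parent;  /* NULL for the initial namespace */
};

/* True if ns is ancestor itself or any descendant of it. */
static bool userns_is_same_or_child(const struct userns *ns,
                                    const struct userns *ancestor)
{
    for (; ns != NULL; ns = ns->parent)
        if (ns == ancestor)
            return true;
    return false;
}

/* Option (a): mknod on a tmpfs created by fs_ns succeeds only if the target
 * device is owned by fs_ns or one of its descendants AND the caller passes
 * the capability check (caller_has_cap_mknod stands in for
 * ns_capable(fs_ns, CAP_MKNOD)). */
static bool mknod_allowed(const struct userns *fs_ns,
                          const struct userns *dev_owner,
                          bool caller_has_cap_mknod)
{
    return caller_has_cap_mknod && userns_is_same_or_child(dev_owner, fs_ns);
}
```

The important case is the third one in the checks: a container that is fully capable in its own namespace still cannot create a node for a device owned by its parent.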

Now, to get loop working in a userns, we need a way for the userns (or
the host!) to create a new loop-control device owned by that userns
and we need to tweak the loop driver to make the created loop devices
be owned by the userns.
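The loop driver tweak amounts to propagating ownership from the control node to the devices it creates. A hypothetical model (invented names, not the loop driver's code) might look like this; device indexes stay global, as they are in the driver today.

```c
#include <assert.h>
#include <stddef.h>

/* Toy user namespace: only the parent link matters here. */
struct userns {
    const struct userns *parent;  /* NULL for the initial namespace */
};

/* Hypothetical per-namespace loop-control instance. */
struct loop_ctl {
    const struct userns *owner;   /* namespace this control node belongs to */
};

/* Hypothetical loop device record. */
struct loop_dev {
    int index;
    const struct userns *owner;   /* inherited from the loop-control used */
};

static int next_loop_index;  /* one global index space for all loop devices */

/* A device created through a given loop-control is owned by that control
 * node's namespace rather than by the initial one. */
static struct loop_dev loop_ctl_add(const struct loop_ctl *ctl)
{
    struct loop_dev dev = { next_loop_index++, ctl->owner };
    return dev;
}
```

Combined with an ownership check like the one in option (a), CAP_SYS_ADMIN in the container's namespace would then be enough for privileged ioctls on its own loop devices but not on the host's.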

(Note: I'm deliberately ignoring the fact that just doing this for
loop seems to be almost entirely useless right now: you still can't
mount the things.)

Thoughts?


[1]  For example, there could be a special set of device numbers that
are not reused until reboot.  Ephemeral device nodes point to these
devices by number.  Alternatively, the inodes could keep references to
the struct device.