linux-kernel - Re: Controlling devices and device namespaces


Date:	Sun, 16 Sep 2012 11:15:38 -0500
From:	Serge Hallyn <serge@...lyn.com>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
CC:	Alan Cox <alan@...rguk.ukuu.org.uk>,
	Aristeu Rozanski <aris@...vo.org>,
	Neil Horman <nhorman@...driver.com>,
	"Serge E. Hallyn" <serue@...ibm.com>,
	containers@...ts.linux-foundation.org,
	linux-kernel@...r.kernel.org, Michal Hocko <mhocko@...e.cz>,
	Thomas Graf <tgraf@...g.ch>, Paul Mackerras <paulus@...ba.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
	Arnaldo Carvalho de Melo <acme@...stprotocols.net>,
	Johannes Weiner <hannes@...xchg.org>,
	Tejun Heo <tj@...nel.org>, cgroups@...r.kernel.org,
	Paul Turner <pjt@...gle.com>, Ingo Molnar <mingo@...hat.com>
Subject: Re: Controlling devices and device namespaces

On 09/16/2012 09:23 AM, Eric W. Biederman wrote:
> Serge Hallyn <serge@...lyn.com> writes:
>
>> On 09/16/2012 07:17 AM, Eric W. Biederman wrote:
>>> ebiederm@...ssion.com (Eric W. Biederman) writes:
>>>
>>>> Alan Cox <alan@...rguk.ukuu.org.uk> writes:
>>>>
>>>>>> One piece of the puzzle is that we should be able to allow unprivileged
>>>>>> device node creation and access for any device on any filesystem
>>>>>> for which unprivileged access is safe.
>>>>>
>>>>> Which devices are "safe" is policy for all interesting and useful cases,
>>>>> as are file permissions, security tags, chroot considerations and the
>>>>> like.
>>>>>
>>>>> It's a complete non starter.
>>>
>>> Come to think of it mknod is completely unnecessary.
>>>
>>> Without mknod.  Without being able to mount filesystems containing
>>> device nodes.
>>
>> Hm?  That sounds like it will really upset init/udev/upgrades in the
>> container.
>
> udev does not create device nodes.  For an older udev the worst
> I can see it doing is having mknod failing with EEXIST because
> the device node already exists.
>
> We should be able to make it look to init like a ramdisk mounted the
> filesystems.
>
> Why should upgrades care?  Package installation shouldn't be calling
> mknod.
>
> At least with a recent modern distro I can't imagine this to be an
> issue.  I expect we could have a kernel build option that removed the
> mknod system call and a modern distro wouldn't notice.
>
>> Are you saying all filesystems containing device nodes will need to be
>> mounted in advance by the process setting up the container?
>
> As a general rule.
>
> I think in practice there is wiggle room for special cases
> like mounting a fresh devpts.  devpts, at least in its "always create a
> new instance on mount" mode, seems safe, as it cannot give you access to
> any existing devices.
>
> You can also do a lot of what would normally be done with mknod
> with bind mounts to the original device's location.
>
>>> The mount namespace is sufficient to prevent all of the
>>> cases that the device control group prevents (open and mknod on device
>>> nodes).
>>>
>>> So I honestly think the device control group is superfluous, and it is
>>> probably wise to deprecate it and move to a model where it does not
>>> exist.
>>>
>>> Eric
>>>
>>
>> That's what I said a few emails ago :)  The device cgroup was meant as
>> a short-term workaround for lack of user (and device) namespaces.
>
> I am saying something stronger.  The device cgroup doesn't seem to have
> a practical function now.

"Now" is wrong.  The user namespace is not complete and not yet usable 
for a full system container.  We still need the device control group.

I'd like us to have a sprint (either a day in person at UDS, or a few
days as a virtual sprint) focused on getting a full system container
working the way you envision it, as cleanly as possible.  I can take
two or three consecutive days sometime in the next 2-3 weeks; we could
sit on IRC and share a few instances on which to experiment.

>  That for the general case we don't need any
> kernel support.  That all of this should be a matter of some user space
> glue code, and just the tiniest bit of sorting out how hotplug events are
> sent.
>
> The only thing I can think we would need a device namespace for is
> for migration.
>
> For migration with direct access to real hardware devices we must treat
> it as hardware hotunplug.  There is nothing else we can do.
>
> If there is any other case where we need to preserve device numbers
> etc we have the example of devpts.
>
> So at this point I really don't think we need a device namespace or a
> device control group.  (Just emulate devtmpfs, sysfs and uevents).
>
> Eric
>
