Message-ID: <20081008141818.GA23453@us.ibm.com>
Date: Wed, 8 Oct 2008 09:18:18 -0500
From: "Serge E. Hallyn" <serue@...ibm.com>
To: Greg KH <greg@...ah.com>
Cc: "Eric W. Biederman" <ebiederm@...ssion.com>,
Al Viro <viro@...IV.linux.org.uk>,
Benjamin Thery <benjamin.thery@...l.net>,
linux-kernel@...r.kernel.org, Al Viro <viro@....linux.org.uk>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Tejun Heo <tj@...nel.org>
Subject: Re: sysfs: tagged directories not merged completely yet

Quoting Greg KH (greg@...ah.com):
> On Tue, Oct 07, 2008 at 07:12:03PM -0500, Serge E. Hallyn wrote:
> > Quoting Greg KH (greg@...ah.com):
> > > On Tue, Oct 07, 2008 at 05:54:24PM -0500, Serge E. Hallyn wrote:
> > > > Quoting Greg KH (greg@...ah.com):
> > > > > On Tue, Oct 07, 2008 at 01:27:17AM -0700, Eric W. Biederman wrote:
> > > > > > Unless someone will give an example of how having multiple superblocks
> > > > > > sharing inodes is a problem in practice for sysfs and call it good
> > > > > > for 2.6.28. Certainly it shouldn't be an issue if the network namespace
> > > > > > code is compiled out. And it should greatly improve testing of the
> > > > > > network namespace to at least have access to sysfs.
> > > > >
> > > > > But if the network namespace code is in? Then we have problems, right?
> > > > > And that's the whole point here.
> > > > >
> > > > > The fact that you are trying to limit userspace view of in-kernel data
> > > > > structures, based on that specific user, is, in my opinion, crazy.
> > > > >
> > > > > Why not just keep all users from seeing sysfs, and then have a user
> > > > > daemon doing something on top of FUSE if you really want to see this
> > > > > kind of stuff.
> > > >
> > > > Well the blocker is really that when you create a new network namespace,
> > > > it wants to create a new loopback interface, but
> > > > /sys/devices/virtual/net/lo already exists. That's the same issue with
> > > > user namespace when the fair scheduler is enabled, which tries to
> > > > re-create /sys/kernel/uids/0.
> > > >
> > > > Otherwise yeah at least for my own uses, containers wouldn't need to
> > > > look at /sys at all.
> > > >
> > > > Heck you wouldn't even need FUSE, just mount -t tmpfs /sys/class/net
> > > > and manually link the right devices from /sys/devices/virtual/net.
> > >
> > > Great, that sounds like a solution.
> > >
> > > So tell me again why we need these huge sysfs reworks? :)
> >
> > Because:
> >
> > > > Well the blocker is really that when you create a new network namespace,
>
> No, wait. Why would you want to do such a thing in the first place?

So I can have db2, a few apaches, etc., each in its own container with
its own network devices and its own ipfilter rules.

So I can take one of those apache containers and migrate it along with
its ip address to another machine.

So I can do the openvz/vserver thing and run a 'virtual machine' (or 50)
without the overhead of another full OS. Now, like Eric said, our goal
isn't to fool the distro installed in the container into believing it
isn't in one. But the same tools should work for administration inside
a container as outside one. That was the reason for the filtering of
/proc to show the right pids inside a container, for instance.

So given that, what I describe below should probably suffice. Though I
wonder whether things depending on uevents will get messed up in a
container. It should be fine, I assume, so long as the devicename (lo)
is sent along with the filename (lo.childXYZ).

> > > > it wants to create a new loopback interface, but
> > > > /sys/devices/virtual/net/lo already exists. That's the same issue with
> >
> > So at least we'd have to do something to allow creation of 'duplicate'
> > devices in different namespaces. It might be fine if we just ended up
> > with /sys/devices/virtual/net/lo, if created in a child net namespace,
> > be named /sys/devices/virtual/net/lo.childXYZ. Then userspace can
> > mount -t tmpfs none /sys/class/net and ln -s
> > /sys/devices/virtual/net/lo.childXYZ /sys/class/net/lo.
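
The userspace half of the scheme quoted just above could look roughly
like the sketch below. It is only an illustration under assumptions:
"childXYZ" is the placeholder suffix from the text, the ".suffix" naming
is the proposal under discussion and not something the kernel does
today, and link_ns_devices is a made-up helper name. The tmpfs mount
itself needs root, so it is shown as a comment and the directories are
passed in as arguments.

```shell
#!/bin/sh
# Hypothetical helper: for every device named <name>.<suffix> in the
# device directory, create a symlink <name> in the class directory.
#
# On a real system the class directory would first be covered with
#   mount -t tmpfs none /sys/class/net
# (root only). Here both directories are parameters so the linking
# logic can be tried on any scratch tree.

link_ns_devices() {
    devdir=$1    # e.g. /sys/devices/virtual/net
    classdir=$2  # e.g. /sys/class/net (the tmpfs mount)
    suffix=$3    # e.g. childXYZ

    for dev in "$devdir"/*."$suffix"; do
        [ -e "$dev" ] || continue     # glob matched nothing
        name=${dev##*/}               # strip directory: lo.childXYZ
        name=${name%."$suffix"}       # strip suffix:    lo
        ln -s "$dev" "$classdir/$name"
    done
}
```

Called as link_ns_devices /sys/devices/virtual/net /sys/class/net childXYZ
inside the container, that would leave /sys/class/net/lo pointing at
lo.childXYZ while the parent's plain "lo" stays out of view.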
>
> ick.
>
> I agree with Tejun here, what's this whole network namespace stuff, what
> problems is it trying to solve and what are its goals?
>
> thanks,
>
> greg k-h