Message-ID: <m1hawe79ds.fsf@fess.ebiederm.org>
Date: Fri, 20 Apr 2012 07:42:07 -0700
From: ebiederm@...ssion.com (Eric W. Biederman)
To: "Serge E. Hallyn" <serge@...lyn.com>
Cc: David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
Gao feng <gaofeng@...fujitsu.com>, pablo@...filter.org,
Stephen Hemminger <shemminger@...tta.com>,
Pavel Emelyanov <xemul@...nvz.org>
Subject: Re: [PATCH net-next 04/19] net: Kill register_sysctl_rotable
"Serge E. Hallyn" <serge@...lyn.com> writes:
> Quoting Eric W. Biederman (ebiederm@...ssion.com):
>>
>> register_sysctl_rotable never caught on as an interesting way to
>> register sysctls. My take on the situation is that what we want are
>> sysctls that we can only see in the initial network namespace. What we
>> have implemented with register_sysctl_rotable are sysctls that we can
>> see in all of the network namespaces and can only change in the initial
>> network namespace.
>>
>> That is a very silly way to go. Just register the network sysctls
>> in the initial network namespace and we don't have any weird special
>> cases to deal with.
>>
>> The sysctls affected are:
>> /proc/sys/net/ipv4/ipfrag_secret_interval
>> /proc/sys/net/ipv4/ipfrag_max_dist
>> /proc/sys/net/ipv6/ip6frag_secret_interval
>> /proc/sys/net/ipv6/mld_max_msf
>>
>> I really don't expect anyone will miss them if they can't read them in a
>> child user namespace.
>
> If there was something userspace could do to work around certain values
> of these settings then I'd say keeping the readonly values is worthwhile,
> but AFAICS if a bad network context requires ipfrag_max_dist 0, there's
> nothing userspace can do about it...
>
>
> So from a container pov view at least, I'm happy with this. I'm far from
> qualified on the netns code itself, but taking a look in the unlikely case
> I can spot something :)

In this case I figured I would copy you and a few others who have been
talking about similar things recently, and also because you might care
that a whole bunch of networking sysctls that aren't per network
namespace will stop showing up in containers.
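
For anyone who wants to see the effect from inside a container, here is a
rough userspace sketch (not part of the patch). It only checks whether the
four sysctl files listed in the patch description exist in the current
process's view of /proc/sys. The names AFFECTED_SYSCTLS and
sysctl_visibility are made up for illustration. The assumption is that
with the patch applied the files would show up as absent outside the
initial network namespace.

```python
import os

# Paths copied from the list of affected sysctls in the patch description.
AFFECTED_SYSCTLS = [
    "/proc/sys/net/ipv4/ipfrag_secret_interval",
    "/proc/sys/net/ipv4/ipfrag_max_dist",
    "/proc/sys/net/ipv6/ip6frag_secret_interval",
    "/proc/sys/net/ipv6/mld_max_msf",
]


def sysctl_visibility(paths=AFFECTED_SYSCTLS):
    """Map each path to True if it exists in this process's view of /proc/sys.

    Hypothetical helper for illustration only; it does not read or
    modify the sysctl values.
    """
    return {path: os.path.exists(path) for path in paths}


if __name__ == "__main__":
    for path, visible in sysctl_visibility().items():
        print(f"{path}: {'visible' if visible else 'absent'}")
```

Running it once on the host and once inside a container should be enough
to compare what each network namespace sees.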

It is my hope that some of the same mechanisms that allow per network
namespace sysctls will also be used to allow per pid and uts namespace
sysctls. That isn't as important, since the files don't change, but we
can do it cleanly, and one of these days I will get around to making
/proc/sys a symlink to /proc/<pid>/sys so that I can remove the very
unorthodox d_compare tricks that we use today.

The sysctl internal data structures are now a hair cleaner than what
sysfs uses for the same class of problem, so I might someday go back and
fix sysfs to use the same idea of internal links. That would let me get
the sysfs dirent size down some more and more cleanly isolate the
namespace handling from the rest of the sysfs code. The namespace
handling isn't bad today, but it is the source of most of the surprises
and bugs when people tweak the sysfs code.

Anyway I ramble. Now I need to get back to your review comments on my
user namespace patchset.

Thanks for taking a glance here,
Eric