[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <m1obv9rkyp.fsf@fess.ebiederm.org>
Date: Fri, 16 Dec 2011 00:15:26 -0800
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Lucian Adrian Grijincu <lucian.grijincu@...il.com>
Cc: linux-kernel <linux-kernel@...r.kernel.org>,
netdev@...r.kernel.org, Octavian Purdila <tavi@...pub.ro>,
"David S . Miller" <davem@...emloft.net>,
Alexey Dobriyan <adobriyan@...il.com>,
Damien Millescamps <damien.millescamps@...nd.com>,
Anca Emanuel <anca.emanuel@...il.com>,
Damien Millescamps <damien.millescamps@...nd.com>,
Benjamin LaHaise <bcrl@...ck.org>
Subject: Re: v6: faster tree-based sysctl implementation
Bringing this back on list because my reply is a design discussion and
that has every reason to be public.
Lucian Adrian Grijincu <lucian.grijincu@...il.com> writes:
> On Thu, Dec 8, 2011 at 2:19 AM, Eric W. Biederman <ebiederm@...ssion.com> wrote:
> > Which is my cue to say that I am behind the power curve in reviewing
> > this. I will see if I can get to this before the end of the week.
>
>
> Did you had time to take a look over the code?
Not as much time as I would have liked. However I have looked at it
some and I have brought back some of the state from my head so I
can remember where we last at in the conversation.
> I'm inclined to do a bit of redesign: currently files are kept in
> lists of files and sub-directories in a rbtree; I'd like to put
> everything in a rbtree and make things simpler.
I like the idea of putting everything in an rbtree. And
I like the idea of making things simpler.
> But I'd first like some sort of feedback from you. I wouldn't like to
> spend time working on something only to be told that the base idea of
> the reworking is unacceptable and everything needs to be rewritten.
As things stand I don't think your changes really solve the problem.
At least not if your change log can be believed.
> = New algorithm =
>
> == Time complexity ==
>
> - registration of N headers. Registration means adding new directories
> at each level or incrementing an existing directory's refcount.
>
> - O(N * lnN) - if the paths to the headers are evenly distributed
>
> - O(N^2) - if most of the headers registered are children of the
> same parent directory (searching the list of subdirs takes O(N)).
> There are cases where this happens (e.g. registering sysctl
> entries for net devices under /proc/sys/net/ipv4|6/conf/device).
This worries me because this is the case that we really care about.
> A few later patches will add an optimisation, to fix locations
> that might trigger the O(N^2) issue.
Is it your rb directory tree patch that addresses this?
....
My big concerns.
- I don't want to inflict churn on the users of sysctl unless we are
certain it will gain us something.
- I really would like to see the core become simple enough that
non-sysctl experts can look at the code and understand what is going
on.
It doesn't look like your design has achieved sufficient simplicity yet
so that a non sysctl expert can look the code and understand what is
going on and why.
....
There are two very worthwhile things I see us doing with this patchset.
- Moving the users of sysctl to full directory semantics.
- Improving the data structures so that the data structures are
efficient.
....
*** Moving the users of sysctl to full directory semantics. ***
As I have been watching the code sysctl has been moving from
a custom union filesystem (where each ctl_table was unioned
together) to normal filesystem semantics. The last hold out
for union semantics are directories.
The truly weird case in this is "/proc/sys/fs/binfmt_misc/" an empty
directory for mounting the "binfmt_misc" filesystem on.
Which means that fundamentally we need a way to create, remove and
represent empty directories. So while it is attractive to be lazy
and handle directories with some variant of lazy semantics creating
them when needed and deleting them when there is nothing left in the
directory I don't believe that will work.
Currently sysctl directories live under a relaxed version of the normal
fs rules:
- A directory must be created we can create children in it.
- A directory may not be deleted before we can remove children from it.
Since we in theory already have this strong property I don't see a
benefit of killing the sysctl .child table entry in the beginning of
the patchset. It means a lot of churn and a lot more code to review
without giving us more useful properties. Furthermore it retains the
clumsy ctl_path structure, which now that we don't need to add ctl_names
is a very clumsy structure to use. A normal pathname would be better.
So initially I think at most we want to enhance sysctl_check to verify
that we retain normal directory create and remove discipline with the
current sysctl users.
....
*** Improving the internal sysctl data structures ***
With that the simplest way I can see of dealing with the sysctl data
structures is simply replace our current internal kernel data structures
with something that we want to use, with a compatibility layer from
the current users of sysctl to what we want to see.
My guess is that we would like to have a data structure like:
struct proc_sysctl_dirent {
struct ctl_table *table; /* Point to the original ctl_table */
struct rb_node rb_node;
struct rb_root children;
atomic_t count;
};
That is not perfect but I think building a data structure like that
without changing the userspace interface might actually reduce the
line count of sysctl, and drastically reduce the challenge of
understanding the code.
In a perfect world we might even want to use proc_dir_entry for the
new data structure but that is probably more complication than it is
worth right now.
My biggest concern is how do we handle the duplication of directories
that comes with the network namespace? There are some weird things
in there.
....
Lucian is that anything like what you were thinking?
Does just improving the sysctl data structures and treating ctl_table
and it's kin as just a weird interface the callers use sound like it
might be a feasible path to you?
Eric
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists