lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <m1obv9rkyp.fsf@fess.ebiederm.org>
Date:	Fri, 16 Dec 2011 00:15:26 -0800
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Lucian Adrian Grijincu <lucian.grijincu@...il.com>
Cc:	linux-kernel <linux-kernel@...r.kernel.org>,
	netdev@...r.kernel.org, Octavian Purdila <tavi@...pub.ro>,
	"David S . Miller" <davem@...emloft.net>,
	Alexey Dobriyan <adobriyan@...il.com>,
	Damien Millescamps <damien.millescamps@...nd.com>,
	Anca Emanuel <anca.emanuel@...il.com>,
	Damien Millescamps <damien.millescamps@...nd.com>,
	Benjamin LaHaise <bcrl@...ck.org>
Subject: Re: v6: faster tree-based sysctl implementation


Bringing this back on list because my reply is a design discussion and
that has every reason to be public.

Lucian Adrian Grijincu <lucian.grijincu@...il.com> writes:
> On Thu, Dec 8, 2011 at 2:19 AM, Eric W. Biederman <ebiederm@...ssion.com> wrote:
> > Which is my cue to say that I am behind the power curve in reviewing
> > this.  I will see if I can get to this before the end of the week.
> 
> 
> Did you had time to take a look over the code?

Not as much time as I would have liked.  However I have looked at it
some and I have brought back some of the state from my head so I
can remember where we last at in the conversation.

> I'm inclined to do a bit of redesign: currently files are kept in
> lists of files and sub-directories in a rbtree; I'd like to put
> everything in a rbtree and make things simpler.

I like the idea of putting everything in an rbtree. And
I like the idea of making things simpler.

> But I'd first like some sort of feedback from you. I wouldn't like to
> spend time working on something only to be told that the base idea of
> the reworking is unacceptable and everything needs to be rewritten.

As things stand I don't think your changes really solve the problem.
At least not if your change log can be believed.

> = New algorithm =
> 
> == Time complexity ==
> 
> - registration of N headers. Registration means adding new directories
>   at each level or incrementing an existing directory's refcount.
> 
>   - O(N * lnN) - if the paths to the headers are evenly distributed
> 
>   - O(N^2) - if most of the headers registered are children of the
>     same parent directory (searching the list of subdirs takes O(N)).
>     There are cases where this happens (e.g. registering sysctl
>     entries for net devices under /proc/sys/net/ipv4|6/conf/device).

This worries me because this is the case that we really care about.

>     A few later patches will add an optimisation, to fix locations
>     that might trigger the O(N^2) issue.

Is it your rb directory tree patch that addresses this?

....

My big concerns.
- I don't want to inflict churn on the users of sysctl unless we are
  certain it will gain us something.

- I really would like to see the core become simple enough that
  non-sysctl experts can look at the code and understand what is going
  on.

It doesn't look like your design has achieved sufficient simplicity yet
so that a non sysctl expert can look the code and understand what is
going on and why.

....

There are two very worthwhile things I see us doing with this patchset.
- Moving the users of sysctl to full directory semantics.
- Improving the data structures so that the data structures are
  efficient.

....

*** Moving the users of sysctl to full directory semantics. ***

As I have been watching the code sysctl has been moving from
a custom union filesystem (where each ctl_table was unioned
together) to normal filesystem semantics.   The last hold out
for union semantics are directories.

The truly weird case in this is "/proc/sys/fs/binfmt_misc/" an empty
directory for mounting the "binfmt_misc" filesystem on.

Which means that fundamentally we need a way to create, remove and
represent empty directories.  So while it is attractive to be lazy
and handle directories with some variant of lazy semantics creating
them when needed and deleting them when there is nothing left in the
directory I don't believe that will work.

Currently sysctl directories live under a relaxed version of the normal
fs rules:
- A directory must be created we can create children in it.
- A directory may not be deleted before we can remove children from it.

Since we in theory already have this strong property I don't see a
benefit of killing the sysctl .child table entry in the beginning of
the patchset.    It means a lot of churn and a lot more code to review
without giving us more useful properties.  Furthermore it retains the
clumsy ctl_path structure, which now that we don't need to add ctl_names
is a very clumsy structure to use.  A normal pathname would be better.

So initially I think at most we want to enhance sysctl_check to verify
that we retain normal directory create and remove discipline with the
current sysctl users.

....

*** Improving the internal sysctl data structures ***

With that the simplest way I can see of dealing with the sysctl data
structures is simply replace our current internal kernel data structures
with something that we want to use, with a compatibility layer from
the current users of sysctl to what we want to see.

My guess is that we would like to have a data structure like:
struct proc_sysctl_dirent {
	struct ctl_table *table; /* Point to the original ctl_table */
        struct rb_node	rb_node;
        struct rb_root  children;
        atomic_t count;
};

That is not perfect but I think building a data structure like that
without changing the userspace interface might actually reduce the
line count of sysctl, and drastically reduce the challenge of
understanding the code.

In a perfect world we might even want to use proc_dir_entry for the
new data structure but that is probably more complication than it is
worth right now.

My biggest concern is how do we handle the duplication of directories
that comes with the network namespace?   There are some weird things
in there.

....

Lucian is that anything like what you were thinking?

Does just improving the sysctl data structures and treating ctl_table
and it's kin as just a weird interface the callers use sound like it
might be a feasible path to you?

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ