lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Sun, 01 May 2011 20:06:27 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Lucian Adrian Grijincu <lucian.grijincu@...il.com>
Cc:	linux-kernel@...r.kernel.org,
	Alexey Dobriyan <adobriyan@...il.com>,
	Octavian Purdila <tavi@...pub.ro>,
	"David S . Miller" <davem@...emloft.net>
Subject: Re: [PATCH 00/69] faster tree-based sysctl implementation

Lucian Adrian Grijincu <lucian.grijincu@...il.com> writes:

> On Mon, May 2, 2011 at 12:49 AM, Eric W. Biederman
> <ebiederm@...ssion.com> wrote:
>>>> Since we are touching most if not all of the sysctl registrations this
>>>> would also be a good time to pass a path string instead of the weird
>>>> ctl_path data structure.  We needed ctl_path when we had both binary
>>>> and proc paths to worry about but we no longer have that concern.
>>>
>>>
>>> I still find good use for it in the next patches (some optimisations).
>>> Getting rid of it makes some things more difficult:
>>> - I wouldn't like to parse strings into path components at registeration
>>
>> I don't expect '/' being more difficult to deal with than an array.  In
>> general I expect a single string to be more space efficient and easier
>> for human comprehension.
>
>
> We also use the string from ctl_path as a name for the sysctl
> directory. We would need to either:
> * strdup part of the string for each directory, remember to kfree
> * replace '/' with '\0' in the given string (meaning it can't be put
> in a read-only zone)

If we are only registering leaves, we can just deal with the tail of the
path and point just past the final /. There should be no need to
duplicate anything.

> Also I make use of the ctl_path to add some optimisations that deal
> with the case when there are very many known-to-be-uniquely-named
> sub-directories like for /proc/sys/net/ipv4/conf/DEVICE. IXIACOM which
> sponsored this work has usecases where they need to create 10^3..10^5
> virtual network devices and these optimisations really add up for that
> many interfaces.

I am convinced the places where we have network devices in the path are
indeed the pain points for scaling.

My gut feel is that we should use a balanced binary tree instead of a
doubly linked list for the directories.  The space cost of a tree
is just an extra color member instead of two pointers.

For 100000 entries your target of a binary tree should only be 17
entries tall.  Maybe twice that for if the tree is an rbtree.  17 or
even 33 should be a small enough value log(N) to keep the cost from
being painful.  And using a binary tree means fewer special cases
overall.


A binary tree is faster than your special case for lookup.  Which means
it solves the case of actually using the sysctl entries as well as the
case for creating them.

Furthermore to we also need to change sysfs because it also has
directories that will contain all 100000 of the network devices,
and I don't expect simply skipping the duplicate check is going to
fly in sysfs.

We could do something besides a data structure without a logN
insert/remove/lookup cost complexity.  But I think we need numbers
to show that won't scale.  So far all we have are numbers that show
a linked list doesn't scale.

> For details about the optimisation see patches:
>  61/69 http://thread.gmane.org/gmane.linux.kernel/1133667/focus=1133694 and
>  62/69 http://thread.gmane.org/gmane.linux.kernel/1133667/focus=1133711
>
>
> I will make another function that would take a string, parse it up,
> create a ctl_path array and register it, but I'd really like to keep
> ctl_path both in the implementation and as a means to register a
> table.

Using a string path certainly isn't critical at this point.  But so
far I don't see practical down sides.

>>> - users of the register_sysctl_paths would need to create strings with
>>> dynamic components (for example "net/conf/%s/" - where %s is a
>>> netdevice-name or "kernel/sched_domain/%s/%s" with cpu-name and
>>> domain-name).
>>
>> This is a good point.
>>
>> In the normal proc implementation this is solved by being able to
>> pass the equivalent of a ctl_table_header into the registration
>> function, which allows the use of relative paths in the registration
>> function.
>>
>> In the examples you have given relative paths should also work for
>> sysctl.
>
>
> Hmm, I don't think we're on the same channel here. I don't understand
> what you're trying to say
> - normal proc implementation?
> - the equivalent of a ctl_table_header?
> - relative paths?

I was looking at effectively other virtual filesystems that have
had similar problems and talking about other solutions used.

In particular I was referring to create_proc_entry.  It takes
a path and an optional parent directory.

> I was saying that if we are to *replace* the ctl_path based mechanism
> with a string denoting the path, then some other registrants will need
> to allocate memory for those strings because the paths they register
> are computed at runtime. Then I gave two distinct examples where this
> is done. In both of those cases, ctl_path saves us from allocating a
> string before allocation, only to chop it then back to pieces in the
> __register function.

And I was saying if that string was treated as a relative path.  We
could have:

struct ctl_table_header *register_sysctl_path(struct ctl_table_header *parent,
						const char *path,
						struct ctl_table *table);

The optional parent parameter would save us from the pain of having to
even place the sysctl entry in a ctl_path.  __register_sysctl_paths
already has a very similar interface.

Eric


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ