lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAA93jw67hfoGPJyTWOz5ghuam725+18PviOK_hYkQ+RZjjqkXw@mail.gmail.com>
Date:	Wed, 11 Mar 2015 10:13:46 -0700
From:	Dave Taht <dave.taht@...il.com>
To:	Alexander Duyck <alexander.h.duyck@...hat.com>
Cc:	Alexander Duyck <alexander.duyck@...il.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Stephen Hemminger <stephen@...workplumber.org>,
	Jiří Pírko <jiri@...nulli.us>,
	Scott Feldman <sfeldma@...il.com>,
	"davem@...emloft.net" <davem@...emloft.net>,
	Sabrina Dubroca <sd@...asysnail.net>
Subject: Re: [net-next PATCH] ipv4: FIB Local/MAIN table collapse

On Wed, Mar 11, 2015 at 9:23 AM, Alexander Duyck
<alexander.h.duyck@...hat.com> wrote:
> On 03/11/2015 09:03 AM, Dave Taht wrote:
>>
>> On Wed, Mar 11, 2015 at 7:58 AM, Alexander Duyck
>> <alexander.duyck@...il.com> wrote:
>>>
>>> On 03/06/2015 01:47 PM, Alexander Duyck wrote:
>>>>
>>>> This patch is meant to collapse local and main into one by converting
>>>> tb_data from an array to a pointer.  Doing this allows us to point the
>>>> local table into the main while maintaining the same variables in the
>>>> table.
>>>>
>>>> As such the tb_data was converted from an array to a pointer, and a new
>>>> array called data is added in order to still provide an object for
>>>> tb_data
>>>> to point to.
>>>>
>>>> In order to track the origin of the fib aliases a tb_id value was added
>>>> in
>>>> a hole that existed on 64b systems.  Using this we can also reverse the
>>>> merge in the event that custom FIB rules are enabled.
>>>>
>>>> With this patch I am seeing an improvement of 20ns to 30ns for routing
>>>> lookups as long as custom rules are not enabled, with custom rules
>>>> enabled
>>>> we fall back to split tables and the original behavior.
>>>>
>>>> Signed-off-by: Alexander Duyck <alexander.h.duyck@...hat.com>
>>>> ---
>>>>
>>>> Changes since the RFC:
>>>>    Added tb_id value so I could split main and local for custom rules
>>>>    Added functionality to split tables if custom rules were enabled
>>>>    Added table replacement and unmerge functions
>>>>
>>>> I have done some testing on this to verify performance gains and that I
>>>> can
>>>> split the tables correctly when I enable custom rules, but this patch is
>>>> what I would consider to be high risk since I am certain there are
>>>> things I
>>>> have not considered.
>>>>
>>>> If this gets pulled into someone's switchdev tree instead of into
>>>> net-next I
>>>> would be perfectly fine with that as I am sure this can use some
>>>> additional
>>>> testing.
>>>
>>> Has anyone out there had a chance to review this patch?  I had suggested
>>> holding off on applying it to net-next for additional testing, but I
>>> haven't found anything, and the only related issue is the one issue
>>> Sabrina reported for the RTNL locking which was already in net-next
>>> anyway.
>>
>>
>> My environment consists largely of 32 bit routing platforms (openwrt)
>> running the current homenet routing protocol (babel), using, in
>> particular, the babeld source specific routing extensions ( see
>> http://arxiv.org/pdf/1403.0445v3.pdf  ) which do use multiple routing
>> tables, and other uncommon things like p2p routes and odd netmasks
>> like /27 and /30.
>
>
> Can you define "use multiple routing tables"?  If you are adding other
> routing tables than local/main and adding custom rules to determine the
> order then this latest patch should automatically be switched off.  This
> patch only applies to the case where custom rules are not enabled, otherwise
> it splits the tables and you are right back to two tables one for local and
> one for main.

OK, I am comforted. somewhat.

> This patch is based on work Dave Miller originally sent me to merge the
> tables for local/main in order to improve performance in the standard end
> user case.  As far as Quagga and such I am not that familiar with how they
> configure the tables so I cannot say if there is a performance gain
> available or not.

Not so much concerned about performance as correctness. Well, am
concerned about performance too, on (massive or weird) route
updates... (and I am mostly commenting on the entire patchset not this
patch! Sorry for hijacking the thread)

On my specific bugaboo, atomic route changes, the relevant bit of
netlink userspace code that does a delete + add in babel is here:

https://github.com/boutier/babeld/blob/source-specific/kernel_netlink.c#L971

And in quagga,  somewhere in here:
http://git.savannah.gnu.org/gitweb/?p=quagga.git;a=blob;f=zebra/rt_netlink.c;h=2350070c60cbd6801b0319e5a8ee0f82901a1b54;hb=HEAD

(I am not terribly familiar with other daemons, and only barely, quagga)

So far as I know these daemons do not do atomic updates/changes due to
an ancient, barely remembered bug in Linux ECMP, which for all I know
was fixed long ago. add + delete seemed to have had bugs, also (even
though it is saner to put in a new route superseding an old one, than
lose all the packets in between a delete + add)

The detailed netlink knowledge that is in "ip route" today to do a
change correctly has not made it up to any routing daemon folk I know
of - (I am too dumb to understand it myself!), and some communication
about what you are up to on the relevant mailing lists (like
quagga-dev, bird-users, babel-users,  ) might prove a fruitful
exchange of knowledge and ideas (outside of my specific bugaboo on
atomic updates!) up and down the stack, and reveal other known
bottlenecks and problems on their fronts.

https://lists.quagga.net/mailman/listinfo/quagga-dev
http://lists.alioth.debian.org/mailman/listinfo/babel-users
http://bird.network.cz/mailman/listinfo/bird-users

and a few others such as olsr, xorp, etc.

If you could provide an introduction to your work to these mailing
lists, that might gain more testers and feedback.

>> But: I simply can't keep up with you and this entire patchset changes
>> so dramatically how routing works that you make me nervous. I would
>> like to see quagga (and babeld) improved to use atomic updates in
>> particular (They do a delete + add for no good reason), and for
>> protocols like ospf, bgp, etc, to be actively tested on real traffic
>> loads on live data against this entire patchset.
>
>
> Yeah, looking back on it I probably shouldn't have sat on my patches for an
> entire release cycle before pushing them but as it was I was trying to
> stretch things out over two releases.  As far as Quagga and such that is
> several levels above this work as I have no idea about the internals of the
> user space routing daemons.

As noted, I think valuable feedback can be gained by an idea or
prisoner exchange with those folk.

>> Your 64 bit x86 benchmarks are very exciting and I do look forward to
>> one day soon attempting to evaluate and benchmark these changes on
>> teeny 32 bit platforms both in the general case and in this
>> environment, and am mostly just hoping that others that are doing
>> higher end real world routing with big tables (such as BGP) are paying
>> more attention than I can.
>
>
> Merging the two tables is one of the requests that came out of netconf this
> year.  I hadn't anticipated the amount of improvement that I have seen,
> though it makes sense since we are essentially reducing the memory
> utilization significantly as we reduce the backtrace and main table look-up
> to something like one tnode since in the case of most /24 subnets you only
> have 3 or 4 addresses that are tracked in the trie and the route from main
> will usually just end up overlapping with the .0 broadcast address.

Yes, they look pretty good! very shiny!

>
>> What test procedures are you using at present (and what is everyone else
>> using)?
>
>
> What I have been doing is mostly performance testing for different portions
> of the trie, performance of shortest path to find a match, performance of
> longest path to find a match, load/unload/reload interface, insert/delete
> route or address, dumping route, fib_trie, and fib_triestat, and with this
> most recent patch adding custom rules to force the tables to split.

If you can publish those tests somewhere perhaps folk like myself could look
them over and add additional ones.


> - Alex

I am unsure as as to how to even look for multiple routing table
usage. My most complex test setup (which includes hip, the tinc vpn,
(I've heard strongswan is often tossed into table 222, but I am not
using that) and a couple source specific routes from babel), I just
did a walk of everything with:

for i in `seq 0 64000`; do echo table $i :; ip route show table $i;
done > tables.txt

it took a long time, but I stuck those results up at:

http://snapon.lab.bufferbloat.net/~d/routing_tables/tables.txt

that was on kernel 3.18. From what I think I now understand your
folding of these tables together will fall back to multiple tables in
my cases, in this patch, but it was the entire patchset that was
concerning me for the routing protocol users.

-- 
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ