lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 31 Dec 2014 18:32:52 -0800
From:	Alexander Duyck <>
To:	David Miller <>,
Subject: Re: [net-next PATCH 00/17] fib_trie: Reduce time spent in fib_table_lookup
 by 35 to 75%

On 12/31/2014 03:46 PM, David Miller wrote:
> From: Alexander Duyck <>
> Date: Wed, 31 Dec 2014 10:55:23 -0800
>> These patches are meant to address several performance issues I have seen 
>> in the fib_trie implementation, and fib_table_lookup specifically.  With 
>> these changes in place I have seen a reduction of up to 35 to 75% for the 
>> total time spent in fib_table_lookup depending on the type of search being 
>> performed.
>  ...
>> Changes since RFC:
>>   Replaced this_cpu_ptr with correct call to this_cpu_inc in patch 1
>>   Changed test for leaf_info mismatch to (key ^ n->key) & li->mask_plen in patch 10
> As before, this looks awesome.


> All applied to net-next, thanks!
> This knocks about 35 cpu cycles off of a lookup that ends up using the
> default route on sparc64.  From about ~438 cycles to ~403.

Did that 438 value include both fib_table_lookup and check_leaf?  Just
curious as the overall gain seems smaller than what I have been seeing
on the x86 system I was testing with, but then again it could just be a
sparc64 thing.

I've started work on a second round of patches.  With any luck they
should be ready by the time the next net-next opens.  My hope is to cut
the look-up time by another 30 to 50%, though it will take some time as
I have to go though and drop the leaf_info structure, and look at
splitting the tnode in half to break the key/pos/bits and child pointer
dependency chain which will hopefully allow for a significant reduction
in memory read stalls.

I am also planning to take a look at addressing the memory waste that
occurs on nodes larger than 256 bytes due to the way kmalloc allocates
memory as powers of 2.  I'm thinking I might try encouraging the growth
of smaller nodes, and discouraging anything over 256 by implementing a
"truesize" type logic that can be used in the inflate/halve functions so
that the memory usage is more accurately reflected.

- Alex
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to
More majordomo info at

Powered by blists - more mailing lists