Message-ID: <c8ff8557-9e2c-4316-8642-fd7ab1553ffb@gmail.com>
Date: Wed, 15 May 2024 15:52:22 +0200
From: Leone Fernando <leone4fernando@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com,
 dsahern@...nel.org, willemb@...gle.com, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next v2 0/4] net: route: improve route hinting

> On Tue, May 7, 2024 at 2:43 PM Leone Fernando <leone4fernando@...il.com> wrote:
>>
>> In 2017, Paolo Abeni introduced the hinting mechanism [1] to the routing
>> sub-system. The hinting optimization improves performance by reusing
>> previously found dsts instead of looking them up for each skb.
>>
>> This patch series introduces a generalized version of the hinting mechanism that
>> can "remember" a larger number of dsts. This reduces the number of dst
>> lookups for frequently encountered daddrs.
>>
>> Before diving into the code and the benchmarking results, it's important
>> to address the deletion of the old route cache [2] and why
>> this solution is different. The original cache was complicated,
>> vulnerable to DOS attacks and had unstable performance.
>>
>> The new input dst_cache is much simpler thanks to its lazy approach,
>> improving performance without the overhead of the removed cache
>> implementation. Instead of using timers and GC, the deletion of invalid
>> entries is performed lazily during their lookups.
>> The dsts are stored in a simple, lightweight, static hash table. This
>> keeps the lookup times fast yet stable, preventing DOS upon cache misses.
>> The new input dst_cache implementation is built over the existing
>> dst_cache code which supplies a fast lockless percpu behavior.
>>
>> The measurement setup consists of 2 machines with mlx5 100Gbit NICs.
>> I sent small UDP packets with 5000 daddrs (10x the cache size) from one
>> machine to the other while also varying the saddr and the tos. I set
>> an iptables rule to drop the packets after routing. The receiving
>> machine's CPU (i9) was saturated.
>>
>> Thanks a lot to David Ahern for all the help and guidance!
>>
>> I measured the rx PPS using ifpps and the per-queue PPS using ethtool -S.
>> These are the results:
> 
> How are device dismantles taken into account?
> 
> I am currently tracking a bug in dst_cache, triggering sometimes when
> running pmtu.sh selftest.
> 
> Apparently, dst_cache_per_cpu_dst_set() can cache dsts that have no
> dst->rt_uncached linkage.

The dst_cache_input introduced in this series caches input routes that
are owned by the fib tree. These routes have an rt_uncached linkage, so
I don't think this bug carries over to dst_cache_input.

> There is no cleanup (at least in vxlan) to make sure cached dsts are
> either freed or have their dst->dev changed.
> 
> 
> TEST: ipv6: cleanup of cached exceptions - nexthop objects          [ OK ]
> [ 1001.344490] vxlan: __vxlan_fdb_free calling dst_cache_destroy(ffff8f12422cbb90)
> [ 1001.345253] dst_cache_destroy dst_cache=ffff8f12422cbb90 ->cache=0000417580008d30
> [ 1001.378615] vxlan: __vxlan_fdb_free calling dst_cache_destroy(ffff8f12471e31d0)
> [ 1001.379260] dst_cache_destroy dst_cache=ffff8f12471e31d0 ->cache=0000417580008608
> [ 1011.349730] unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 7
> [ 1011.350562] ref_tracker: veth_A-R1@...000009392ed3b has 1/6 users at
> [ 1011.350562]      dst_alloc+0x76/0x160
> [ 1011.350562]      ip6_dst_alloc+0x25/0x80
> [ 1011.350562]      ip6_pol_route+0x2a8/0x450
> [ 1011.350562]      ip6_pol_route_output+0x1f/0x30
> [ 1011.350562]      fib6_rule_lookup+0x163/0x270
> [ 1011.350562]      ip6_route_output_flags+0xda/0x190
> [ 1011.350562]      ip6_dst_lookup_tail.constprop.0+0x1d0/0x260
> [ 1011.350562]      ip6_dst_lookup_flow+0x47/0xa0
> [ 1011.350562]      udp_tunnel6_dst_lookup+0x158/0x210
> [ 1011.350562]      vxlan_xmit_one+0x4c6/0x1550 [vxlan]
> [ 1011.350562]      vxlan_xmit+0x535/0x1500 [vxlan]
> [ 1011.350562]      dev_hard_start_xmit+0x7b/0x1e0
> [ 1011.350562]      __dev_queue_xmit+0x20c/0xe40
> [ 1011.350562]      arp_xmit+0x1d/0x50
> [ 1011.350562]      arp_send_dst+0x7f/0xa0
> [ 1011.350562]      arp_solicit+0xf6/0x2f0
> [ 1011.350562]
> [ 1011.350562] ref_tracker: veth_A-R1@...000009392ed3b has 3/6 users at
> [ 1011.350562]      dst_alloc+0x76/0x160
> [ 1011.350562]      ip6_dst_alloc+0x25/0x80
> [ 1011.350562]      ip6_pol_route+0x2a8/0x450
> [ 1011.350562]      ip6_pol_route_output+0x1f/0x30
> [ 1011.350562]      fib6_rule_lookup+0x163/0x270
> [ 1011.350562]      ip6_route_output_flags+0xda/0x190
> [ 1011.350562]      ip6_dst_lookup_tail.constprop.0+0x1d0/0x260
> [ 1011.350562]      ip6_dst_lookup_flow+0x47/0xa0
> [ 1011.350562]      udp_tunnel6_dst_lookup+0x158/0x210
> [ 1011.350562]      vxlan_xmit_one+0x4c6/0x1550 [vxlan]
> [ 1011.350562]      vxlan_xmit+0x535/0x1500 [vxlan]
> [ 1011.350562]      dev_hard_start_xmit+0x7b/0x1e0
> [ 1011.350562]      __dev_queue_xmit+0x20c/0xe40
> [ 1011.350562]      ip6_finish_output2+0x2ea/0x6e0
> [ 1011.350562]      ip6_finish_output+0x143/0x320
> [ 1011.350562]      ip6_output+0x74/0x140
> [ 1011.350562]
> [ 1011.350562] ref_tracker: veth_A-R1@...000009392ed3b has 1/6 users at
> [ 1011.350562]      netdev_get_by_index+0xc0/0xe0
> [ 1011.350562]      fib6_nh_init+0x1a9/0xa90
> [ 1011.350562]      rtm_new_nexthop+0x6fa/0x1580
> [ 1011.350562]      rtnetlink_rcv_msg+0x155/0x3e0
> [ 1011.350562]      netlink_rcv_skb+0x61/0x110
> [ 1011.350562]      rtnetlink_rcv+0x19/0x20
> [ 1011.350562]      netlink_unicast+0x23f/0x380
> [ 1011.350562]      netlink_sendmsg+0x1fc/0x430
> [ 1011.350562]      ____sys_sendmsg+0x2ef/0x320
> [ 1011.350562]      ___sys_sendmsg+0x86/0xd0
> [ 1011.350562]      __sys_sendmsg+0x67/0xc0
> [ 1011.350562]      __x64_sys_sendmsg+0x21/0x30
> [ 1011.350562]      x64_sys_call+0x252/0x2030
> [ 1011.350562]      do_syscall_64+0x6c/0x190
> [ 1011.350562]      entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 1011.350562]
> [ 1011.350562] ref_tracker: veth_A-R1@...000009392ed3b has 1/6 users at
> [ 1011.350562]      ipv6_add_dev+0x136/0x530
> [ 1011.350562]      addrconf_notify+0x19d/0x770
> [ 1011.350562]      notifier_call_chain+0x65/0xd0
> [ 1011.350562]      raw_notifier_call_chain+0x1a/0x20
> [ 1011.350562]      call_netdevice_notifiers_info+0x54/0x90
> [ 1011.350562]      register_netdevice+0x61e/0x790
> [ 1011.350562]      veth_newlink+0x230/0x440
> [ 1011.350562]      __rtnl_newlink+0x7d2/0xaa0
> [ 1011.350562]      rtnl_newlink+0x4c/0x70
> [ 1011.350562]      rtnetlink_rcv_msg+0x155/0x3e0
> [ 1011.350562]      netlink_rcv_skb+0x61/0x110
> [ 1011.350562]      rtnetlink_rcv+0x19/0x20
> [ 1011.350562]      netlink_unicast+0x23f/0x380
> [ 1011.350562]      netlink_sendmsg+0x1fc/0x430
> [ 1011.350562]      ____sys_sendmsg+0x2ef/0x320
> [ 1011.350562]      ___sys_sendmsg+0x86/0xd0
> [ 1011.350562]
