lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ca1ade8ae0f20695c687580b2e1fbb75bf8a5d4b.camel@redhat.com>
Date:   Mon, 16 May 2022 16:29:24 +0200
From:   Paolo Abeni <pabeni@...hat.com>
To:     David Laight <David.Laight@...LAB.COM>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: UDP receive performance drop since 3.10

On Mon, 2022-05-16 at 12:58 +0000, David Laight wrote:
> I've noticed a doubling in the cpu cost of udp processing
> between a RHEL 3.10 kernel and a 5.18-rc6 one.
> 
> This is (probably) all within ip_rcv().
> 
> I'm testing very high rate UDP receive of RTP audio.
> (The target is 500000 udp/sec.)
> I've enable RPS so that ip_rcv() runs on different multiple
> cpus from the ethernet code.
> (RSS on the BCM5720 (tg3) doesn't seem to work very well.)
> 
> On the 3.10 kernel the 'RPS' cpu show about 5% 'soft int' time.
> With 5.10 this has doubled to 10% for much the same test.
> 
> The ftrace for a single packet shows a lot of extra code.
> With a RHEL 3.10 kernel the trace is quite short:
> 
>                  /* netif_receive_skb: dev=em2 skbaddr=ffff99c3ee5ae000 len=200 */
>                  ip_rcv() {
>                    ip_rcv_finish() {
>     0.483 us         udp_v4_early_demux();
>                      ip_route_input_noref() {
>                        ip_route_input_slow() {
>     2.036 us             fib_table_lookup();
>                          fib_validate_source() {
>                            __fib_validate_source.isra.13() {
>     0.646 us                 fib_table_lookup();
>     1.589 us               }
>     2.610 us             }
>     6.820 us           }
>     7.755 us         }
>                      ip_local_deliver() {
>                        ip_local_deliver_finish() {
>     0.250 us             raw_local_deliver();
>                          udp_rcv() {
>                            __udp4_lib_rcv() {
>                              __udp4_lib_lookup() {
>     0.063 us                   compute_score();
>     0.097 us                   compute_score();
>     1.496 us                 }
>                              udp_queue_rcv_skb() {
>                                sk_filter_trim_cap() {
>                                  security_sock_rcv_skb() {
>     0.066 us                       cap_socket_sock_rcv_skb();
>     1.024 us                     }
>     1.836 us                   }
>     0.093 us                   ipv4_pktinfo_prepare();
>                                __udp_enqueue_schedule_skb() {
>     0.066 us                     _raw_spin_lock();
>                                  sock_def_readable() {
>                                    __wake_up_sync_key() {
>                                      __wake_up_common_lock() {
>     0.194 us                           _raw_spin_lock_irqsave();
>                                        __wake_up_common() {
>                                          ep_poll_callback() {
>     0.184 us                               _raw_spin_lock_irqsave();
>     0.084 us                               _raw_spin_unlock_irqrestore();
>     2.009 us                             }
>     3.264 us                           }
>     0.087 us                           _raw_spin_unlock_irqrestore();
>     5.579 us                         }
>     6.311 us                       }
>     7.241 us                     }
>     8.833 us                   }
>   + 12.948 us                }
>   + 16.365 us              }
>   + 17.280 us            }
>   + 19.900 us          }
>   + 20.673 us        }
>   + 31.519 us      }
>   + 32.534 us    }
> 
> Whereas 5.18 has a much longer trace:
>                 ip_rcv() {
>    0.668 us       ip_rcv_core();
>                   ip_rcv_finish_core.constprop.0() {
>    1.155 us         udp_v4_early_demux();
>                     ip_route_input_noref() {
>    0.306 us           __rcu_read_lock();
>                       ip_route_input_rcu() {
>                         ip_route_input_slow() {
>                           make_kuid() {
>    0.441 us                 map_id_range_down();
>    1.231 us               }
>    0.307 us               __rcu_read_lock();
>                           fib_table_lookup() {
>    1.268 us                 fib_lookup_good_nhc();
>    2.736 us               }
>    0.307 us               __rcu_read_unlock();
>                           fib_validate_source() {
>                             __fib_validate_source() {
>                               make_kuid() {
>    0.304 us                     map_id_range_down();
>    0.931 us                   }
>    0.304 us                   __rcu_read_lock();
>                               fib_table_lookup() {
>    0.493 us                     fib_lookup_good_nhc();
>    1.405 us                   }
>    0.393 us                   __rcu_read_unlock();
>    0.390 us                   fib_info_nh_uses_dev();
>    5.457 us                 }
>    6.327 us               }
>  + 13.726 us            }
>  + 14.727 us          }
>    0.407 us           __rcu_read_unlock();
>  + 16.673 us        }
>  + 19.389 us      }
>                   ip_local_deliver() {
>                     ip_local_deliver_finish() {
>    0.376 us           __rcu_read_lock();
>                       ip_protocol_deliver_rcu() {
>    0.434 us             raw_local_deliver();
>                         udp_rcv() {
>                           __udp4_lib_rcv() {
>                             __udp4_lib_lookup() {
>    0.326 us                   udp4_lib_lookup2.isra.0();
>    0.928 us                   udp4_lib_lookup2.isra.0();
>    2.413 us                 }
>                             udp_unicast_rcv_skb() {
>                               udp_queue_rcv_skb() {
>                                 udp_queue_rcv_one_skb() {
>                                   sk_filter_trim_cap() {
>    0.440 us                         security_sock_rcv_skb();
>                                     sk_filter_trim_cap.part.0() {
>    0.297 us                           __rcu_read_lock();
>    0.310 us                           __rcu_read_unlock();
>    1.531 us                         }
>    3.277 us                       }
>    0.334 us                       skb_pull_rcsum();
>    0.310 us                       ipv4_pktinfo_prepare();
>                                   __udp_enqueue_schedule_skb() {
>                                     _raw_spin_lock() {
>    0.334 us                           preempt_count_add();
>    0.938 us                         }
>                                     _raw_spin_unlock() {
>    0.303 us                           preempt_count_sub();
>    0.908 us                         }
>                                     sock_def_readable() {
>    0.307 us                           __rcu_read_lock();
>                                       __wake_up_sync_key() {
>                                         __wake_up_common_lock() {
>                                           _raw_spin_lock_irqsave() {
>    0.326 us                                 preempt_count_add();
>    0.951 us                               }
>                                           __wake_up_common() {
>                                             ep_poll_callback() {
>                                               _raw_read_lock_irqsave() {
>    0.330 us                                     preempt_count_add();
>    1.152 us                                   }
>    0.614 us                                   __rcu_read_lock();
>    0.323 us                                   __rcu_read_unlock();
>                                               _raw_read_unlock_irqrestore() {
>    0.380 us                                     preempt_count_sub();
>    0.995 us                                   }
>    5.410 us                                 }
>    6.741 us                               }
>                                           _raw_spin_unlock_irqrestore() {
>    0.317 us                                 preempt_count_sub();
>    1.094 us                               }
>    9.994 us                             }
>  + 10.806 us                          }
>    0.327 us                           __rcu_read_unlock();
>  + 12.809 us                        }
>  + 16.182 us                      }
>  + 22.153 us                    }
>  + 22.769 us                  }
>  + 23.528 us                }
>  + 27.878 us              }
>  + 28.646 us            }
>  + 30.476 us          }
>    0.324 us           __rcu_read_unlock();
>  + 32.398 us        }
>  + 33.168 us      }
>  + 54.976 us    }
> 
> Now I know the cost of ftrace is significant (and seems to be
> higher in 5.18) but there also seems to be a lot more code.
> As well as the extra rcu locks (which are probably mostly ftrace
> overhead, a few other things stick out:
> 
> 1) The sock_net_uid(net, NULL) calls.
>    These are make_kuid(net->user_ns, 0) - so pretty much constant.
>    They seem to end up in a loop in map_id_range_down_base().
>    All looks expensive in the default network namespace where
>    0 maps to 0.
> 
> 2) Extra code in fib_lookup().
> 
> 3) A lot more locking in ep_poll_callback().
> 
> The 5.18 kernel also seems to have CONFIG_DEBUG_PREEMPT set.
> I can't find the Kconfig entry for it.
> It doesn't exist in the old .config at all.
> So I'm not sure why 'make oldconfig' picked it up.
> 
> The other possibility is that the extra code is tick_nohz_idle_exit().
> The 3.10 trace is from a non-RPS config so I can't compare it.
> 
> I'm going to disable CONFIG_DEBUG_PREEMPT to see how much
> difference it makes.
> Any idea if any other debug options will have got picked up?

Do you have CONFIG_PREEMPT_DYNAMIC in your config? That was not
available in 3.10 and apparently it pulls quite a bit of stuff, which
in the end should be quite measurable. The preempt count alone adds
~7us to the above sample.

Cheers,

Paolo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ