[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ca1ade8ae0f20695c687580b2e1fbb75bf8a5d4b.camel@redhat.com>
Date: Mon, 16 May 2022 16:29:24 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: David Laight <David.Laight@...LAB.COM>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: UDP receive performance drop since 3.10
On Mon, 2022-05-16 at 12:58 +0000, David Laight wrote:
> I've noticed a doubling in the cpu cost of udp processing
> between a RHEL 3.10 kernel and a 5.18-rc6 one.
>
> This is (probably) all within ip_rcv().
>
> I'm testing very high rate UDP receive of RTP audio.
> (The target is 500000 udp/sec.)
> I've enable RPS so that ip_rcv() runs on different multiple
> cpus from the ethernet code.
> (RSS on the BCM5720 (tg3) doesn't seem to work very well.)
>
> On the 3.10 kernel the 'RPS' cpu show about 5% 'soft int' time.
> With 5.10 this has doubled to 10% for much the same test.
>
> The ftrace for a single packet shows a lot of extra code.
> With a RHEL 3.10 kernel the trace is quite short:
>
> /* netif_receive_skb: dev=em2 skbaddr=ffff99c3ee5ae000 len=200 */
> ip_rcv() {
> ip_rcv_finish() {
> 0.483 us udp_v4_early_demux();
> ip_route_input_noref() {
> ip_route_input_slow() {
> 2.036 us fib_table_lookup();
> fib_validate_source() {
> __fib_validate_source.isra.13() {
> 0.646 us fib_table_lookup();
> 1.589 us }
> 2.610 us }
> 6.820 us }
> 7.755 us }
> ip_local_deliver() {
> ip_local_deliver_finish() {
> 0.250 us raw_local_deliver();
> udp_rcv() {
> __udp4_lib_rcv() {
> __udp4_lib_lookup() {
> 0.063 us compute_score();
> 0.097 us compute_score();
> 1.496 us }
> udp_queue_rcv_skb() {
> sk_filter_trim_cap() {
> security_sock_rcv_skb() {
> 0.066 us cap_socket_sock_rcv_skb();
> 1.024 us }
> 1.836 us }
> 0.093 us ipv4_pktinfo_prepare();
> __udp_enqueue_schedule_skb() {
> 0.066 us _raw_spin_lock();
> sock_def_readable() {
> __wake_up_sync_key() {
> __wake_up_common_lock() {
> 0.194 us _raw_spin_lock_irqsave();
> __wake_up_common() {
> ep_poll_callback() {
> 0.184 us _raw_spin_lock_irqsave();
> 0.084 us _raw_spin_unlock_irqrestore();
> 2.009 us }
> 3.264 us }
> 0.087 us _raw_spin_unlock_irqrestore();
> 5.579 us }
> 6.311 us }
> 7.241 us }
> 8.833 us }
> + 12.948 us }
> + 16.365 us }
> + 17.280 us }
> + 19.900 us }
> + 20.673 us }
> + 31.519 us }
> + 32.534 us }
>
> Whereas 5.18 has a much longer trace:
> ip_rcv() {
> 0.668 us ip_rcv_core();
> ip_rcv_finish_core.constprop.0() {
> 1.155 us udp_v4_early_demux();
> ip_route_input_noref() {
> 0.306 us __rcu_read_lock();
> ip_route_input_rcu() {
> ip_route_input_slow() {
> make_kuid() {
> 0.441 us map_id_range_down();
> 1.231 us }
> 0.307 us __rcu_read_lock();
> fib_table_lookup() {
> 1.268 us fib_lookup_good_nhc();
> 2.736 us }
> 0.307 us __rcu_read_unlock();
> fib_validate_source() {
> __fib_validate_source() {
> make_kuid() {
> 0.304 us map_id_range_down();
> 0.931 us }
> 0.304 us __rcu_read_lock();
> fib_table_lookup() {
> 0.493 us fib_lookup_good_nhc();
> 1.405 us }
> 0.393 us __rcu_read_unlock();
> 0.390 us fib_info_nh_uses_dev();
> 5.457 us }
> 6.327 us }
> + 13.726 us }
> + 14.727 us }
> 0.407 us __rcu_read_unlock();
> + 16.673 us }
> + 19.389 us }
> ip_local_deliver() {
> ip_local_deliver_finish() {
> 0.376 us __rcu_read_lock();
> ip_protocol_deliver_rcu() {
> 0.434 us raw_local_deliver();
> udp_rcv() {
> __udp4_lib_rcv() {
> __udp4_lib_lookup() {
> 0.326 us udp4_lib_lookup2.isra.0();
> 0.928 us udp4_lib_lookup2.isra.0();
> 2.413 us }
> udp_unicast_rcv_skb() {
> udp_queue_rcv_skb() {
> udp_queue_rcv_one_skb() {
> sk_filter_trim_cap() {
> 0.440 us security_sock_rcv_skb();
> sk_filter_trim_cap.part.0() {
> 0.297 us __rcu_read_lock();
> 0.310 us __rcu_read_unlock();
> 1.531 us }
> 3.277 us }
> 0.334 us skb_pull_rcsum();
> 0.310 us ipv4_pktinfo_prepare();
> __udp_enqueue_schedule_skb() {
> _raw_spin_lock() {
> 0.334 us preempt_count_add();
> 0.938 us }
> _raw_spin_unlock() {
> 0.303 us preempt_count_sub();
> 0.908 us }
> sock_def_readable() {
> 0.307 us __rcu_read_lock();
> __wake_up_sync_key() {
> __wake_up_common_lock() {
> _raw_spin_lock_irqsave() {
> 0.326 us preempt_count_add();
> 0.951 us }
> __wake_up_common() {
> ep_poll_callback() {
> _raw_read_lock_irqsave() {
> 0.330 us preempt_count_add();
> 1.152 us }
> 0.614 us __rcu_read_lock();
> 0.323 us __rcu_read_unlock();
> _raw_read_unlock_irqrestore() {
> 0.380 us preempt_count_sub();
> 0.995 us }
> 5.410 us }
> 6.741 us }
> _raw_spin_unlock_irqrestore() {
> 0.317 us preempt_count_sub();
> 1.094 us }
> 9.994 us }
> + 10.806 us }
> 0.327 us __rcu_read_unlock();
> + 12.809 us }
> + 16.182 us }
> + 22.153 us }
> + 22.769 us }
> + 23.528 us }
> + 27.878 us }
> + 28.646 us }
> + 30.476 us }
> 0.324 us __rcu_read_unlock();
> + 32.398 us }
> + 33.168 us }
> + 54.976 us }
>
> Now I know the cost of ftrace is significant (and seems to be
> higher in 5.18) but there also seems to be a lot more code.
> As well as the extra rcu locks (which are probably mostly ftrace
> overhead, a few other things stick out:
>
> 1) The sock_net_uid(net, NULL) calls.
> These are make_kuid(net->user_ns, 0) - so pretty much constant.
> They seem to end up in a loop in map_id_range_down_base().
> All looks expensive in the default network namespace where
> 0 maps to 0.
>
> 2) Extra code in fib_lookup().
>
> 3) A lot more locking in ep_poll_callback().
>
> The 5.18 kernel also seems to have CONFIG_DEBUG_PREEMPT set.
> I can't find the Kconfig entry for it.
> It doesn't exist in the old .config at all.
> So I'm not sure why 'make oldconfig' picked it up.
>
> The other possibility is that the extra code is tick_nohz_idle_exit().
> The 3.10 trace is from a non-RPS config so I can't compare it.
>
> I'm going to disable CONFIG_DEBUG_PREEMPT to see how much
> difference it makes.
> Any idea if any other debug options will have got picked up?
Do you have CONFIG_PREEMPT_DYNAMIC in your config? That was not
available in 3.10 and apparently it pulls quite a bit of stuff, which
in the end should be quite measurable. The preempt count alone adds
~7us to the above sample.
Cheers,
Paolo
Powered by blists - more mailing lists