Message-ID: <CH3PR11MB7345B6FD9B9512BB8913440DFC499@CH3PR11MB7345.namprd11.prod.outlook.com>
Date: Thu, 1 Jun 2023 02:46:30 +0000
From: "Zhang, Cathy" <cathy.zhang@...el.com>
To: "Sang, Oliver" <oliver.sang@...el.com>, Shakeel Butt <shakeelb@...gle.com>
CC: "Yin, Fengwei" <fengwei.yin@...el.com>, "Tang, Feng"
<feng.tang@...el.com>, Eric Dumazet <edumazet@...gle.com>, Linux MM
<linux-mm@...ck.org>, Cgroups <cgroups@...r.kernel.org>, Paolo Abeni
<pabeni@...hat.com>, "davem@...emloft.net" <davem@...emloft.net>,
"kuba@...nel.org" <kuba@...nel.org>, "Brandeburg, Jesse"
<jesse.brandeburg@...el.com>, "Srinivas, Suresh" <suresh.srinivas@...el.com>,
"Chen, Tim C" <tim.c.chen@...el.com>, "You, Lizhen" <lizhen.you@...el.com>,
"eric.dumazet@...il.com" <eric.dumazet@...il.com>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>, "Li, Philip" <philip.li@...el.com>, "Liu, Yujie"
<yujie.liu@...el.com>
Subject: RE: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper size

Thanks for sharing the data, Oliver!
> -----Original Message-----
> From: Sang, Oliver <oliver.sang@...el.com>
> Sent: Wednesday, May 31, 2023 4:46 PM
> To: Shakeel Butt <shakeelb@...gle.com>
> Cc: Zhang, Cathy <cathy.zhang@...el.com>; Yin, Fengwei
> <fengwei.yin@...el.com>; Tang, Feng <feng.tang@...el.com>; Eric Dumazet
> <edumazet@...gle.com>; Linux MM <linux-mm@...ck.org>; Cgroups
> <cgroups@...r.kernel.org>; Paolo Abeni <pabeni@...hat.com>;
> davem@...emloft.net; kuba@...nel.org; Brandeburg, Jesse
> <jesse.brandeburg@...el.com>; Srinivas, Suresh
> <suresh.srinivas@...el.com>; Chen, Tim C <tim.c.chen@...el.com>; You,
> Lizhen <lizhen.you@...el.com>; eric.dumazet@...il.com;
> netdev@...r.kernel.org; Li, Philip <philip.li@...el.com>; Liu, Yujie
> <yujie.liu@...el.com>; Sang, Oliver <oliver.sang@...el.com>
> Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper size
>
> hi, Shakeel,
>
> On Wed, May 17, 2023 at 04:24:47PM +0000, Shakeel Butt wrote:
> > On Tue, May 16, 2023 at 01:46:55PM +0800, Oliver Sang wrote:
> > > hi Shakeel,
> > >
> > > On Mon, May 15, 2023 at 12:50:31PM -0700, Shakeel Butt wrote:
> > > > +Feng, Yin and Oliver
> > > >
> > > > >
> > > > > > > Thanks a lot Cathy for testing. Do you see any performance
> > > > > > > improvement for the memcached benchmark with the patch?
> > > > >
> > > > > Yep, absolutely :- ) RPS (with/without patch) = +1.74
> > > >
> > > > Thanks a lot Cathy.
> > > >
> > > > Feng/Yin/Oliver, can you please test the patch at [1] with other
> > > > workloads used by the test robot? Basically I wanted to know if it has
> > > > any positive or negative impact on other perf benchmarks.
> > >
> > > > is it possible for you to resend the patch with a Signed-off-by?
> > > > without it, the test robot will regard the patch as informal, and it cannot
> > > > be fed into the auto test process.
> > > > and could you tell us the base of this patch? it will help us apply it
> > > > correctly.
> > >
> > > > on the other hand, due to resource constraints, we normally cannot support
> > > > this type of on-demand test upon a single patch, patch set, or a branch.
> > > > instead, we try to merge them into so-called hourly-kernels, then
> > > > distribute tests and auto-bisects to various platforms.
> > > > after we apply your patch and merge it into hourly-kernels successfully,
> > > > if it really causes some performance changes, the test robot could spot
> > > > this patch as 'fbc' and we will send a report to you. this could happen
> > > > within several weeks after applying.
> > > > but due to the complexity of the whole process (and limited resources, e.g.
> > > > we cannot run all tests on all platforms), we cannot guarantee to capture
> > > > all possible performance impacts of this patch. and it's hard for us to
> > > > provide a big picture of the general performance impact of this patch.
> > > > this may not be exactly what you want. is it ok for you?
> > >
> > >
> >
> > Yes, that is fine and thanks for the help. The patch is below:
>
> we applied the patch below on top of v6.4-rc2. so far, we haven't spotted any
> performance impact from it on other tests.
>
> but we found a 7.6% regression in netperf.Throughput_Mbps
>
> testcase: netperf
> test machine: 128 threads 4 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> parameters:
>
> ip: ipv4
> runtime: 300s
> nr_threads: 50%
> cluster: cs-localhost
> send_size: 10K
> test: TCP_SENDFILE
> cpufreq_governor: performance
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> sudo bin/lkp install job.yaml # job file is attached in this email
> bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> sudo bin/lkp run generated-yaml-file
>
> # if you come across any failure that blocks the test,
> # please remove ~/.lkp and /lkp dir to run from a clean state.
>
>
> =========================================================================================
> cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/send_size/tbox_group/test/testcase:
> cs-localhost/gcc-11/performance/ipv4/x86_64-rhel-8.3/50%/debian-11.1-x86_64-20220510.cgz/300s/10K/lkp-icl-2sp2/TCP_SENDFILE/netperf
>
> commit:
> v6.4-rc2
> 5e32037c50 ("memcg: skip stock refill in irq context")
>
>         v6.4-rc2 5e32037c5065d2058264d41cd4c
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>      23165            -7.6%      21414        netperf.Throughput_Mbps
>    1482569            -7.6%    1370534        netperf.Throughput_total_Mbps
>
> detail data as below [1]
>
>
> at the same time, we ran the same test with Cathy's patch and found
> a 29.4% improvement in netperf.Throughput_Mbps
> just FYI
>
>
> =========================================================================================
> cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/send_size/tbox_group/test/testcase:
> cs-localhost/gcc-11/performance/ipv4/x86_64-rhel-8.3/50%/debian-11.1-x86_64-20220510.cgz/300s/10K/lkp-icl-2sp2/TCP_SENDFILE/netperf
>
> commit:
> ed23734c23 ("Merge tag 'net-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
> 05d72a8bed ("net: Keep sk->sk_forward_alloc as a proper size")
>
> ed23734c23d2fc1e 05d72a8bedfacfc46f300ab38e0
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>      23218           +29.4%      30043        netperf.Throughput_Mbps
>    1485996           +29.4%    1922763        netperf.Throughput_total_Mbps
>
> detail data as below [2]
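
As a quick sanity check on the two summaries above, the %change column can be recomputed from the base and patched values. This is only a minimal sketch in Python: `pct_change` is a hypothetical helper name, and the numbers are copied from the two tables. The last line also shows that total throughput divided by per-instance throughput comes out at about 64, which would be consistent with nr_threads = 50% of 128 threads; that reading is an inference, not something the report states.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` versus `base`, in percent."""
    return (new - base) / base * 100.0


# netperf.Throughput_Mbps (base, patched), copied from the tables above
memcg_patch = (23165, 21414)  # v6.4-rc2 vs 5e32037c50
net_patch = (23218, 30043)    # ed23734c23 vs 05d72a8bed

print(f"{pct_change(*memcg_patch):+.1f}%")  # -7.6%
print(f"{pct_change(*net_patch):+.1f}%")    # +29.4%

# Throughput_total_Mbps / Throughput_Mbps for the v6.4-rc2 baseline
print(round(1482569 / 23165))               # 64
```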
>
>
> [1]
>
> =========================================================================================
> cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/send_size/tbox_group/test/testcase:
> cs-localhost/gcc-11/performance/ipv4/x86_64-rhel-8.3/50%/debian-11.1-x86_64-20220510.cgz/300s/10K/lkp-icl-2sp2/TCP_SENDFILE/netperf
>
> commit:
> v6.4-rc2
> 5e32037c50 ("memcg: skip stock refill in irq context")
>
> v6.4-rc2 5e32037c5065d2058264d41cd4c
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 5106608 -1.3% 5040930 vmstat.system.cs
> 246222 ± 4% -21.9% 192291 ± 8% sched_debug.cpu.avg_idle.avg
> 269582 ± 6% -24.9% 202436 ± 13% sched_debug.cpu.avg_idle.stddev
> 2556 +0.9% 2579 turbostat.Bzy_MHz
> 15.01 +0.8 15.76 turbostat.C1%
> 30.63 +4.2% 31.90 ± 2% turbostat.RAMWatt
> 23165 -7.6% 21414 netperf.Throughput_Mbps
> 1482569 -7.6% 1370534 netperf.Throughput_total_Mbps
> 670.10 -11.8% 591.36 netperf.time.user_time
> 5.429e+09 -7.6% 5.019e+09 netperf.workload
> 6.93 +6.4% 7.38 perf-stat.i.MPKI
> 4.404e+10 -5.4% 4.167e+10 perf-stat.i.branch-instructions
> 0.88 +0.0 0.90 perf-stat.i.branch-miss-rate%
> 3.823e+08 -2.7% 3.721e+08 perf-stat.i.branch-misses
> 6.54 ± 2% +0.4 6.90 ± 3% perf-stat.i.cache-miss-rate%
> 1.05e+08 ± 3% +6.3% 1.117e+08 ± 3% perf-stat.i.cache-misses
> 1.29 +5.8% 1.37 perf-stat.i.cpi
> 27150 ± 6% +14.9% 31203 ± 5% perf-stat.i.cpu-migrations
> 2897 ± 3% -5.7% 2733 ± 3% perf-stat.i.cycles-between-cache-misses
> 0.01 ± 12% +0.0 0.01 perf-stat.i.dTLB-load-miss-rate%
> 6712601 ± 12% +7.8% 7237514 perf-stat.i.dTLB-load-misses
> 6.874e+10 -5.4% 6.505e+10 perf-stat.i.dTLB-loads
> 0.00 ± 5% +0.0 0.00 ± 5% perf-stat.i.dTLB-store-miss-rate%
> 940096 ± 5% +15.3% 1083508 ± 5% perf-stat.i.dTLB-store-misses
> 3.753e+10 -5.5% 3.547e+10 perf-stat.i.dTLB-stores
> 2.332e+11 -5.4% 2.207e+11 perf-stat.i.instructions
> 0.77 -5.4% 0.73 perf-stat.i.ipc
> 1186 -5.3% 1123 perf-stat.i.metric.M/sec
> 706578 ± 8% +33.2% 941322 ± 5% perf-stat.i.node-loads
> 2812685 ± 8% +15.6% 3250382 ± 10% perf-stat.i.node-stores
> 6.93 +6.4% 7.37 perf-stat.overall.MPKI
> 0.87 +0.0 0.89 perf-stat.overall.branch-miss-rate%
> 6.50 ± 2% +0.4 6.86 ± 3% perf-stat.overall.cache-miss-rate%
> 1.29 +5.8% 1.37 perf-stat.overall.cpi
> 2878 ± 3% -5.8% 2711 ± 3% perf-stat.overall.cycles-between-cache-misses
> 0.01 ± 12% +0.0 0.01 perf-stat.overall.dTLB-load-miss-rate%
> 0.00 ± 5% +0.0 0.00 ± 5% perf-stat.overall.dTLB-store-miss-rate%
> 0.77 -5.5% 0.73 perf-stat.overall.ipc
> 12903 +2.4% 13208 perf-stat.overall.path-length
> 4.39e+10 -5.4% 4.154e+10 perf-stat.ps.branch-instructions
> 3.81e+08 -2.7% 3.708e+08 perf-stat.ps.branch-misses
> 1.047e+08 ± 3% +6.3% 1.113e+08 ± 3% perf-stat.ps.cache-misses
> 27021 ± 6% +14.9% 31054 ± 5% perf-stat.ps.cpu-migrations
> 6672234 ± 12% +7.8% 7195318 perf-stat.ps.dTLB-load-misses
> 6.852e+10 -5.4% 6.484e+10 perf-stat.ps.dTLB-loads
> 935167 ± 5% +15.3% 1077856 ± 5% perf-stat.ps.dTLB-store-misses
> 3.741e+10 -5.5% 3.536e+10 perf-stat.ps.dTLB-stores
> 2.324e+11 -5.4% 2.199e+11 perf-stat.ps.instructions
> 704145 ± 8% +33.2% 938240 ± 5% perf-stat.ps.node-loads
> 2802795 ± 8% +15.5% 3238090 ± 10% perf-stat.ps.node-stores
> 7.006e+13 -5.4% 6.629e+13 perf-stat.total.instructions
> 11.29 -0.9 10.42 perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg.sock_recvmsg
> 11.22 -0.9 10.35 perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg
> 29.43 -0.7 28.74 perf-profile.calltrace.cycles-pp.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom
> 7.04 -0.5 6.51 perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.tcp_recvmsg_locked.tcp_recvmsg
> 7.36 -0.5 6.86 perf-profile.calltrace.cycles-pp.generic_file_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile.__x64_sys_sendfile64
> 6.56 -0.5 6.06 perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.tcp_recvmsg_locked
> 6.45 -0.4 6.03 perf-profile.calltrace.cycles-pp.filemap_read.generic_file_splice_read.splice_direct_to_actor.do_splice_direct.do_sendfile
> 2.95 -0.3 2.61 ± 7% perf-profile.calltrace.cycles-pp.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.tcp_recvmsg_locked
> 2.58 ± 2% -0.3 2.29 ± 7% perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter
> 3.22 -0.3 2.93 perf-profile.calltrace.cycles-pp.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.tcp_recvmsg_locked.tcp_recvmsg
> 10.00 -0.3 9.75 perf-profile.calltrace.cycles-pp.__dev_queue_xmit.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_skb.tcp_recvmsg_locked
> 10.15 -0.2 9.91 perf-profile.calltrace.cycles-pp.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_skb.tcp_recvmsg_locked.tcp_recvmsg
> 2.89 -0.2 2.66 perf-profile.calltrace.cycles-pp.filemap_get_read_batch.filemap_get_pages.filemap_read.generic_file_splice_read.splice_direct_to_actor
> 3.12 -0.2 2.90 perf-profile.calltrace.cycles-pp.filemap_get_pages.filemap_read.generic_file_splice_read.splice_direct_to_actor.do_splice_direct
> 10.47 -0.2 10.25 perf-profile.calltrace.cycles-pp.__ip_queue_xmit.__tcp_transmit_skb.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg
> 2.66 -0.2 2.44 perf-profile.calltrace.cycles-pp.tcp_write_xmit.do_tcp_sendpages.tcp_sendpage.inet_sendpage.kernel_sendpage
> 2.42 -0.2 2.22 perf-profile.calltrace.cycles-pp.__tcp_transmit_skb.tcp_write_xmit.do_tcp_sendpages.tcp_sendpage.inet_sendpage
> 2.48 -0.2 2.29 ± 7% perf-profile.calltrace.cycles-pp.__tcp_push_pending_frames.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu
> 2.46 -0.2 2.27 ± 7% perf-profile.calltrace.cycles-pp.tcp_write_xmit.__tcp_push_pending_frames.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv
> 2.23 -0.2 2.05 perf-profile.calltrace.cycles-pp.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit.do_tcp_sendpages.tcp_sendpage
> 2.14 -0.2 1.96 perf-profile.calltrace.cycles-pp.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit.do_tcp_sendpages
> 1.27 -0.1 1.17 perf-profile.calltrace.cycles-pp.tcp_send_mss.do_tcp_sendpages.tcp_sendpage.inet_sendpage.kernel_sendpage
> 1.17 -0.1 1.09 perf-profile.calltrace.cycles-pp.__tcp_push_pending_frames.do_tcp_sendpages.tcp_sendpage.inet_sendpage.kernel_sendpage
> 1.10 -0.1 1.02 perf-profile.calltrace.cycles-pp.tcp_write_xmit.__tcp_push_pending_frames.do_tcp_sendpages.tcp_sendpage.inet_sendpage
> 0.91 -0.1 0.84 perf-profile.calltrace.cycles-pp.tcp_current_mss.tcp_send_mss.do_tcp_sendpages.tcp_sendpage.inet_sendpage
> 1.29 -0.1 1.23 perf-profile.calltrace.cycles-pp.copy_page_to_iter_pipe.filemap_read.generic_file_splice_read.splice_direct_to_actor.do_splice_direct
> 0.77 -0.0 0.73 perf-profile.calltrace.cycles-pp.tcp_stream_alloc_skb.tcp_build_frag.do_tcp_sendpages.tcp_sendpage.inet_sendpage
> 0.81 -0.0 0.77 perf-profile.calltrace.cycles-pp.activate_task.ttwu_do_activate.sched_ttwu_pending.__sysvec_call_function_single.sysvec_call_function_single
> 0.78 -0.0 0.74 perf-profile.calltrace.cycles-pp.enqueue_task_fair.activate_task.ttwu_do_activate.sched_ttwu_pending.__sysvec_call_function_single
> 0.55 -0.0 0.53 perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.activate_task.ttwu_do_activate.sched_ttwu_pending
> 0.93 +0.0 0.96 perf-profile.calltrace.cycles-pp.try_to_wake_up.__wake_up_common.__wake_up_common_lock.sock_def_readable.tcp_data_queue
> 1.05 +0.0 1.08 perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.sock_def_readable.tcp_data_queue.tcp_rcv_established
> 1.10 +0.0 1.13 perf-profile.calltrace.cycles-pp.__wake_up_common_lock.sock_def_readable.tcp_data_queue.tcp_rcv_established.tcp_v4_do_rcv
> 1.20 +0.0 1.24 perf-profile.calltrace.cycles-pp.sock_def_readable.tcp_data_queue.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv
> 15.73 +0.2 15.97 perf-profile.calltrace.cycles-pp.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2
> 15.13 +0.3 15.38 perf-profile.calltrace.cycles-pp.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit
> 13.50 +0.3 13.82 perf-profile.calltrace.cycles-pp.__napi_poll.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip
> 13.45 +0.3 13.77 perf-profile.calltrace.cycles-pp.process_backlog.__napi_poll.net_rx_action.__do_softirq.do_softirq
> 13.06 +0.3 13.38 perf-profile.calltrace.cycles-pp.__netif_receive_skb_one_core.process_backlog.__napi_poll.net_rx_action.__do_softirq
> 2.23 ± 2% +0.4 2.60 ± 3% perf-profile.calltrace.cycles-pp.release_sock.tcp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom
> 12.08 +0.4 12.46 perf-profile.calltrace.cycles-pp.ip_local_deliver_finish.__netif_receive_skb_one_core.process_backlog.__napi_poll.net_rx_action
> 12.02 +0.4 12.41 perf-profile.calltrace.cycles-pp.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core.process_backlog.__napi_poll
> 1.12 +0.4 1.51 ± 3% perf-profile.calltrace.cycles-pp.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.__release_sock
> 1.31 +0.4 1.71 ± 3% perf-profile.calltrace.cycles-pp.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.release_sock
> 11.73 +0.4 12.14 perf-profile.calltrace.cycles-pp.tcp_v4_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core.process_backlog
> 1.34 ± 13% +0.4 1.76 ± 6% perf-profile.calltrace.cycles-pp.__sk_mem_reduce_allocated.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg.sock_recvmsg
> 1.73 ± 14% +0.5 2.19 ± 7% perf-profile.calltrace.cycles-pp.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.release_sock.tcp_recvmsg
> 1.38 ± 14% +0.5 1.85 ± 7% perf-profile.calltrace.cycles-pp.tcp_data_queue.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.release_sock
> 5.62 +0.5 6.11 perf-profile.calltrace.cycles-pp.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.release_sock.tcp_sendpage
> 5.61 +0.5 6.10 perf-profile.calltrace.cycles-pp.tcp_v4_do_rcv.__release_sock.release_sock.tcp_sendpage.inet_sendpage
> 8.89 +0.5 9.40 perf-profile.calltrace.cycles-pp.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core
> 8.74 +0.5 9.26 perf-profile.calltrace.cycles-pp.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish
> 2.86 +0.6 3.46 ± 3% perf-profile.calltrace.cycles-pp.tcp_data_queue.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu
> 0.58 ± 3% +0.6 1.19 ± 9% perf-profile.calltrace.cycles-pp.mem_cgroup_charge_skmem.tcp_data_queue.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv
> 1.29 ± 15% +0.6 1.94 ± 8% perf-profile.calltrace.cycles-pp.__sk_mem_reduce_allocated.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv
> 7.18 ± 2% +0.7 7.87 ± 2% perf-profile.calltrace.cycles-pp.__tcp_transmit_skb.tcp_write_xmit.__tcp_push_pending_frames.tcp_rcv_established.tcp_v4_do_rcv
> 6.06 +0.7 6.76 ± 2% perf-profile.calltrace.cycles-pp.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit.__tcp_push_pending_frames.tcp_rcv_established
> 0.35 ± 70% +0.7 1.07 ± 32% perf-profile.calltrace.cycles-pp.refill_stock.__sk_mem_reduce_allocated.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established
> 6.02 +0.7 6.75 ± 2% perf-profile.calltrace.cycles-pp.tcp_write_xmit.__tcp_push_pending_frames.tcp_rcv_established.tcp_v4_do_rcv.__release_sock
> 6.05 +0.7 6.78 ± 2% perf-profile.calltrace.cycles-pp.__tcp_push_pending_frames.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.release_sock
> 0.39 ± 70% +0.8 1.20 ± 22% perf-profile.calltrace.cycles-pp.page_counter_try_charge.try_charge_memcg.mem_cgroup_charge_skmem.tcp_data_queue.tcp_rcv_established
> 16.80 +0.8 17.62 perf-profile.calltrace.cycles-pp.do_tcp_sendpages.tcp_sendpage.inet_sendpage.kernel_sendpage.sock_sendpage
> 46.63 +0.9 47.53 perf-profile.calltrace.cycles-pp.do_splice_direct.do_sendfile.__x64_sys_sendfile64.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.53 ± 4% +0.9 1.46 ± 9% perf-profile.calltrace.cycles-pp.page_counter_try_charge.try_charge_memcg.mem_cgroup_charge_skmem.__sk_mem_raise_allocated.__sk_mem_schedule
> 46.04 +1.0 47.00 perf-profile.calltrace.cycles-pp.splice_direct_to_actor.do_splice_direct.do_sendfile.__x64_sys_sendfile64.do_syscall_64
> 0.00 +1.0 0.98 ± 33% perf-profile.calltrace.cycles-pp.page_counter_uncharge.drain_stock.refill_stock.__sk_mem_reduce_allocated.tcp_clean_rtx_queue
> 0.00 +1.0 0.99 ± 33% perf-profile.calltrace.cycles-pp.drain_stock.refill_stock.__sk_mem_reduce_allocated.tcp_clean_rtx_queue.tcp_ack
> 9.51 +1.2 10.67 ± 2% perf-profile.calltrace.cycles-pp.release_sock.tcp_sendpage.inet_sendpage.kernel_sendpage.sock_sendpage
> 8.17 +1.2 9.34 ± 2% perf-profile.calltrace.cycles-pp.__release_sock.release_sock.tcp_sendpage.inet_sendpage.kernel_sendpage
> 10.68 +1.3 11.98 perf-profile.calltrace.cycles-pp.tcp_build_frag.do_tcp_sendpages.tcp_sendpage.inet_sendpage.kernel_sendpage
> 0.96 ± 15% +1.4 2.34 ± 11% perf-profile.calltrace.cycles-pp.try_charge_memcg.mem_cgroup_charge_skmem.tcp_data_queue.tcp_rcv_established.tcp_v4_do_rcv
> 7.84 +1.5 9.30 perf-profile.calltrace.cycles-pp.tcp_wmem_schedule.tcp_build_frag.do_tcp_sendpages.tcp_sendpage.inet_sendpage
> 7.60 +1.5 9.08 perf-profile.calltrace.cycles-pp.__sk_mem_schedule.tcp_wmem_schedule.tcp_build_frag.do_tcp_sendpages.tcp_sendpage
> 36.91 +1.5 38.40 perf-profile.calltrace.cycles-pp.generic_splice_sendpage.direct_splice_actor.splice_direct_to_actor.do_splice_direct.do_sendfile
> 37.04 +1.5 38.53 perf-profile.calltrace.cycles-pp.direct_splice_actor.splice_direct_to_actor.do_splice_direct.do_sendfile.__x64_sys_sendfile64
> 7.41 +1.5 8.91 perf-profile.calltrace.cycles-pp.__sk_mem_raise_allocated.__sk_mem_schedule.tcp_wmem_schedule.tcp_build_frag.do_tcp_sendpages
> 36.49 +1.5 38.02 perf-profile.calltrace.cycles-pp.__splice_from_pipe.generic_splice_sendpage.direct_splice_actor.splice_direct_to_actor.do_splice_direct
> 1.47 ± 3% +1.6 3.11 ± 7% perf-profile.calltrace.cycles-pp.try_charge_memcg.mem_cgroup_charge_skmem.__sk_mem_raise_allocated.__sk_mem_schedule.tcp_wmem_schedule
> 34.61 +1.7 36.26 perf-profile.calltrace.cycles-pp.pipe_to_sendpage.__splice_from_pipe.generic_splice_sendpage.direct_splice_actor.splice_direct_to_actor
> 34.29 +1.7 35.97 perf-profile.calltrace.cycles-pp.sock_sendpage.pipe_to_sendpage.__splice_from_pipe.generic_splice_sendpage.direct_splice_actor
> 34.10 +1.7 35.79 perf-profile.calltrace.cycles-pp.kernel_sendpage.sock_sendpage.pipe_to_sendpage.__splice_from_pipe.generic_splice_sendpage
> 33.73 +1.7 35.46 perf-profile.calltrace.cycles-pp.inet_sendpage.kernel_sendpage.sock_sendpage.pipe_to_sendpage.__splice_from_pipe
> 33.24 +1.8 35.02 perf-profile.calltrace.cycles-pp.tcp_sendpage.inet_sendpage.kernel_sendpage.sock_sendpage.pipe_to_sendpage
> 4.46 ± 2% +2.0 6.42 ± 2% perf-profile.calltrace.cycles-pp.mem_cgroup_charge_skmem.__sk_mem_raise_allocated.__sk_mem_schedule.tcp_wmem_schedule.tcp_build_frag
> 11.28 -0.9 10.40 perf-profile.children.cycles-pp.__skb_datagram_iter
> 11.30 -0.9 10.42 perf-profile.children.cycles-pp.skb_copy_datagram_iter
> 29.47 -0.7 28.77 perf-profile.children.cycles-pp.tcp_recvmsg_locked
> 7.09 -0.5 6.56 perf-profile.children.cycles-pp._copy_to_iter
> 6.70 -0.5 6.20 perf-profile.children.cycles-pp.copyout
> 7.44 -0.5 6.94 perf-profile.children.cycles-pp.generic_file_splice_read
> 6.58 -0.4 6.15 perf-profile.children.cycles-pp.filemap_read
> 3.26 -0.3 2.97 perf-profile.children.cycles-pp.simple_copy_to_iter
> 3.16 -0.3 2.88 perf-profile.children.cycles-pp.__check_object_size
> 2.93 -0.2 2.70 perf-profile.children.cycles-pp.filemap_get_read_batch
> 2.65 ± 2% -0.2 2.42 perf-profile.children.cycles-pp.check_heap_object
> 3.16 -0.2 2.93 perf-profile.children.cycles-pp.filemap_get_pages
> 1.32 -0.1 1.22 perf-profile.children.cycles-pp.tcp_send_mss
> 1.33 -0.1 1.23 perf-profile.children.cycles-pp.touch_atime
> 1.22 ± 2% -0.1 1.12 perf-profile.children.cycles-pp.security_file_permission
> 5.62 -0.1 5.53 perf-profile.children.cycles-pp.lock_sock_nested
> 1.08 -0.1 1.00 perf-profile.children.cycles-pp.atime_needs_update
> 1.08 -0.1 1.00 perf-profile.children.cycles-pp.tcp_current_mss
> 0.96 ± 3% -0.1 0.88 perf-profile.children.cycles-pp.apparmor_file_permission
> 1.35 -0.1 1.28 perf-profile.children.cycles-pp.copy_page_to_iter_pipe
> 0.57 ± 3% -0.1 0.51 perf-profile.children.cycles-pp._copy_from_user
> 0.52 -0.1 0.46 ± 2% perf-profile.children.cycles-pp.__fsnotify_parent
> 1.06 -0.1 1.01 perf-profile.children.cycles-pp.__inet_lookup_established
> 0.41 -0.0 0.36 perf-profile.children.cycles-pp.tcp_rate_check_app_limited
> 0.52 ± 2% -0.0 0.48 ± 2% perf-profile.children.cycles-pp.netperf_sendfile
> 0.74 -0.0 0.70 perf-profile.children.cycles-pp.__cond_resched
> 0.48 -0.0 0.43 perf-profile.children.cycles-pp.tcp_event_new_data_sent
> 0.64 -0.0 0.60 perf-profile.children.cycles-pp.__fget_light
> 0.97 -0.0 0.93 perf-profile.children.cycles-pp.__alloc_skb
> 0.60 ± 3% -0.0 0.55 ± 3% perf-profile.children.cycles-pp.ip_rcv
> 0.78 -0.0 0.74 perf-profile.children.cycles-pp.tcp_stream_alloc_skb
> 0.38 -0.0 0.34 ± 2% perf-profile.children.cycles-pp.page_cache_pipe_buf_confirm
> 0.59 ± 2% -0.0 0.55 ± 2% perf-profile.children.cycles-pp.__entry_text_start
> 0.23 ± 5% -0.0 0.20 ± 2% perf-profile.children.cycles-pp.xas_load
> 0.48 -0.0 0.44 perf-profile.children.cycles-pp.sk_reset_timer
> 0.42 -0.0 0.39 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
> 0.74 ± 2% -0.0 0.71 perf-profile.children.cycles-pp.__kfree_skb
> 0.69 -0.0 0.65 perf-profile.children.cycles-pp.read_tsc
> 0.45 -0.0 0.42 ± 2% perf-profile.children.cycles-pp.current_time
> 0.57 -0.0 0.54 perf-profile.children.cycles-pp.kmem_cache_alloc_node
> 0.40 ± 2% -0.0 0.38 ± 2% perf-profile.children.cycles-pp.__virt_addr_valid
> 0.81 -0.0 0.78 perf-profile.children.cycles-pp.enqueue_task_fair
> 0.43 -0.0 0.40 perf-profile.children.cycles-pp.__mod_timer
> 0.38 -0.0 0.36 perf-profile.children.cycles-pp.tcp_established_options
> 0.21 ± 2% -0.0 0.18 ± 2% perf-profile.children.cycles-pp.sockfd_lookup_light
> 0.35 -0.0 0.32 ± 2% perf-profile.children.cycles-pp.__put_user_8
> 0.30 ± 3% -0.0 0.27 perf-profile.children.cycles-pp.aa_file_perm
> 0.48 -0.0 0.46 ± 2% perf-profile.children.cycles-pp.__tcp_send_ack
> 0.49 ± 2% -0.0 0.47 perf-profile.children.cycles-pp.kmem_cache_free
> 0.28 ± 3% -0.0 0.26 ± 4% perf-profile.children.cycles-pp.ip_rcv_finish_core
> 0.11 ± 6% -0.0 0.09 ± 5% perf-profile.children.cycles-pp.xas_start
> 0.24 -0.0 0.22 ± 3% perf-profile.children.cycles-pp.tcp_tso_segs
> 0.25 -0.0 0.23 perf-profile.children.cycles-pp.copy_page_to_iter
> 0.30 -0.0 0.28 ± 2% perf-profile.children.cycles-pp.__netif_receive_skb_core
> 0.24 -0.0 0.22 ± 2% perf-profile.children.cycles-pp.sanity
> 0.78 -0.0 0.76 perf-profile.children.cycles-pp.page_cache_pipe_buf_release
> 0.28 ± 3% -0.0 0.26 perf-profile.children.cycles-pp.tcp_schedule_loss_probe
> 0.27 -0.0 0.26 perf-profile.children.cycles-pp.rcu_all_qs
> 0.30 -0.0 0.28 perf-profile.children.cycles-pp.syscall_return_via_sysret
> 0.23 -0.0 0.22 ± 2% perf-profile.children.cycles-pp.set_next_entity
> 0.16 ± 3% -0.0 0.15 ± 5% perf-profile.children.cycles-pp.skb_release_head_state
> 0.15 ± 2% -0.0 0.14 ± 2% perf-profile.children.cycles-pp.folio_mark_accessed
> 0.08 -0.0 0.07 ± 5% perf-profile.children.cycles-pp.aa_sk_perm
> 0.20 ± 2% -0.0 0.18 ± 2% perf-profile.children.cycles-pp._raw_spin_unlock_bh
> 0.07 -0.0 0.06 perf-profile.children.cycles-pp.rb_next
> 0.05 +0.0 0.06 perf-profile.children.cycles-pp.skb_push
> 0.07 +0.0 0.08 perf-profile.children.cycles-pp.cpuidle_governor_latency_req
> 0.33 +0.0 0.34 perf-profile.children.cycles-pp.prepare_task_switch
> 0.07 +0.0 0.08 ± 5% perf-profile.children.cycles-pp.switch_fpu_return
> 0.11 ± 6% +0.0 0.12 ± 4% perf-profile.children.cycles-pp.resched_curr
> 0.14 ± 3% +0.0 0.15 ± 3% perf-profile.children.cycles-pp.check_preempt_curr
> 0.21 +0.0 0.23 ± 2% perf-profile.children.cycles-pp.ip_output
> 0.49 ± 2% +0.0 0.51 ± 2% perf-profile.children.cycles-pp._raw_spin_lock
> 0.59 +0.0 0.62 perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> 0.76 ± 3% +0.1 0.90 ± 4% perf-profile.children.cycles-pp.mem_cgroup_uncharge_skmem
> 0.31 ± 2% +0.2 0.47 ± 10% perf-profile.children.cycles-pp.propagate_protected_usage
> 84.35 +0.2 84.55 perf-profile.children.cycles-pp.do_syscall_64
> 16.48 +0.2 16.68 perf-profile.children.cycles-pp.__local_bh_enable_ip
> 15.96 +0.2 16.20 perf-profile.children.cycles-pp.do_softirq
> 15.84 +0.2 16.09 perf-profile.children.cycles-pp.__do_softirq
> 15.20 +0.3 15.46 perf-profile.children.cycles-pp.net_rx_action
> 17.63 +0.3 17.89 perf-profile.children.cycles-pp.__dev_queue_xmit
> 18.00 +0.3 18.29 perf-profile.children.cycles-pp.ip_finish_output2
> 18.93 +0.3 19.22 perf-profile.children.cycles-pp.__ip_queue_xmit
> 20.12 +0.3 20.43 perf-profile.children.cycles-pp.__tcp_transmit_skb
> 12.38 +0.3 12.69 perf-profile.children.cycles-pp.tcp_write_xmit
> 13.56 +0.3 13.87 perf-profile.children.cycles-pp.__napi_poll
> 13.51 +0.3 13.83 perf-profile.children.cycles-pp.process_backlog
> 13.12 +0.3 13.44 perf-profile.children.cycles-pp.__netif_receive_skb_one_core
> 12.12 +0.4 12.51 perf-profile.children.cycles-pp.ip_local_deliver_finish
> 12.08 +0.4 12.47 perf-profile.children.cycles-pp.ip_protocol_deliver_rcu
> 11.84 +0.4 12.24 perf-profile.children.cycles-pp.tcp_v4_rcv
> 3.87 +0.5 4.34 perf-profile.children.cycles-pp.tcp_ack
> 2.89 +0.5 3.40 ± 2% perf-profile.children.cycles-pp.tcp_clean_rtx_queue
> 9.78 +0.5 10.31 perf-profile.children.cycles-pp.__tcp_push_pending_frames
> 1.54 ± 4% +0.7 2.26 ± 7% perf-profile.children.cycles-pp.refill_stock
> 1.26 ± 5% +0.7 1.99 ± 8% perf-profile.children.cycles-pp.drain_stock
> 1.24 ± 5% +0.7 1.96 ± 8% perf-profile.children.cycles-pp.page_counter_uncharge
> 17.03 +0.8 17.85 perf-profile.children.cycles-pp.do_tcp_sendpages
> 46.66 +0.9 47.56 perf-profile.children.cycles-pp.do_splice_direct
> 2.92 ± 2% +0.9 3.86 ± 3% perf-profile.children.cycles-pp.__sk_mem_reduce_allocated
> 46.08 +0.9 47.03 perf-profile.children.cycles-pp.splice_direct_to_actor
> 4.41 +1.0 5.43 ± 4% perf-profile.children.cycles-pp.tcp_data_queue
> 10.88 +1.3 12.18 perf-profile.children.cycles-pp.tcp_build_frag
> 16.59 +1.4 17.98 perf-profile.children.cycles-pp.tcp_v4_do_rcv
> 16.36 +1.4 17.77 perf-profile.children.cycles-pp.tcp_rcv_established
> 7.93 +1.5 9.40 perf-profile.children.cycles-pp.tcp_wmem_schedule
> 1.52 ± 4% +1.5 2.98 ± 8% perf-profile.children.cycles-pp.page_counter_try_charge
> 36.96 +1.5 38.45 perf-profile.children.cycles-pp.generic_splice_sendpage
> 37.07 +1.5 38.56 perf-profile.children.cycles-pp.direct_splice_actor
> 7.75 +1.5 9.24 perf-profile.children.cycles-pp.__sk_mem_schedule
> 7.59 +1.5 9.10 perf-profile.children.cycles-pp.__sk_mem_raise_allocated
> 36.59 +1.5 38.12 perf-profile.children.cycles-pp.__splice_from_pipe
> 11.95 +1.5 13.48 ± 2% perf-profile.children.cycles-pp.release_sock
> 10.33 +1.6 11.89 ± 2% perf-profile.children.cycles-pp.__release_sock
> 34.67 +1.7 36.32 perf-profile.children.cycles-pp.pipe_to_sendpage
> 34.34 +1.7 36.02 perf-profile.children.cycles-pp.sock_sendpage
> 34.15 +1.7 35.84 perf-profile.children.cycles-pp.kernel_sendpage
> 33.84 +1.7 35.56 perf-profile.children.cycles-pp.inet_sendpage
> 33.40 +1.8 35.16 perf-profile.children.cycles-pp.tcp_sendpage
> 3.31 ± 4% +2.6 5.93 ± 7% perf-profile.children.cycles-pp.try_charge_memcg
> 6.82 +3.0 9.82 ± 3% perf-profile.children.cycles-pp.mem_cgroup_charge_skmem
> 6.66 -0.5 6.15 perf-profile.self.cycles-pp.copyout
> 2.88 -0.4 2.44 ± 2% perf-profile.self.cycles-pp.__sk_mem_raise_allocated
> 2.69 -0.2 2.50 perf-profile.self.cycles-pp.filemap_get_read_batch
> 2.14 ± 2% -0.2 1.95 ± 2% perf-profile.self.cycles-pp.check_heap_object
> 2.01 -0.1 1.88 perf-profile.self.cycles-pp.tcp_build_frag
> 1.30 -0.1 1.22 perf-profile.self.cycles-pp.filemap_read
> 1.04 -0.1 0.96 perf-profile.self.cycles-pp.do_sendfile
> 0.70 -0.1 0.63 ± 2% perf-profile.self.cycles-pp.__splice_from_pipe
> 0.52 -0.1 0.46 ± 2% perf-profile.self.cycles-pp.sendfile_tcp_stream
> 0.75 -0.1 0.70 perf-profile.self.cycles-pp.do_tcp_sendpages
> 0.55 ± 2% -0.1 0.50 ± 2% perf-profile.self.cycles-pp._copy_from_user
> 0.42 ± 4% -0.1 0.36 ± 2% perf-profile.self.cycles-pp.sendfile
> 0.67 ± 3% -0.1 0.62 ± 2% perf-profile.self.cycles-pp.apparmor_file_permission
> 1.11 -0.0 1.06 perf-profile.self.cycles-pp.copy_page_to_iter_pipe
> 0.54 ± 2% -0.0 0.49 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
> 0.48 -0.0 0.43 ± 2% perf-profile.self.cycles-pp.__fsnotify_parent
> 0.80 ± 2% -0.0 0.75 perf-profile.self.cycles-pp.__skb_datagram_iter
> 0.81 -0.0 0.76 perf-profile.self.cycles-pp.tcp_write_xmit
> 0.95 -0.0 0.91 perf-profile.self.cycles-pp.__inet_lookup_established
> 0.62 -0.0 0.58 perf-profile.self.cycles-pp.__fget_light
> 0.36 -0.0 0.32 perf-profile.self.cycles-pp.tcp_rate_check_app_limited
> 0.34 -0.0 0.30 perf-profile.self.cycles-pp.inet_sendpage
> 0.47 -0.0 0.43 ± 2% perf-profile.self.cycles-pp.netperf_sendfile
> 0.49 ± 5% -0.0 0.45 perf-profile.self.cycles-pp.net_rx_action
> 0.48 -0.0 0.44 perf-profile.self.cycles-pp.atime_needs_update
> 0.67 -0.0 0.63 perf-profile.self.cycles-pp.tcp_v4_rcv
> 0.43 ± 3% -0.0 0.40 perf-profile.self.cycles-pp.do_syscall_64
> 0.41 -0.0 0.37 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
> 0.36 ± 2% -0.0 0.32 ± 2% perf-profile.self.cycles-pp.page_cache_pipe_buf_confirm
> 0.46 -0.0 0.42 perf-profile.self.cycles-pp.__local_bh_enable_ip
> 0.48 ± 2% -0.0 0.45 perf-profile.self.cycles-pp.tcp_sendpage
> 0.45 -0.0 0.42 perf-profile.self.cycles-pp.tcp_current_mss
> 0.31 ± 2% -0.0 0.28 perf-profile.self.cycles-pp.kernel_sendpage
> 0.34 -0.0 0.31 ± 2% perf-profile.self.cycles-pp.__put_user_8
> 0.66 -0.0 0.63 perf-profile.self.cycles-pp.read_tsc
> 0.40 -0.0 0.37 ± 2% perf-profile.self.cycles-pp.__check_object_size
> 0.33 -0.0 0.30 perf-profile.self.cycles-pp.generic_splice_sendpage
> 0.31 -0.0 0.28 ± 2% perf-profile.self.cycles-pp.tcp_send_mss
> 0.66 -0.0 0.64 perf-profile.self.cycles-pp.tcp_ack
> 0.28 ± 2% -0.0 0.25 ± 2% perf-profile.self.cycles-pp.__sys_recvfrom
> 0.44 -0.0 0.42 ± 2% perf-profile.self.cycles-pp.__cond_resched
> 0.39 -0.0 0.36 perf-profile.self.cycles-pp._copy_to_iter
> 0.34 ± 2% -0.0 0.32 ± 2% perf-profile.self.cycles-pp.tcp_established_options
> 0.24 ± 2% -0.0 0.21 ± 4% perf-profile.self.cycles-pp.tcp_wmem_schedule
> 0.48 ± 2% -0.0 0.46 perf-profile.self.cycles-pp.kmem_cache_free
> 0.33 -0.0 0.31 perf-profile.self.cycles-pp.pipe_to_sendpage
> 0.11 ± 6% -0.0 0.09 perf-profile.self.cycles-
> pp.check_stack_object
> 0.36 -0.0 0.34 ± 3% perf-profile.self.cycles-pp.release_sock
> 0.26 -0.0 0.24 ± 2% perf-profile.self.cycles-
> pp.security_file_permission
> 0.23 -0.0 0.21 ± 3% perf-profile.self.cycles-pp.tcp_tso_segs
> 0.44 -0.0 0.42 perf-profile.self.cycles-
> pp.kmem_cache_alloc_node
> 0.31 -0.0 0.29 ± 2% perf-profile.self.cycles-pp.current_time
> 0.20 ± 3% -0.0 0.18 ± 2% perf-profile.self.cycles-
> pp.do_splice_direct
> 0.25 ± 4% -0.0 0.23 ± 2% perf-profile.self.cycles-pp.aa_file_perm
> 0.25 ± 3% -0.0 0.23 ± 2% perf-profile.self.cycles-pp.touch_atime
> 0.19 -0.0 0.17 ± 2% perf-profile.self.cycles-pp.process_backlog
> 0.11 ± 6% -0.0 0.09 ± 5% perf-profile.self.cycles-
> pp.__get_task_ioprio
> 0.21 -0.0 0.19 perf-profile.self.cycles-pp.sanity
> 0.06 -0.0 0.04 ± 44% perf-profile.self.cycles-pp.aa_sk_perm
> 0.33 ± 2% -0.0 0.31 ± 2% perf-profile.self.cycles-
> pp.splice_direct_to_actor
> 0.30 -0.0 0.28 perf-profile.self.cycles-
> pp.syscall_return_via_sysret
> 0.22 -0.0 0.20 ± 2% perf-profile.self.cycles-
> pp.copy_page_to_iter
> 0.16 ± 2% -0.0 0.14 ± 3% perf-profile.self.cycles-
> pp.tcp_stream_alloc_skb
> 0.11 ± 3% -0.0 0.10 ± 5% perf-profile.self.cycles-
> pp.ip_protocol_deliver_rcu
> 0.09 ± 6% -0.0 0.07 ± 6% perf-profile.self.cycles-pp.xas_start
> 0.15 ± 2% -0.0 0.14 ± 3% perf-profile.self.cycles-
> pp.__sk_mem_schedule
> 0.65 -0.0 0.63 perf-profile.self.cycles-pp.tcp_rcv_established
> 0.25 -0.0 0.23 perf-profile.self.cycles-pp.__mod_timer
> 0.15 -0.0 0.14 ± 3% perf-profile.self.cycles-
> pp.tcp_tx_timestamp
> 0.30 -0.0 0.28 ± 2% perf-profile.self.cycles-
> pp.__netif_receive_skb_core
> 0.75 -0.0 0.74 perf-profile.self.cycles-
> pp.page_cache_pipe_buf_release
> 0.18 ± 2% -0.0 0.17 ± 2% perf-profile.self.cycles-
> pp.sock_sendpage
> 0.13 ± 2% -0.0 0.12 perf-profile.self.cycles-
> pp._raw_spin_unlock_bh
> 0.12 ± 3% -0.0 0.11 perf-profile.self.cycles-
> pp.folio_mark_accessed
> 0.12 -0.0 0.11 ± 3% perf-profile.self.cycles-
> pp.simple_copy_to_iter
> 0.06 -0.0 0.05 perf-profile.self.cycles-
> pp.splice_from_pipe_next
> 0.11 -0.0 0.10 perf-profile.self.cycles-
> pp.exit_to_user_mode_prepare
> 0.25 +0.0 0.26 perf-profile.self.cycles-pp.__switch_to
> 0.06 ± 8% +0.0 0.07 ± 6% perf-profile.self.cycles-
> pp.switch_fpu_return
> 0.44 ± 2% +0.0 0.46 perf-profile.self.cycles-pp._raw_spin_lock
> 0.33 +0.0 0.36 ± 2% perf-profile.self.cycles-pp.__schedule
> 0.58 +0.0 0.61 perf-profile.self.cycles-
> pp._raw_spin_lock_irqsave
> 0.34 ± 3% +0.0 0.38 perf-profile.self.cycles-
> pp.__x64_sys_sendfile64
> 0.16 ± 8% +0.0 0.20 ± 2% perf-profile.self.cycles-pp.do_splice_to
> 0.65 ± 2% +0.1 0.73 ± 2% perf-profile.self.cycles-
> pp.__sk_mem_reduce_allocated
> 0.71 ± 3% +0.1 0.84 ± 5% perf-profile.self.cycles-
> pp.mem_cgroup_uncharge_skmem
> 0.30 ± 2% +0.2 0.47 ± 10% perf-profile.self.cycles-
> pp.propagate_protected_usage
> 3.34 ± 3% +0.4 3.72 ± 5% perf-profile.self.cycles-
> pp.mem_cgroup_charge_skmem
> 1.08 ± 6% +0.7 1.74 ± 8% perf-profile.self.cycles-
> pp.page_counter_uncharge
> 1.72 ± 3% +1.2 2.87 ± 7% perf-profile.self.cycles-
> pp.try_charge_memcg
> 1.36 ± 5% +1.4 2.73 ± 8% perf-profile.self.cycles-
> pp.page_counter_try_charge
>
>
>
> [2]
>
> =========================================================================================
> cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/send_size/tbox_group/test/testcase:
> cs-localhost/gcc-11/performance/ipv4/x86_64-rhel-8.3/50%/debian-11.1-x86_64-20220510.cgz/300s/10K/lkp-icl-2sp2/TCP_SENDFILE/netperf
>
> commit:
> ed23734c23 ("Merge tag 'net-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
> 05d72a8bed ("net: Keep sk->sk_forward_alloc as a proper size")
>
> ed23734c23d2fc1e 05d72a8bedfacfc46f300ab38e0
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 5.95e+09 -12.7% 5.193e+09 cpuidle..time
> 3328 ± 22% +96.7% 6547 ± 21% numa-vmstat.node2.nr_slab_reclaimable
> 13.95 -2.0 11.93 mpstat.cpu.all.idle%
> 2.69 +0.6 3.31 mpstat.cpu.all.usr%
> 5106176 -6.6% 4769081 vmstat.system.cs
> 2629481 -7.3% 2436543 vmstat.system.in
> 11284480 ± 9% +23.7% 13957802 ± 11% meminfo.DirectMap2M
> 1726173 ± 2% -17.6% 1422506 ± 2% meminfo.Mapped
> 7247621 +11.2% 8061423 meminfo.Shmem
> 13314 ± 22% +96.7% 26192 ± 21% numa-meminfo.node2.KReclaimable
> 13314 ± 22% +96.7% 26192 ± 21% numa-meminfo.node2.SReclaimable
> 71128 ± 5% +28.0% 91013 ± 8% numa-meminfo.node2.Slab
> 15.26 -1.9 13.33 turbostat.C1%
> 10.41 -15.8% 8.77 turbostat.CPU%c1
> 0.26 +11.5% 0.29 turbostat.IPC
> 30.71 -3.2% 29.72 turbostat.RAMWatt
> 7854382 ± 2% +10.3% 8664074 ± 2% sched_debug.cfs_rq:/.min_vruntime.min
> 708120 ± 2% -15.5% 598098 ± 3% sched_debug.cfs_rq:/.min_vruntime.stddev
> 708203 ± 2% -15.5% 598191 ± 3% sched_debug.cfs_rq:/.spread0.stddev
> 5317 ± 2% -11.2% 4722 ± 5% sched_debug.cpu.avg_idle.min
> 10037310 ± 3% -15.9% 8440803 ± 2% sched_debug.cpu.nr_switches.max
> 1290083 ± 2% -22.0% 1006686 ± 3% sched_debug.cpu.nr_switches.stddev
> 23218 +29.4% 30043 netperf.Throughput_Mbps
> 1485996 +29.4% 1922763 netperf.Throughput_total_Mbps
> 160215 ± 3% +107.9% 333022 ± 15% netperf.time.involuntary_context_switches
> 5567 +2.5% 5707 netperf.time.percent_of_cpu_this_job_got
> 16093 +1.2% 16286 netperf.time.system_time
> 669.70 +34.0% 897.24 netperf.time.user_time
> 35419 ± 3% +160.8% 92374 ± 5% netperf.time.voluntary_context_switches
> 5.442e+09 +29.4% 7.041e+09 netperf.workload
> 2481590 +8.1% 2681600 proc-vmstat.nr_file_pages
> 1892119 +10.6% 2092306 proc-vmstat.nr_inactive_anon
> 431915 ± 2% -17.9% 354649 ± 2% proc-vmstat.nr_mapped
> 3064 -4.5% 2927 proc-vmstat.nr_page_table_pages
> 1813072 +11.0% 2013082 proc-vmstat.nr_shmem
> 35384 +1.3% 35861 proc-vmstat.nr_slab_reclaimable
> 1892119 +10.6% 2092306 proc-vmstat.nr_zone_inactive_anon
> 491137 ± 2% -20.0% 393067 ± 17% proc-vmstat.numa_hint_faults_local
> 5593417 +10.7% 6193714 proc-vmstat.numa_hit
> 5431644 +10.5% 6001135 proc-vmstat.numa_local
> 44132 ± 3% +18.1% 52128 ± 6% proc-vmstat.pgactivate
> 5733229 +9.9% 6302633 proc-vmstat.pgalloc_normal
> 7.00 -22.1% 5.45 perf-stat.i.MPKI
> 4.405e+10 +13.7% 5.007e+10 perf-stat.i.branch-instructions
> 0.87 -0.1 0.78 perf-stat.i.branch-miss-rate%
> 3.795e+08 +1.6% 3.854e+08 perf-stat.i.branch-misses
> 6.39 -3.3 3.09 ± 7% perf-stat.i.cache-miss-rate%
> 1.038e+08 ± 2% -57.7% 43877506 ± 7% perf-stat.i.cache-misses
> 1.633e+09 -12.0% 1.438e+09 perf-stat.i.cache-references
> 5163294 -6.8% 4814691 perf-stat.i.context-switches
> 1.29 -10.0% 1.16 perf-stat.i.cpi
> 3.016e+11 +1.8% 3.072e+11 perf-stat.i.cpu-cycles
> 27516 ± 3% -34.8% 17931 perf-stat.i.cpu-migrations
> 2930 ± 2% +153.5% 7428 ± 7% perf-stat.i.cycles-between-cache-misses
> 0.01 -0.0 0.01 ± 13% perf-stat.i.dTLB-load-miss-rate%
> 7226907 -11.0% 6428694 ± 13% perf-stat.i.dTLB-load-misses
> 6.872e+10 +13.4% 7.791e+10 perf-stat.i.dTLB-loads
> 0.00 ± 3% -0.0 0.00 ± 2% perf-stat.i.dTLB-store-miss-rate%
> 954320 ± 3% -33.0% 639153 ± 2% perf-stat.i.dTLB-store-misses
> 3.753e+10 +12.5% 4.221e+10 perf-stat.i.dTLB-stores
> 2.332e+11 +13.2% 2.639e+11 perf-stat.i.instructions
> 0.78 +11.1% 0.86 perf-stat.i.ipc
> 2.36 +1.8% 2.40 perf-stat.i.metric.GHz
> 263.06 ± 2% -45.6% 143.14 ± 5% perf-stat.i.metric.K/sec
> 1186 +13.0% 1340 perf-stat.i.metric.M/sec
> 95.18 +2.5 97.70 perf-stat.i.node-load-miss-rate%
> 15047143 ± 3% -50.7% 7421607 ± 7% perf-stat.i.node-load-misses
> 736992 ± 4% -79.2% 153436 ± 5% perf-stat.i.node-loads
> 76.94 -13.8 63.13 ± 5% perf-stat.i.node-store-miss-rate%
> 8866276 -61.9% 3375324 ± 7% perf-stat.i.node-store-misses
> 2808107 ± 7% -34.1% 1851536 ± 14% perf-stat.i.node-stores
> 7.00 -22.2% 5.45 perf-stat.overall.MPKI
> 0.86 -0.1 0.77 perf-stat.overall.branch-miss-rate%
> 6.36 -3.3 3.05 ± 7% perf-stat.overall.cache-miss-rate%
> 1.29 -10.0% 1.16 perf-stat.overall.cpi
> 2907 ± 2% +142.1% 7040 ± 7% perf-stat.overall.cycles-between-cache-misses
> 0.01 -0.0 0.01 ± 13% perf-stat.overall.dTLB-load-miss-rate%
> 0.00 ± 3% -0.0 0.00 ± 2% perf-stat.overall.dTLB-store-miss-rate%
> 0.77 +11.1% 0.86 perf-stat.overall.ipc
> 95.33 +2.6 97.97 perf-stat.overall.node-load-miss-rate%
> 75.97 -11.3 64.69 ± 4% perf-stat.overall.node-store-miss-rate%
> 12891 -12.6% 11262 perf-stat.overall.path-length
> 4.39e+10 +13.7% 4.99e+10 perf-stat.ps.branch-instructions
> 3.782e+08 +1.6% 3.841e+08 perf-stat.ps.branch-misses
> 1.034e+08 ± 2% -57.7% 43735005 ± 7% perf-stat.ps.cache-misses
> 1.627e+09 -11.9% 1.433e+09 perf-stat.ps.cache-references
> 5145798 -6.8% 4798160 perf-stat.ps.context-switches
> 3.006e+11 +1.8% 3.062e+11 perf-stat.ps.cpu-cycles
> 27426 ± 3% -34.8% 17883 perf-stat.ps.cpu-migrations
> 7190273 -11.0% 6397079 ± 13% perf-stat.ps.dTLB-load-misses
> 6.849e+10 +13.4% 7.765e+10 perf-stat.ps.dTLB-loads
> 950808 ± 3% -33.0% 637446 ± 2% perf-stat.ps.dTLB-store-misses
> 3.741e+10 +12.5% 4.207e+10 perf-stat.ps.dTLB-stores
> 2.324e+11 +13.2% 2.63e+11 perf-stat.ps.instructions
> 14992384 ± 3% -50.7% 7391904 ± 7% perf-stat.ps.node-load-misses
> 734606 ± 4% -79.2% 153010 ± 5% perf-stat.ps.node-loads
> 8837267 -61.9% 3364441 ± 7% perf-stat.ps.node-store-misses
> 2799494 ± 7% -34.1% 1845425 ± 14% perf-stat.ps.node-stores
> 7.015e+13 +13.0% 7.93e+13 perf-stat.total.instructions
> 7.88 -6.8 1.06 ± 2% perf-profile.calltrace.cycles-
> pp.tcp_wmem_schedule.tcp_build_frag.do_tcp_sendpages.tcp_sendpage.in
> et_sendpage
> 7.64 -6.8 0.84 perf-profile.calltrace.cycles-
> pp.__sk_mem_schedule.tcp_wmem_schedule.tcp_build_frag.do_tcp_sendp
> ages.tcp_sendpage
> 7.45 -6.7 0.76 ± 2% perf-profile.calltrace.cycles-
> pp.__sk_mem_raise_allocated.__sk_mem_schedule.tcp_wmem_schedule.tc
> p_build_frag.do_tcp_sendpages
> 10.74 -6.3 4.41 perf-profile.calltrace.cycles-
> pp.tcp_build_frag.do_tcp_sendpages.tcp_sendpage.inet_sendpage.kernel_s
> endpage
> 33.39 -6.1 27.33 perf-profile.calltrace.cycles-
> pp.tcp_sendpage.inet_sendpage.kernel_sendpage.sock_sendpage.pipe_to_s
> endpage
> 33.88 -6.0 27.93 perf-profile.calltrace.cycles-
> pp.inet_sendpage.kernel_sendpage.sock_sendpage.pipe_to_sendpage.__spli
> ce_from_pipe
> 34.25 -5.9 28.39 perf-profile.calltrace.cycles-
> pp.kernel_sendpage.sock_sendpage.pipe_to_sendpage.__splice_from_pipe.
> generic_splice_sendpage
> 34.43 -5.8 28.61 perf-profile.calltrace.cycles-
> pp.sock_sendpage.pipe_to_sendpage.__splice_from_pipe.generic_splice_se
> ndpage.direct_splice_actor
> 34.75 -5.8 29.00 perf-profile.calltrace.cycles-
> pp.pipe_to_sendpage.__splice_from_pipe.generic_splice_sendpage.direct_s
> plice_actor.splice_direct_to_actor
> 36.66 -5.3 31.34 perf-profile.calltrace.cycles-
> pp.__splice_from_pipe.generic_splice_sendpage.direct_splice_actor.splice_d
> irect_to_actor.do_splice_direct
> 37.08 -5.2 31.85 perf-profile.calltrace.cycles-
> pp.generic_splice_sendpage.direct_splice_actor.splice_direct_to_actor.do_sp
> lice_direct.do_sendfile
> 37.20 -5.2 32.00 perf-profile.calltrace.cycles-
> pp.direct_splice_actor.splice_direct_to_actor.do_splice_direct.do_sendfile._
> _x64_sys_sendfile64
> 16.95 -5.1 11.89 perf-profile.calltrace.cycles-
> pp.do_tcp_sendpages.tcp_sendpage.inet_sendpage.kernel_sendpage.sock_s
> endpage
> 8.23 -2.6 5.67 ± 2% perf-profile.calltrace.cycles-
> pp.__release_sock.release_sock.tcp_sendpage.inet_sendpage.kernel_sendp
> age
> 46.36 -2.5 43.86 perf-profile.calltrace.cycles-
> pp.splice_direct_to_actor.do_splice_direct.do_sendfile.__x64_sys_sendfile64
> .do_syscall_64
> 46.96 -2.4 44.58 perf-profile.calltrace.cycles-
> pp.do_splice_direct.do_sendfile.__x64_sys_sendfile64.do_syscall_64.entry_S
> YSCALL_64_after_hwframe
> 9.59 -2.3 7.24 ± 2% perf-profile.calltrace.cycles-
> pp.release_sock.tcp_sendpage.inet_sendpage.kernel_sendpage.sock_sendp
> age
> 2.87 -2.1 0.76 perf-profile.calltrace.cycles-
> pp.tcp_data_queue.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_protoc
> ol_deliver_rcu
> 51.58 -2.0 49.62 perf-profile.calltrace.cycles-
> pp.entry_SYSCALL_64_after_hwframe.sendfile.sendfile_tcp_stream.main.__li
> bc_start_main
> 49.43 -1.9 47.48 perf-profile.calltrace.cycles-
> pp.do_sendfile.__x64_sys_sendfile64.do_syscall_64.entry_SYSCALL_64_after
> _hwframe.sendfile
> 51.31 -1.9 49.37 perf-profile.calltrace.cycles-
> pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendfile.sendfile_tcp_st
> ream.main
> 6.07 -1.8 4.22 ± 2% perf-profile.calltrace.cycles-
> pp.__tcp_push_pending_frames.tcp_rcv_established.tcp_v4_do_rcv.__releas
> e_sock.release_sock
> 6.04 -1.8 4.20 ± 2% perf-profile.calltrace.cycles-
> pp.tcp_write_xmit.__tcp_push_pending_frames.tcp_rcv_established.tcp_v4_
> do_rcv.__release_sock
> 52.41 -1.8 50.64 perf-profile.calltrace.cycles-
> pp.sendfile.sendfile_tcp_stream.main.__libc_start_main
> 50.66 -1.7 48.91 perf-profile.calltrace.cycles-
> pp.__x64_sys_sendfile64.do_syscall_64.entry_SYSCALL_64_after_hwframe.s
> endfile.sendfile_tcp_stream
> 1.99 -1.5 0.48 ± 44% perf-profile.calltrace.cycles-
> pp.tcp_v4_do_rcv.__release_sock.release_sock.tcp_recvmsg.inet_recvmsg
> 53.77 -1.5 52.28 perf-profile.calltrace.cycles-
> pp.sendfile_tcp_stream.main.__libc_start_main
> 1.88 ± 2% -1.5 0.42 ± 44% perf-profile.calltrace.cycles-
> pp.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.release_sock.tcp_recv
> msg
> 5.64 -1.5 4.19 ± 2% perf-profile.calltrace.cycles-
> pp.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit._
> _tcp_push_pending_frames
> 6.14 -1.4 4.71 ± 2% perf-profile.calltrace.cycles-
> pp.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit.__tcp_push_pendin
> g_frames.tcp_rcv_established
> 5.67 -1.4 4.24 ± 2% perf-profile.calltrace.cycles-
> pp.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.release_sock.tcp_sen
> dpage
> 2.08 -1.4 0.68 ± 8% perf-profile.calltrace.cycles-
> pp.__release_sock.release_sock.tcp_recvmsg.inet_recvmsg.sock_recvmsg
> 5.66 -1.4 4.28 ± 2% perf-profile.calltrace.cycles-
> pp.tcp_v4_do_rcv.__release_sock.release_sock.tcp_sendpage.inet_sendpage
> 2.22 -1.4 0.84 ± 8% perf-profile.calltrace.cycles-
> pp.release_sock.tcp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom
> 7.37 -1.3 6.07 ± 3% perf-profile.calltrace.cycles-
> pp.__tcp_transmit_skb.tcp_write_xmit.__tcp_push_pending_frames.tcp_rcv
> _established.tcp_v4_do_rcv
> 12.84 -1.2 11.64 perf-profile.calltrace.cycles-
> pp.asm_sysvec_call_function_single.acpi_safe_halt.acpi_idle_enter.cpuidle_
> enter_state.cpuidle_enter
> 7.52 -1.1 6.41 ± 2% perf-profile.calltrace.cycles-
> pp.__dev_queue_xmit.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_s
> kb.tcp_write_xmit
> 11.36 -1.0 10.31 perf-profile.calltrace.cycles-
> pp.start_secondary.secondary_startup_64_no_verify
> 11.35 -1.0 10.30 perf-profile.calltrace.cycles-
> pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
> 11.47 -1.0 10.43 perf-profile.calltrace.cycles-
> pp.secondary_startup_64_no_verify
> 11.32 -1.0 10.28 perf-profile.calltrace.cycles-
> pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_ve
> rify
> 9.96 -0.9 9.02 perf-profile.calltrace.cycles-
> pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_s
> tartup_64_no_verify
> 9.10 -0.9 8.24 perf-profile.calltrace.cycles-
> pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondar
> y
> 9.03 -0.9 8.18 perf-profile.calltrace.cycles-
> pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_
> entry
> 8.78 -0.8 7.95 perf-profile.calltrace.cycles-
> pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_i
> dle
> 1.03 -0.6 0.43 ± 44% perf-profile.calltrace.cycles-
> pp.__wake_up_common.__wake_up_common_lock.sock_def_readable.tcp_
> data_queue.tcp_rcv_established
> 1.19 -0.6 0.59 ± 2% perf-profile.calltrace.cycles-
> pp.sock_def_readable.tcp_data_queue.tcp_rcv_established.tcp_v4_do_rcv.t
> cp_v4_rcv
> 1.32 -0.6 0.75 ± 2% perf-profile.calltrace.cycles-
> pp.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.release_sock
> 1.08 -0.5 0.54 perf-profile.calltrace.cycles-
> pp.__wake_up_common_lock.sock_def_readable.tcp_data_queue.tcp_rcv_e
> stablished.tcp_v4_do_rcv
> 1.12 -0.5 0.59 ± 3% perf-profile.calltrace.cycles-
> pp.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.__relea
> se_sock
> 2.46 -0.5 2.00 ± 8% perf-profile.calltrace.cycles-
> pp.wait_woken.sk_wait_data.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg
> 2.24 -0.4 1.80 ± 8% perf-profile.calltrace.cycles-
> pp.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked.tcp_rec
> vmsg
> 2.19 -0.4 1.75 ± 8% perf-profile.calltrace.cycles-
> pp.schedule.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locke
> d
> 2.08 -0.4 1.65 ± 7% perf-profile.calltrace.cycles-
> pp.__schedule.schedule.schedule_timeout.wait_woken.sk_wait_data
> 3.07 -0.4 2.65 perf-profile.calltrace.cycles-
> pp.sk_wait_data.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg.sock_recvm
> sg
> 1.69 -0.4 1.32 ± 8% perf-profile.calltrace.cycles-
> pp.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_
> rcv
> 3.56 -0.3 3.27 perf-profile.calltrace.cycles-
> pp.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle
> _idle_call
> 8.87 -0.3 8.62 perf-profile.calltrace.cycles-
> pp.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish
> .__netif_receive_skb_one_core
> 2.17 -0.2 1.96 ± 8% perf-profile.calltrace.cycles-
> pp.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_delive
> r_rcu
> 8.73 -0.2 8.51 perf-profile.calltrace.cycles-
> pp.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu.ip
> _local_deliver_finish
> 0.69 -0.2 0.53 ± 44% perf-profile.calltrace.cycles-
> pp.dequeue_task_fair.__schedule.schedule.schedule_timeout.wait_woken
> 0.60 -0.1 0.46 ± 44% perf-profile.calltrace.cycles-
> pp.dequeue_entity.dequeue_task_fair.__schedule.schedule.schedule_timeo
> ut
> 2.34 -0.1 2.23 perf-profile.calltrace.cycles-
> pp.sysvec_call_function_single.asm_sysvec_call_function_single.acpi_safe_h
> alt.acpi_idle_enter.cpuidle_enter_state
> 0.99 -0.1 0.92 perf-profile.calltrace.cycles-
> pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_star
> tup_64_no_verify
> 1.78 -0.1 1.70 perf-profile.calltrace.cycles-
> pp.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_cal
> l_function_single.acpi_safe_halt.acpi_idle_enter
> 0.93 -0.1 0.85 perf-profile.calltrace.cycles-
> pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
> 0.60 -0.1 0.54 perf-profile.calltrace.cycles-
> pp.menu_select.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
> 1.21 -0.1 1.16 perf-profile.calltrace.cycles-
> pp.sched_ttwu_pending.__sysvec_call_function_single.sysvec_call_function_
> single.asm_sysvec_call_function_single.acpi_safe_halt
> 0.95 -0.0 0.90 perf-profile.calltrace.cycles-
> pp.ttwu_do_activate.sched_ttwu_pending.__sysvec_call_function_single.sys
> vec_call_function_single.asm_sysvec_call_function_single
> 0.69 -0.0 0.66 perf-profile.calltrace.cycles-
> pp.napi_consume_skb.net_rx_action.__do_softirq.do_softirq.__local_bh_en
> able_ip
> 0.78 -0.0 0.75 perf-profile.calltrace.cycles-
> pp.enqueue_task_fair.activate_task.ttwu_do_activate.sched_ttwu_pending._
> _sysvec_call_function_single
> 0.53 +0.0 0.55 perf-profile.calltrace.cycles-
> pp.enqueue_entity.enqueue_task_fair.activate_task.ttwu_do_activate.sched
> _ttwu_pending
> 0.58 +0.0 0.61 ± 2% perf-profile.calltrace.cycles-
> pp.__alloc_skb.tcp_stream_alloc_skb.tcp_build_frag.do_tcp_sendpages.tcp_
> sendpage
> 0.79 +0.1 0.90 ± 2% perf-profile.calltrace.cycles-
> pp.tcp_stream_alloc_skb.tcp_build_frag.do_tcp_sendpages.tcp_sendpage.in
> et_sendpage
> 0.77 +0.1 0.90 ± 3% perf-profile.calltrace.cycles-
> pp.page_cache_pipe_buf_release.__splice_from_pipe.generic_splice_sendpa
> ge.direct_splice_actor.splice_direct_to_actor
> 1.04 +0.1 1.18 ± 2% perf-profile.calltrace.cycles-
> pp._raw_spin_lock_bh.release_sock.tcp_sendpage.inet_sendpage.kernel_se
> ndpage
> 0.96 +0.2 1.12 ± 2% perf-profile.calltrace.cycles-
> pp.tcp_current_mss.tcp_send_mss.do_tcp_sendpages.tcp_sendpage.inet_se
> ndpage
> 0.71 +0.2 0.89 perf-profile.calltrace.cycles-
> pp.do_splice_to.splice_direct_to_actor.do_splice_direct.do_sendfile.__x64_s
> ys_sendfile64
> 0.41 ± 50% +0.2 0.64 ± 2% perf-profile.calltrace.cycles-
> pp._copy_from_user.__x64_sys_sendfile64.do_syscall_64.entry_SYSCALL_64
> _after_hwframe.sendfile
> 0.41 ± 50% +0.2 0.65 perf-profile.calltrace.cycles-
> pp.security_file_permission.do_sendfile.__x64_sys_sendfile64.do_syscall_64
> .entry_SYSCALL_64_after_hwframe
> 1.32 +0.2 1.56 perf-profile.calltrace.cycles-
> pp.tcp_send_mss.do_tcp_sendpages.tcp_sendpage.inet_sendpage.kernel_se
> ndpage
> 15.63 +0.3 15.90 perf-profile.calltrace.cycles-
> pp.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_fini
> sh_output2
> 1.10 +0.3 1.37 ± 2% perf-profile.calltrace.cycles-
> pp.tcp_write_xmit.__tcp_push_pending_frames.do_tcp_sendpages.tcp_send
> page.inet_sendpage
> 15.79 +0.3 16.06 perf-profile.calltrace.cycles-
> pp.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2._
> _ip_queue_xmit
> 1.18 +0.3 1.45 ± 2% perf-profile.calltrace.cycles-
> pp.__tcp_push_pending_frames.do_tcp_sendpages.tcp_sendpage.inet_send
> page.kernel_sendpage
> 15.88 +0.3 16.16 perf-profile.calltrace.cycles-
> pp.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2.__ip_queue_
> xmit.__tcp_transmit_skb
> 0.31 ± 81% +0.3 0.60 ± 2% perf-profile.calltrace.cycles-
> pp.touch_atime.splice_direct_to_actor.do_splice_direct.do_sendfile.__x64_s
> ys_sendfile64
> 1.29 +0.3 1.60 perf-profile.calltrace.cycles-
> pp.copy_page_to_iter_pipe.filemap_read.generic_file_splice_read.splice_dir
> ect_to_actor.do_splice_direct
> 2.14 +0.3 2.48 ± 2% perf-profile.calltrace.cycles-
> pp.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit.d
> o_tcp_sendpages
> 2.23 +0.4 2.60 perf-profile.calltrace.cycles-
> pp.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit.do_tcp_sendpages.
> tcp_sendpage
> 2.42 +0.4 2.86 ± 2% perf-profile.calltrace.cycles-
> pp.__tcp_transmit_skb.tcp_write_xmit.do_tcp_sendpages.tcp_sendpage.inet
> _sendpage
> 2.66 +0.5 3.20 ± 2% perf-profile.calltrace.cycles-
> pp.tcp_write_xmit.do_tcp_sendpages.tcp_sendpage.inet_sendpage.kernel_s
> endpage
> 0.00 +0.5 0.54 ± 2% perf-profile.calltrace.cycles-
> pp.__fget_light.do_sendfile.__x64_sys_sendfile64.do_syscall_64.entry_SYSC
> ALL_64_after_hwframe
> 0.00 +0.6 0.56 ± 2% perf-profile.calltrace.cycles-
> pp.__entry_text_start.sendfile.sendfile_tcp_stream.main.__libc_start_main
> 4.35 +0.6 4.96 ± 2% perf-profile.calltrace.cycles-
> pp.native_queued_spin_lock_slowpath._raw_spin_lock_bh.lock_sock_neste
> d.tcp_sendpage.inet_sendpage
> 0.00 +0.7 0.74 ± 3% perf-profile.calltrace.cycles-
> pp.try_to_wake_up.__wake_up_common.__wake_up_common_lock.sock_d
> ef_readable.tcp_rcv_established
> 5.15 +0.8 5.93 ± 2% perf-profile.calltrace.cycles-
> pp._raw_spin_lock_bh.lock_sock_nested.tcp_sendpage.inet_sendpage.kerne
> l_sendpage
> 0.00 +0.8 0.84 ± 3% perf-profile.calltrace.cycles-
> pp.__wake_up_common.__wake_up_common_lock.sock_def_readable.tcp_
> rcv_established.tcp_v4_do_rcv
> 2.47 +0.8 3.31 perf-profile.calltrace.cycles-
> pp.tcp_write_xmit.__tcp_push_pending_frames.tcp_rcv_established.tcp_v4_
> do_rcv.tcp_v4_rcv
> 2.49 +0.8 3.34 perf-profile.calltrace.cycles-
> pp.__tcp_push_pending_frames.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_
> rcv.ip_protocol_deliver_rcu
> 5.49 +0.9 6.34 ± 2% perf-profile.calltrace.cycles-
> pp.lock_sock_nested.tcp_sendpage.inet_sendpage.kernel_sendpage.sock_se
> ndpage
> 0.00 +0.9 0.88 ± 3% perf-profile.calltrace.cycles-
> pp.__wake_up_common_lock.sock_def_readable.tcp_rcv_established.tcp_v
> 4_do_rcv.tcp_v4_rcv
> 2.61 ± 2% +0.9 3.53 ± 3% perf-profile.calltrace.cycles-
> pp.check_heap_object.__check_object_size.simple_copy_to_iter.__skb_data
> gram_iter.skb_copy_datagram_iter
> 0.00 +0.9 0.94 ± 2% perf-profile.calltrace.cycles-
> pp.sock_def_readable.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_pro
> tocol_deliver_rcu
> 2.98 +1.0 4.00 ± 2% perf-profile.calltrace.cycles-
> pp.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy
> _datagram_iter.tcp_recvmsg_locked
> 2.91 +1.0 3.94 perf-profile.calltrace.cycles-
> pp.filemap_get_read_batch.filemap_get_pages.filemap_read.generic_file_sp
> lice_read.splice_direct_to_actor
> 10.13 +1.0 11.17 perf-profile.calltrace.cycles-
> pp.__tcp_transmit_skb.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg.sock_
> recvmsg
> 3.14 +1.1 4.21 perf-profile.calltrace.cycles-
> pp.filemap_get_pages.filemap_read.generic_file_splice_read.splice_direct_t
> o_actor.do_splice_direct
> 3.24 +1.1 4.32 ± 2% perf-profile.calltrace.cycles-
> pp.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.tcp_r
> ecvmsg_locked.tcp_recvmsg
> 10.38 +1.3 11.66 perf-profile.calltrace.cycles-
> pp.__ip_queue_xmit.__tcp_transmit_skb.tcp_recvmsg_locked.tcp_recvmsg.i
> net_recvmsg
> 10.07 +1.3 11.41 perf-profile.calltrace.cycles-
> pp.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_skb.tcp_recvmsg_loc
> ked.tcp_recvmsg
> 9.94 +1.3 11.28 perf-profile.calltrace.cycles-
> pp.__dev_queue_xmit.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_s
> kb.tcp_recvmsg_locked
> 6.53 +1.6 8.18 perf-profile.calltrace.cycles-
> pp.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.tcp
> _recvmsg_locked
> 7.02 +1.8 8.79 perf-profile.calltrace.cycles-
> pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.tcp_recvms
> g_locked.tcp_recvmsg
> 31.73 +1.9 33.63 perf-profile.calltrace.cycles-
> pp.tcp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom.__x64_sys_recv
> from
> 31.85 +1.9 33.77 perf-profile.calltrace.cycles-
> pp.inet_recvmsg.sock_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_sysc
> all_64
> 6.52 +1.9 8.44 perf-profile.calltrace.cycles-
> pp.filemap_read.generic_file_splice_read.splice_direct_to_actor.do_splice_d
> irect.do_sendfile
> 32.06 +1.9 34.00 perf-profile.calltrace.cycles-
> pp.sock_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_
> SYSCALL_64_after_hwframe
> 32.54 +2.0 34.54 perf-profile.calltrace.cycles-
> pp.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_af
> ter_hwframe.recv
> 32.63 +2.0 34.64 perf-profile.calltrace.cycles-
> pp.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe.rec
> v.process_requests
> 33.81 +2.0 35.82 perf-profile.calltrace.cycles-
> pp.recv.process_requests.spawn_child.accept_connection.accept_connectio
> ns
> 32.95 +2.0 34.96 perf-profile.calltrace.cycles-
> pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.recv.process_requests.s
> pawn_child
> 33.11 +2.0 35.14 perf-profile.calltrace.cycles-
> pp.entry_SYSCALL_64_after_hwframe.recv.process_requests.spawn_child.ac
> cept_connection
> 7.44 +2.1 9.57 perf-profile.calltrace.cycles-
> pp.generic_file_splice_read.splice_direct_to_actor.do_splice_direct.do_sendf
> ile.__x64_sys_sendfile64
> 11.23 +3.1 14.38 perf-profile.calltrace.cycles-
> pp.__skb_datagram_iter.skb_copy_datagram_iter.tcp_recvmsg_locked.tcp_r
> ecvmsg.inet_recvmsg
> 11.30 +3.2 14.48 perf-profile.calltrace.cycles-
> pp.skb_copy_datagram_iter.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg.
> sock_recvmsg
> 29.26 +3.2 32.47 perf-profile.calltrace.cycles-
> pp.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvf
> rom
> 7.77 -6.8 0.94 perf-profile.children.cycles-
> pp.__sk_mem_schedule
> 7.95 -6.8 1.12 perf-profile.children.cycles-
> pp.tcp_wmem_schedule
> 7.62 -6.7 0.88 perf-profile.children.cycles-
> pp.__sk_mem_raise_allocated
> 10.92 -6.3 4.63 perf-profile.children.cycles-pp.tcp_build_frag
> 6.86 ± 2% -6.2 0.62 ± 2% perf-profile.children.cycles-
> pp.mem_cgroup_charge_skmem
> 33.63 -5.9 27.72 perf-profile.children.cycles-pp.tcp_sendpage
> 34.07 -5.8 28.26 perf-profile.children.cycles-
> pp.inet_sendpage
> 34.39 -5.7 28.65 perf-profile.children.cycles-
> pp.kernel_sendpage
> 34.58 -5.7 28.88 perf-profile.children.cycles-
> pp.sock_sendpage
> 34.90 -5.6 29.28 perf-profile.children.cycles-
> pp.pipe_to_sendpage
> 36.86 -5.2 31.69 perf-profile.children.cycles-
> pp.__splice_from_pipe
> 37.23 -5.1 32.14 perf-profile.children.cycles-
> pp.generic_splice_sendpage
> 37.33 -5.1 32.26 perf-profile.children.cycles-
> pp.direct_splice_actor
> 17.14 -4.9 12.22 perf-profile.children.cycles-
> pp.do_tcp_sendpages
> 10.36 -3.9 6.49 perf-profile.children.cycles-
> pp.__release_sock
> 4.40 -3.6 0.78 perf-profile.children.cycles-
> pp.tcp_data_queue
> 11.99 -3.6 8.40 perf-profile.children.cycles-pp.release_sock
> 3.34 ± 4% -3.1 0.26 ± 2% perf-profile.children.cycles-
> pp.try_charge_memcg
> 16.59 -3.0 13.62 perf-profile.children.cycles-
> pp.tcp_v4_do_rcv
> 16.37 -2.9 13.46 perf-profile.children.cycles-
> pp.tcp_rcv_established
> 46.40 -2.5 43.91 perf-profile.children.cycles-
> pp.splice_direct_to_actor
> 2.93 -2.5 0.46 perf-profile.children.cycles-
> pp.__sk_mem_reduce_allocated
> 46.99 -2.4 44.62 perf-profile.children.cycles-
> pp.do_splice_direct
> 49.52 -1.9 47.59 perf-profile.children.cycles-pp.do_sendfile
> 50.71 -1.7 48.97 perf-profile.children.cycles-
> pp.__x64_sys_sendfile64
> 1.54 ± 5% -1.5 0.06 ± 6% perf-profile.children.cycles-
> pp.page_counter_try_charge
> 1.56 ± 3% -1.4 0.16 ± 2% perf-profile.children.cycles-
> pp.refill_stock
> 1.29 ± 4% -1.2 0.06 perf-profile.children.cycles-
> pp.drain_stock
> 1.26 ± 4% -1.2 0.05 perf-profile.children.cycles-
> pp.page_counter_uncharge
> 52.89 -1.2 51.68 perf-profile.children.cycles-pp.sendfile
> 11.36 -1.0 10.31 perf-profile.children.cycles-
> pp.start_secondary
> 11.47 -1.0 10.43 perf-profile.children.cycles-
> pp.secondary_startup_64_no_verify
> 11.47 -1.0 10.43 perf-profile.children.cycles-
> pp.cpu_startup_entry
> 11.45 -1.0 10.41 perf-profile.children.cycles-pp.do_idle
> 53.93 -1.0 52.92 perf-profile.children.cycles-
> pp.sendfile_tcp_stream
> 3.85 -0.9 2.91 perf-profile.children.cycles-pp.tcp_ack
> 10.07 -0.9 9.13 perf-profile.children.cycles-
> pp.cpuidle_idle_call
> 9.19 -0.9 8.34 perf-profile.children.cycles-pp.cpuidle_enter
> 9.13 -0.8 8.28 perf-profile.children.cycles-
> pp.cpuidle_enter_state
> 2.87 -0.8 2.03 perf-profile.children.cycles-
> pp.tcp_clean_rtx_queue
> 8.84 -0.8 8.02 perf-profile.children.cycles-pp.acpi_safe_halt
> 8.87 -0.8 8.04 perf-profile.children.cycles-pp.acpi_idle_enter
> 7.77 -0.7 7.11 perf-profile.children.cycles-
> pp.asm_sysvec_call_function_single
> 9.79 -0.6 9.14 perf-profile.children.cycles-
> pp.__tcp_push_pending_frames
> 0.75 ± 4% -0.6 0.14 ± 3% perf-profile.children.cycles-
> pp.mem_cgroup_uncharge_skmem
> 3.07 -0.4 2.63 perf-profile.children.cycles-pp.__schedule
> 3.09 -0.4 2.67 perf-profile.children.cycles-pp.sk_wait_data
> 2.47 -0.4 2.08 perf-profile.children.cycles-pp.wait_woken
> 2.25 -0.4 1.87 perf-profile.children.cycles-
> pp.schedule_timeout
> 2.20 -0.4 1.83 perf-profile.children.cycles-pp.schedule
> 1.10 ± 2% -0.3 0.82 ± 3% perf-profile.children.cycles-
> pp.pick_next_task_fair
> 0.73 ± 4% -0.2 0.48 ± 6% perf-profile.children.cycles-
> pp.newidle_balance
> 0.28 ± 12% -0.2 0.09 ± 5% perf-profile.children.cycles-
> pp.cgroup_rstat_updated
> 2.39 -0.1 2.28 perf-profile.children.cycles-
> pp.sysvec_call_function_single
> 0.30 ± 4% -0.1 0.20 ± 5% perf-profile.children.cycles-
> pp.load_balance
> 1.01 -0.1 0.93 perf-profile.children.cycles-pp.schedule_idle
> 1.68 -0.1 1.60 perf-profile.children.cycles-
> pp.sock_def_readable
> 1.82 -0.1 1.74 perf-profile.children.cycles-
> pp.__sysvec_call_function_single
> 0.22 ± 5% -0.1 0.14 ± 5% perf-profile.children.cycles-
> pp.find_busiest_group
> 1.51 -0.1 1.44 perf-profile.children.cycles-
> pp.__wake_up_common_lock
> 0.20 ± 6% -0.1 0.13 ± 5% perf-profile.children.cycles-
> pp.update_sd_lb_stats
> 1.27 -0.1 1.21 perf-profile.children.cycles-
> pp.try_to_wake_up
> 1.43 -0.1 1.37 perf-profile.children.cycles-
> pp.__wake_up_common
> 0.14 ± 3% -0.1 0.09 ± 7% perf-profile.children.cycles-
> pp.update_blocked_averages
> 0.61 -0.1 0.56 perf-profile.children.cycles-pp.menu_select
> 0.70 -0.1 0.64 perf-profile.children.cycles-
> pp.dequeue_task_fair
> 0.15 ± 4% -0.1 0.10 ± 5% perf-profile.children.cycles-
> pp.update_sg_lb_stats
> 0.63 -0.0 0.58 perf-profile.children.cycles-
> pp.dequeue_entity
> 1.25 -0.0 1.20 perf-profile.children.cycles-
> pp.sched_ttwu_pending
> 0.24 ± 2% -0.0 0.19 ± 3% perf-profile.children.cycles-
> pp.tcp_check_space
> 0.98 -0.0 0.94 perf-profile.children.cycles-
> pp.ttwu_do_activate
> 0.06 -0.0 0.02 ± 99% perf-profile.children.cycles-
> pp.irqentry_exit
> 0.30 -0.0 0.27 ± 2% perf-profile.children.cycles-
> pp.native_irq_return_iret
> 0.52 -0.0 0.48 perf-profile.children.cycles-
> pp.ttwu_queue_wakelist
> 0.43 -0.0 0.40 perf-profile.children.cycles-
> pp.native_sched_clock
> 0.08 ± 5% -0.0 0.06 ± 9% perf-profile.children.cycles-
> pp.raw_spin_rq_lock_nested
> 0.22 -0.0 0.20 ± 2% perf-profile.children.cycles-
> pp.__switch_to_asm
> 0.48 -0.0 0.45 perf-profile.children.cycles-
> pp.sched_clock_cpu
> 0.27 -0.0 0.24 perf-profile.children.cycles-pp.__switch_to
> 0.21 ± 2% -0.0 0.18 ± 4% perf-profile.children.cycles-
> pp.___perf_sw_event
> 0.11 ± 3% -0.0 0.09 perf-profile.children.cycles-
> pp.ct_kernel_exit_state
> 0.19 ± 2% -0.0 0.17 ± 2% perf-profile.children.cycles-
> pp.native_apic_msr_eoi_write
> 0.29 -0.0 0.27 perf-profile.children.cycles-pp.update_curr
> 0.06 -0.0 0.04 ± 44% perf-profile.children.cycles-
> pp.update_irq_load_avg
> 0.14 ± 2% -0.0 0.12 perf-profile.children.cycles-
> pp.update_rq_clock_task
> 0.11 ± 4% -0.0 0.09 ± 7% perf-profile.children.cycles-
> pp.resched_curr
> 0.13 ± 4% -0.0 0.11 ± 4% perf-profile.children.cycles-
> pp.check_preempt_curr
> 0.17 ± 2% -0.0 0.15 ± 2% perf-profile.children.cycles-
> pp.__x2apic_send_IPI_dest
> 0.17 ± 2% -0.0 0.15 ± 3% perf-profile.children.cycles-
> pp.__update_load_avg_se
> 0.12 ± 4% -0.0 0.10 ± 3% perf-profile.children.cycles-
> pp.finish_task_switch
> 0.25 -0.0 0.23 ± 2% perf-profile.children.cycles-
> pp.set_next_entity
> 0.09 -0.0 0.08 perf-profile.children.cycles-
> pp.__wrgsbase_inactive
> 0.06 -0.0 0.05 perf-profile.children.cycles-pp.ct_idle_exit
> 0.10 +0.0 0.11 perf-profile.children.cycles-
> pp.tcp_chrono_stop
> 0.07 ± 5% +0.0 0.08 perf-profile.children.cycles-pp.rb_next
> 0.05 ± 7% +0.0 0.06 ± 7% perf-profile.children.cycles-
> pp.__fdget
> 0.08 ± 5% +0.0 0.09 ± 4% perf-profile.children.cycles-
> pp.tcp_rearm_rto
> 0.06 ± 8% +0.0 0.07 perf-profile.children.cycles-pp.rb_first
> 1.08 +0.0 1.10 perf-profile.children.cycles-
> pp.dev_hard_start_xmit
> 0.11 ± 4% +0.0 0.13 ± 2% perf-profile.children.cycles-
> pp.inet_ehashfn
> 0.07 ± 6% +0.0 0.09 ± 4% perf-profile.children.cycles-
> pp.demo_interval_tick
> 0.12 ± 3% +0.0 0.14 ± 3% perf-profile.children.cycles-
> pp.netif_skb_features
> 0.28 ± 2% +0.0 0.30 perf-profile.children.cycles-
> pp.ip_local_out
> 0.09 +0.0 0.10 ± 4% perf-profile.children.cycles-
> pp.tcp_queue_rcv
> 0.05 +0.0 0.06 ± 7% perf-profile.children.cycles-
> pp.__tcp_ack_snd_check
> 0.16 ± 3% +0.0 0.18 ± 2% perf-profile.children.cycles-
> pp.ip_send_check
> 0.07 ± 7% +0.0 0.08 ± 4% perf-profile.children.cycles-
> pp.tcp_rtt_estimator
> 0.06 ± 8% +0.0 0.07 ± 5% perf-profile.children.cycles-
> pp.iov_iter_pipe
> 0.24 ± 3% +0.0 0.26 perf-profile.children.cycles-
> pp.tcp_rcv_space_adjust
> 0.25 +0.0 0.26 perf-profile.children.cycles-
> pp.__update_load_avg_cfs_rq
> 0.15 +0.0 0.17 ± 5% perf-profile.children.cycles-
> pp.ipv4_dst_check
> 0.06 ± 7% +0.0 0.08 ± 5% perf-profile.children.cycles-
> pp.splice_from_pipe_next
> 0.12 ± 3% +0.0 0.14 ± 3% perf-profile.children.cycles-
> pp.tcp_update_skb_after_send
> 0.60 +0.0 0.62 perf-profile.children.cycles-
> pp._raw_spin_lock_irqsave
> 0.08 +0.0 0.10 perf-profile.children.cycles-
> pp.__list_add_valid
> 0.11 ± 3% +0.0 0.13 ± 2% perf-profile.children.cycles-
> pp.__get_task_ioprio
> 0.36 +0.0 0.38 perf-profile.children.cycles-
> pp.enqueue_to_backlog
> 0.11 +0.0 0.13 ± 2% perf-profile.children.cycles-
> pp.syscall_enter_from_user_mode
> 0.12 +0.0 0.14 ± 2% perf-profile.children.cycles-pp.tcp_push
> 0.21 +0.0 0.23 ± 3% perf-profile.children.cycles-
> pp.exit_to_user_mode_prepare
> 0.10 ± 5% +0.0 0.12 ± 5% perf-profile.children.cycles-
> pp.xas_start
> 0.10 +0.0 0.12 ± 3% perf-profile.children.cycles-
> pp.tcp_update_pacing_rate
> 0.06 +0.0 0.08 ± 5% perf-profile.children.cycles-
> pp.tcp_event_data_recv
> 0.12 ± 4% +0.0 0.14 perf-profile.children.cycles-
> pp.tcp_downgrade_zcopy_pure
> 0.17 ± 3% +0.0 0.20 ± 2% perf-profile.children.cycles-
> pp.syscall_exit_to_user_mode_prepare
> 0.20 ± 2% +0.0 0.22 ± 4% perf-profile.children.cycles-
> pp.sockfd_lookup_light
> 0.10 ± 4% +0.0 0.13 perf-profile.children.cycles-
> pp.is_vmalloc_addr
> 0.10 ± 4% +0.0 0.13 ± 6% perf-profile.children.cycles-
> pp.make_vfsgid
> 0.10 ± 3% +0.0 0.13 ± 2% perf-profile.children.cycles-
> pp.make_vfsuid
> 0.39 +0.0 0.42 perf-profile.children.cycles-
> pp.netif_rx_internal
> 0.28 ± 2% +0.0 0.30 ± 3% perf-profile.children.cycles-
> pp.recv_tcp_stream
> 0.13 +0.0 0.16 ± 4% perf-profile.children.cycles-
> pp.check_stack_object
> 0.13 ± 3% +0.0 0.16 ± 2% perf-profile.children.cycles-
> pp.tcp_release_cb
> 0.12 ± 3% +0.0 0.15 ± 2% perf-profile.children.cycles-
> pp.demo_stream_interval
> 0.26 ± 2% +0.0 0.29 ± 2% perf-profile.children.cycles-
> pp.tcp_add_backlog
> 0.11 ± 3% +0.0 0.14 ± 2% perf-profile.children.cycles-
> pp.tcp_ack_update_rtt
> 0.21 +0.0 0.24 ± 2% perf-profile.children.cycles-
> pp.ip_rcv_core
> 0.18 ± 2% +0.0 0.21 ± 3% perf-profile.children.cycles-
> pp.__sk_dst_check
> 0.07 +0.0 0.10 ± 3% perf-profile.children.cycles-
> pp.__tcp_cleanup_rbuf
> 0.41 +0.0 0.44 perf-profile.children.cycles-pp.__netif_rx
> 0.17 ± 2% +0.0 0.20 perf-profile.children.cycles-
> pp.__tcp_select_window
> 0.14 ± 3% +0.0 0.17 ± 3% perf-profile.children.cycles-
> pp.tcp_mtu_probe
> 0.34 +0.0 0.37 perf-profile.children.cycles-
> pp.kmalloc_reserve
> 0.09 ± 4% +0.0 0.12 ± 4% perf-profile.children.cycles-
> pp.lock_timer_base
> 0.17 ± 2% +0.0 0.21 perf-profile.children.cycles-
> pp.tcp_tx_timestamp
> 0.15 +0.0 0.19 ± 3% perf-profile.children.cycles-
> pp.folio_mark_accessed
> 0.20 ± 2% +0.0 0.24 perf-profile.children.cycles-
> pp._raw_spin_unlock_bh
> 0.40 +0.0 0.44 perf-profile.children.cycles-
> pp.tcp_mstamp_refresh
> 0.15 ± 3% +0.0 0.20 ± 5% perf-profile.children.cycles-
> pp.inet_send_prepare
> 0.37 +0.0 0.42 perf-profile.children.cycles-pp.__skb_clone
> 0.14 ± 2% +0.0 0.18 ± 5% perf-profile.children.cycles-
> pp.ktime_get_coarse_real_ts64
> 0.27 +0.0 0.32 perf-profile.children.cycles-
> pp.validate_xmit_skb
> 0.18 +0.0 0.22 ± 2% perf-profile.children.cycles-
> pp.fsnotify_perm
> 0.17 ± 3% +0.0 0.22 ± 2% perf-profile.children.cycles-
> pp.skb_clone
> 0.19 ± 2% +0.0 0.24 ± 2% perf-profile.children.cycles-
> pp.rw_verify_area
> 0.23 ± 2% +0.1 0.28 ± 2% perf-profile.children.cycles-
> pp.xas_load
> 0.00 +0.1 0.05 ± 8% perf-profile.children.cycles-
> pp.tcp_rbtree_insert
> 0.28 ± 2% +0.1 0.34 perf-profile.children.cycles-
> pp.tcp_schedule_loss_probe
> 0.24 +0.1 0.30 perf-profile.children.cycles-pp.sanity
> 0.32 ± 2% +0.1 0.38 ± 2% perf-profile.children.cycles-
> pp.dst_release
> 0.58 +0.1 0.65 perf-profile.children.cycles-
> pp.kmem_cache_alloc_node
> 0.31 +0.1 0.37 perf-profile.children.cycles-
> pp.syscall_return_via_sysret
> 0.24 +0.1 0.31 perf-profile.children.cycles-pp.tcp_tso_segs
> 0.25 +0.1 0.32 perf-profile.children.cycles-
> pp.copy_page_to_iter
> 0.48 +0.1 0.55 perf-profile.children.cycles-
> pp._raw_spin_lock
> 0.28 +0.1 0.35 perf-profile.children.cycles-pp.rcu_all_qs
> 0.32 ± 4% +0.1 0.39 ± 2% perf-profile.children.cycles-
> pp.sock_put
> 0.50 +0.1 0.57 perf-profile.children.cycles-
> pp.kmem_cache_free
> 0.34 ± 2% +0.1 0.42 perf-profile.children.cycles-
> pp.__put_user_8
> 0.29 ± 2% +0.1 0.37 ± 2% perf-profile.children.cycles-
> pp.aa_file_perm
> 0.49 +0.1 0.57 perf-profile.children.cycles-
> pp.syscall_exit_to_user_mode
> 0.69 +0.1 0.77 perf-profile.children.cycles-pp.read_tsc
> 0.16 ± 4% +0.1 0.25 ± 4% perf-profile.children.cycles-
> pp.skb_release_head_state
> 0.38 +0.1 0.47 perf-profile.children.cycles-
> pp.tcp_established_options
> 0.41 +0.1 0.50 ± 3% perf-profile.children.cycles-
> pp.__virt_addr_valid
> 0.48 +0.1 0.58 perf-profile.children.cycles-
> pp.__tcp_send_ack
> 0.42 +0.1 0.52 perf-profile.children.cycles-
> pp.entry_SYSRETQ_unsafe_stack
> 0.99 +0.1 1.09 perf-profile.children.cycles-pp.__alloc_skb
> 0.52 +0.1 0.64 perf-profile.children.cycles-
> pp.netperf_sendfile
> 0.43 +0.1 0.55 perf-profile.children.cycles-pp.__mod_timer
> 0.46 +0.1 0.58 perf-profile.children.cycles-
> pp.tcp_event_new_data_sent
> 0.80 +0.1 0.92 perf-profile.children.cycles-
> pp.tcp_stream_alloc_skb
> 0.47 +0.1 0.60 perf-profile.children.cycles-pp.sk_reset_timer
> 0.46 ± 2% +0.1 0.58 ± 2% perf-profile.children.cycles-
> pp.current_time
> 0.51 +0.1 0.64 perf-profile.children.cycles-
> pp.__fsnotify_parent
> 0.59 +0.1 0.73 perf-profile.children.cycles-
> pp.__entry_text_start
> 0.54 +0.1 0.68 perf-profile.children.cycles-
> pp._copy_from_user
> 0.41 +0.1 0.54 perf-profile.children.cycles-
> pp.tcp_rate_check_app_limited
> 0.39 ± 2% +0.1 0.52 perf-profile.children.cycles-
> pp.page_cache_pipe_buf_confirm
> 0.62 +0.1 0.76 perf-profile.children.cycles-pp.__fget_light
> 0.79 +0.1 0.94 ± 2% perf-profile.children.cycles-
> pp.page_cache_pipe_buf_release
> 1.02 +0.2 1.19 ± 4% perf-profile.children.cycles-pp.ktime_get
> 0.97 +0.2 1.14 perf-profile.children.cycles-
> pp.napi_consume_skb
> 0.78 +0.2 0.95 perf-profile.children.cycles-
> pp.__cond_resched
> 1.14 +0.2 1.32 perf-profile.children.cycles-
> pp.tcp_current_mss
> 0.74 +0.2 0.93 perf-profile.children.cycles-pp.do_splice_to
> 0.76 +0.2 0.96 perf-profile.children.cycles-pp.__kfree_skb
> 0.94 ± 2% +0.3 1.19 perf-profile.children.cycles-
> pp.apparmor_file_permission
> 1.38 +0.3 1.65 perf-profile.children.cycles-pp.tcp_send_mss
> 1.09 ± 2% +0.3 1.36 perf-profile.children.cycles-
> pp.atime_needs_update
> 1.44 +0.3 1.72 perf-profile.children.cycles-
> pp.skb_release_data
> 15.09 +0.3 15.40 perf-profile.children.cycles-
> pp.net_rx_action
> 1.20 +0.3 1.51 perf-profile.children.cycles-
> pp.security_file_permission
> 1.34 +0.3 1.67 perf-profile.children.cycles-pp.touch_atime
> 1.35 +0.3 1.69 perf-profile.children.cycles-
> pp.copy_page_to_iter_pipe
> 15.73 +0.3 16.08 perf-profile.children.cycles-pp.__do_softirq
> 17.54 +0.4 17.89 perf-profile.children.cycles-
> pp.__dev_queue_xmit
> 15.84 +0.4 16.20 perf-profile.children.cycles-pp.do_softirq
> 17.91 +0.4 18.27 perf-profile.children.cycles-
> pp.ip_finish_output2
> 18.82 +0.4 19.20 perf-profile.children.cycles-
> pp.__ip_queue_xmit
> 20.03 +0.4 20.41 perf-profile.children.cycles-
> pp.__tcp_transmit_skb
> 84.38 +0.4 84.81 perf-profile.children.cycles-
> pp.do_syscall_64
> 16.35 +0.5 16.81 perf-profile.children.cycles-
> pp.__local_bh_enable_ip
> 84.91 +0.5 85.42 perf-profile.children.cycles-
> pp.entry_SYSCALL_64_after_hwframe
> 4.52 +0.7 5.21 perf-profile.children.cycles-
> pp.native_queued_spin_lock_slowpath
> 2.68 +0.9 3.62 ± 3% perf-profile.children.cycles-
> pp.check_heap_object
> 5.69 +1.0 6.64 perf-profile.children.cycles-
> pp.lock_sock_nested
> 6.77 +1.0 7.80 perf-profile.children.cycles-
> pp._raw_spin_lock_bh
> 3.19 +1.1 4.26 ± 2% perf-profile.children.cycles-
> pp.__check_object_size
> 2.94 +1.1 4.01 perf-profile.children.cycles-
> pp.filemap_get_read_batch
> 3.29 +1.1 4.38 ± 2% perf-profile.children.cycles-
> pp.simple_copy_to_iter
> 3.16 +1.1 4.29 perf-profile.children.cycles-
> pp.filemap_get_pages
> 6.68 +1.7 8.36 perf-profile.children.cycles-pp.copyout
> 7.06 +1.8 8.85 perf-profile.children.cycles-pp._copy_to_iter
> 31.77 +1.9 33.68 perf-profile.children.cycles-pp.tcp_recvmsg
> 31.86 +1.9 33.78 perf-profile.children.cycles-pp.inet_recvmsg
> 32.07 +1.9 34.01 perf-profile.children.cycles-
> pp.sock_recvmsg
> 32.56 +2.0 34.56 perf-profile.children.cycles-
> pp.__sys_recvfrom
> 32.65 +2.0 34.65 perf-profile.children.cycles-
> pp.__x64_sys_recvfrom
> 33.95 +2.0 35.97 perf-profile.children.cycles-pp.recv
> 6.63 +2.0 8.66 perf-profile.children.cycles-pp.filemap_read
> 34.18 +2.0 36.23 perf-profile.children.cycles-
> pp.accept_connections
> 34.18 +2.0 36.23 perf-profile.children.cycles-
> pp.accept_connection
> 34.18 +2.0 36.23 perf-profile.children.cycles-pp.spawn_child
> 34.18 +2.0 36.23 perf-profile.children.cycles-
> pp.process_requests
> 7.51 +2.2 9.74 perf-profile.children.cycles-
> pp.generic_file_splice_read
> 11.31 +3.2 14.48 perf-profile.children.cycles-
> pp.skb_copy_datagram_iter
> 11.29 +3.2 14.46 perf-profile.children.cycles-
> pp.__skb_datagram_iter
> 29.29 +3.2 32.51 perf-profile.children.cycles-
> pp.tcp_recvmsg_locked
> 3.33 ± 3% -3.0 0.32 ± 2% perf-profile.self.cycles-
> pp.mem_cgroup_charge_skmem
> 2.89 -2.6 0.29 perf-profile.self.cycles-
> pp.__sk_mem_raise_allocated
> 1.72 ± 4% -1.5 0.18 ± 3% perf-profile.self.cycles-
> pp.try_charge_memcg
> 1.38 ± 5% -1.3 0.05 ± 7% perf-profile.self.cycles-
> pp.page_counter_try_charge
> 1.10 ± 4% -1.1 0.04 ± 44% perf-profile.self.cycles-
> pp.page_counter_uncharge
> 5.95 -0.6 5.30 perf-profile.self.cycles-pp.acpi_safe_halt
> 0.69 ± 4% -0.6 0.12 ± 3% perf-profile.self.cycles-
> pp.mem_cgroup_uncharge_skmem
> 0.64 ± 2% -0.5 0.16 ± 3% perf-profile.self.cycles-
> pp.__sk_mem_reduce_allocated
> 0.25 ± 13% -0.2 0.08 ± 8% perf-profile.self.cycles-
> pp.cgroup_rstat_updated
> 0.27 ± 2% -0.2 0.10 ± 3% perf-profile.self.cycles-pp.refill_stock
> 0.67 ± 2% -0.1 0.55 perf-profile.self.cycles-pp.tcp_ack
> 0.15 ± 3% -0.1 0.05 perf-profile.self.cycles-
> pp.__sk_mem_schedule
> 0.28 ± 6% -0.1 0.20 ± 5% perf-profile.self.cycles-
> pp.newidle_balance
> 0.22 -0.0 0.17 ± 2% perf-profile.self.cycles-
> pp.tcp_check_space
> 0.24 ± 3% -0.0 0.20 ± 2% perf-profile.self.cycles-
> pp.enqueue_task_fair
> 0.12 ± 3% -0.0 0.08 ± 8% perf-profile.self.cycles-
> pp.update_sg_lb_stats
> 0.06 -0.0 0.02 ± 99% perf-profile.self.cycles-
> pp.update_irq_load_avg
> 0.30 -0.0 0.27 ± 2% perf-profile.self.cycles-
> pp.native_irq_return_iret
> 0.34 -0.0 0.30 ± 2% perf-profile.self.cycles-pp.__schedule
> 0.11 -0.0 0.08 ± 4% perf-profile.self.cycles-
> pp.ct_kernel_exit_state
> 0.22 ± 3% -0.0 0.19 perf-profile.self.cycles-
> pp.__switch_to_asm
> 0.41 -0.0 0.38 perf-profile.self.cycles-pp.native_sched_clock
> 0.19 ± 2% -0.0 0.16 ± 2% perf-profile.self.cycles-
> pp.native_apic_msr_eoi_write
> 0.22 ± 2% -0.0 0.19 perf-profile.self.cycles-pp.menu_select
> 0.26 -0.0 0.23 ± 2% perf-profile.self.cycles-pp.__switch_to
> 0.22 ± 2% -0.0 0.20 ± 3% perf-profile.self.cycles-
> pp.loopback_xmit
> 0.18 ± 3% -0.0 0.16 ± 4% perf-profile.self.cycles-
> pp.___perf_sw_event
> 0.11 ± 4% -0.0 0.09 ± 7% perf-profile.self.cycles-
> pp.resched_curr
> 0.13 -0.0 0.11 ± 4% perf-profile.self.cycles-pp.do_idle
> 0.08 ± 6% -0.0 0.06 perf-profile.self.cycles-
> pp.pick_next_task_fair
> 0.17 ± 2% -0.0 0.15 ± 2% perf-profile.self.cycles-
> pp.__x2apic_send_IPI_dest
> 0.14 ± 3% -0.0 0.13 perf-profile.self.cycles-pp.__release_sock
> 0.15 ± 3% -0.0 0.13 ± 3% perf-profile.self.cycles-
> pp.__update_load_avg_se
> 0.10 ± 3% -0.0 0.09 perf-profile.self.cycles-pp.dequeue_entity
> 0.08 ± 4% -0.0 0.07 perf-profile.self.cycles-
> pp.cpuidle_idle_call
> 0.17 -0.0 0.16 ± 2% perf-profile.self.cycles-
> pp.sock_def_readable
> 0.07 -0.0 0.06 perf-profile.self.cycles-pp.cpuidle_enter_state
> 0.07 -0.0 0.06 perf-profile.self.cycles-pp.__sock_wfree
> 0.11 -0.0 0.10 perf-profile.self.cycles-
> pp.ttwu_queue_wakelist
> 0.10 -0.0 0.09 perf-profile.self.cycles-
> pp.asm_sysvec_call_function_single
> 0.09 -0.0 0.08 perf-profile.self.cycles-
> pp.update_rq_clock_task
> 0.09 -0.0 0.08 perf-profile.self.cycles-
> pp.__wrgsbase_inactive
> 0.08 -0.0 0.07 perf-profile.self.cycles-pp.finish_task_switch
> 0.06 -0.0 0.05 perf-profile.self.cycles-pp.cpuidle_enter
> 0.14 +0.0 0.15 perf-profile.self.cycles-
> pp.enqueue_to_backlog
> 0.07 +0.0 0.08 perf-profile.self.cycles-pp.tcp_v4_fill_cb
> 0.05 +0.0 0.06 perf-profile.self.cycles-pp.iov_iter_pipe
> 0.06 +0.0 0.07 perf-profile.self.cycles-pp.__sk_dst_check
> 0.06 +0.0 0.07 ± 5% perf-profile.self.cycles-
> pp.demo_interval_tick
> 0.06 +0.0 0.07 ± 5% perf-profile.self.cycles-pp.rb_next
> 0.07 ± 5% +0.0 0.08 perf-profile.self.cycles-pp.tcp_rearm_rto
> 0.12 ± 3% +0.0 0.13 perf-profile.self.cycles-pp.tcp_wfree
> 0.18 ± 2% +0.0 0.20 ± 3% perf-profile.self.cycles-
> pp.process_backlog
> 0.07 ± 5% +0.0 0.08 ± 5% perf-profile.self.cycles-
> pp.tcp_chrono_stop
> 0.05 +0.0 0.06 ± 7% perf-profile.self.cycles-
> pp.sk_filter_trim_cap
> 0.23 +0.0 0.24 perf-profile.self.cycles-
> pp.__update_load_avg_cfs_rq
> 0.06 ± 6% +0.0 0.07 ± 5% perf-profile.self.cycles-
> pp.splice_from_pipe_next
> 0.08 ± 5% +0.0 0.10 ± 6% perf-profile.self.cycles-
> pp.tcp_event_new_data_sent
> 0.11 ± 6% +0.0 0.13 ± 5% perf-profile.self.cycles-
> pp.exit_to_user_mode_prepare
> 0.15 ± 4% +0.0 0.16 ± 3% perf-profile.self.cycles-
> pp.syscall_exit_to_user_mode_prepare
> 0.10 ± 4% +0.0 0.12 perf-profile.self.cycles-pp.tcp_push
> 0.10 +0.0 0.12 ± 4% perf-profile.self.cycles-
> pp.direct_splice_actor
> 0.10 +0.0 0.12 ± 4% perf-profile.self.cycles-pp.inet_ehashfn
> 0.19 +0.0 0.21 ± 2% perf-profile.self.cycles-pp.recv
> 0.07 +0.0 0.09 ± 5% perf-profile.self.cycles-
> pp.demo_stream_interval
> 0.14 ± 3% +0.0 0.16 ± 4% perf-profile.self.cycles-
> pp.tcp_add_backlog
> 0.15 ± 2% +0.0 0.17 ± 3% perf-profile.self.cycles-
> pp.ip_send_check
> 0.14 +0.0 0.16 ± 5% perf-profile.self.cycles-pp.ipv4_dst_check
> 0.08 +0.0 0.10 ± 3% perf-profile.self.cycles-pp.make_vfsuid
> 0.06 +0.0 0.08 ± 4% perf-profile.self.cycles-
> pp.tcp_rtt_estimator
> 0.10 ± 4% +0.0 0.12 ± 4% perf-profile.self.cycles-
> pp.syscall_enter_from_user_mode
> 0.10 ± 4% +0.0 0.12 ± 4% perf-profile.self.cycles-
> pp.__get_task_ioprio
> 0.09 ± 4% +0.0 0.11 ± 4% perf-profile.self.cycles-
> pp.inet_recvmsg
> 0.10 ± 6% +0.0 0.12 perf-profile.self.cycles-
> pp.tcp_schedule_loss_probe
> 0.04 ± 50% +0.0 0.06 perf-profile.self.cycles-pp.rb_first
> 0.07 +0.0 0.09 perf-profile.self.cycles-pp.__list_add_valid
> 0.08 ± 5% +0.0 0.10 ± 6% perf-profile.self.cycles-pp.xas_start
> 0.08 ± 5% +0.0 0.10 ± 3% perf-profile.self.cycles-
> pp.make_vfsgid
> 0.06 ± 6% +0.0 0.08 ± 4% perf-profile.self.cycles-
> pp.tcp_event_data_recv
> 0.13 ± 3% +0.0 0.16 ± 3% perf-profile.self.cycles-
> pp.tcp_recvmsg
> 0.10 ± 4% +0.0 0.12 ± 4% perf-profile.self.cycles-
> pp.check_stack_object
> 0.08 ± 5% +0.0 0.10 perf-profile.self.cycles-
> pp.is_vmalloc_addr
> 0.09 ± 5% +0.0 0.11 ± 3% perf-profile.self.cycles-
> pp.tcp_downgrade_zcopy_pure
> 0.10 ± 3% +0.0 0.12 ± 4% perf-profile.self.cycles-
> pp.tcp_release_cb
> 0.22 ± 2% +0.0 0.24 perf-profile.self.cycles-
> pp.do_splice_direct
> 0.10 ± 7% +0.0 0.12 ± 3% perf-profile.self.cycles-
> pp.ip_protocol_deliver_rcu
> 0.10 ± 4% +0.0 0.12 ± 3% perf-profile.self.cycles-
> pp.tcp_update_pacing_rate
> 0.30 +0.0 0.32 ± 2% perf-profile.self.cycles-pp.__alloc_skb
> 0.58 +0.0 0.61 perf-profile.self.cycles-
> pp._raw_spin_lock_irqsave
> 0.23 ± 2% +0.0 0.26 ± 2% perf-profile.self.cycles-
> pp.recv_tcp_stream
> 0.13 +0.0 0.16 ± 3% perf-profile.self.cycles-
> pp._raw_spin_unlock_bh
> 0.10 ± 4% +0.0 0.13 ± 8% perf-profile.self.cycles-
> pp.inet_send_prepare
> 0.20 +0.0 0.23 ± 3% perf-profile.self.cycles-pp.ip_rcv_core
> 0.13 ± 3% +0.0 0.15 ± 6% perf-profile.self.cycles-
> pp.tcp_mtu_probe
> 0.13 ± 3% +0.0 0.16 ± 2% perf-profile.self.cycles-pp.xas_load
> 0.06 ± 7% +0.0 0.09 ± 5% perf-profile.self.cycles-
> pp.__tcp_cleanup_rbuf
> 0.13 ± 3% +0.0 0.16 ± 2% perf-profile.self.cycles-
> pp.validate_xmit_skb
> 0.12 +0.0 0.15 ± 3% perf-profile.self.cycles-
> pp.folio_mark_accessed
> 0.16 +0.0 0.19 ± 3% perf-profile.self.cycles-
> pp.__tcp_select_window
> 0.27 +0.0 0.30 perf-profile.self.cycles-pp.__sys_recvfrom
> 0.15 ± 2% +0.0 0.18 ± 2% perf-profile.self.cycles-
> pp.tcp_tx_timestamp
> 0.15 ± 2% +0.0 0.18 ± 2% perf-profile.self.cycles-
> pp.do_splice_to
> 0.31 +0.0 0.34 perf-profile.self.cycles-pp.__skb_clone
> 0.12 +0.0 0.15 ± 3% perf-profile.self.cycles-
> pp.simple_copy_to_iter
> 0.14 ± 3% +0.0 0.18 ± 2% perf-profile.self.cycles-
> pp.rw_verify_area
> 0.18 +0.0 0.22 ± 2% perf-profile.self.cycles-pp.sock_sendpage
> 0.12 ± 3% +0.0 0.16 perf-profile.self.cycles-
> pp.syscall_exit_to_user_mode
> 0.24 +0.0 0.28 ± 2% perf-profile.self.cycles-pp.__mod_timer
> 0.09 ± 5% +0.0 0.12 ± 6% perf-profile.self.cycles-
> pp.__tcp_send_ack
> 0.16 +0.0 0.20 perf-profile.self.cycles-pp.fsnotify_perm
> 0.11 +0.0 0.15 ± 7% perf-profile.self.cycles-
> pp.ktime_get_coarse_real_ts64
> 0.17 ± 2% +0.0 0.21 perf-profile.self.cycles-pp.skb_clone
> 0.20 ± 2% +0.0 0.24 ± 5% perf-profile.self.cycles-
> pp.__entry_text_start
> 0.39 ± 2% +0.0 0.44 perf-profile.self.cycles-
> pp.__ip_queue_xmit
> 0.17 ± 2% +0.0 0.22 ± 3% perf-profile.self.cycles-
> pp.generic_file_splice_read
> 0.18 ± 3% +0.0 0.23 ± 6% perf-profile.self.cycles-
> pp.lock_sock_nested
> 0.47 ± 2% +0.0 0.52 ± 2% perf-profile.self.cycles-
> pp.tcp_recvmsg_locked
> 0.22 ± 2% +0.0 0.27 perf-profile.self.cycles-
> pp.filemap_get_pages
> 0.00 +0.1 0.05 perf-profile.self.cycles-pp.tcp_options_write
> 0.00 +0.1 0.05 perf-profile.self.cycles-pp.tcp_rbtree_insert
> 0.00 +0.1 0.05 perf-profile.self.cycles-
> pp.skb_network_protocol
> 0.20 +0.1 0.25 perf-profile.self.cycles-pp.rcu_all_qs
> 0.00 +0.1 0.05 ± 7% perf-profile.self.cycles-
> pp.__tcp_ack_snd_check
> 0.43 +0.1 0.48 perf-profile.self.cycles-pp._raw_spin_lock
> 0.25 +0.1 0.30 ± 2% perf-profile.self.cycles-pp.touch_atime
> 0.46 +0.1 0.52 ± 2% perf-profile.self.cycles-pp.net_rx_action
> 0.16 ± 2% +0.1 0.22 ± 3% perf-profile.self.cycles-
> pp.tcp_stream_alloc_skb
> 0.23 ± 2% +0.1 0.28 perf-profile.self.cycles-
> pp.copy_page_to_iter
> 0.33 ± 2% +0.1 0.39 perf-profile.self.cycles-
> pp.splice_direct_to_actor
> 0.31 +0.1 0.37 ± 2% perf-profile.self.cycles-pp.dst_release
> 0.21 +0.1 0.27 perf-profile.self.cycles-pp.sanity
> 0.26 +0.1 0.32 perf-profile.self.cycles-
> pp.security_file_permission
> 0.25 ± 2% +0.1 0.31 perf-profile.self.cycles-pp.aa_file_perm
> 1.04 +0.1 1.11 ± 3% perf-profile.self.cycles-pp.do_sendfile
> 0.43 ± 2% +0.1 0.50 perf-profile.self.cycles-
> pp.kmem_cache_alloc_node
> 0.40 +0.1 0.47 perf-profile.self.cycles-pp.do_syscall_64
> 0.30 ± 2% +0.1 0.37 ± 2% perf-profile.self.cycles-
> pp.syscall_return_via_sysret
> 0.31 ± 4% +0.1 0.38 ± 2% perf-profile.self.cycles-pp.sock_put
> 0.49 +0.1 0.56 perf-profile.self.cycles-pp.kmem_cache_free
> 0.22 +0.1 0.29 perf-profile.self.cycles-pp.tcp_tso_segs
> 0.64 +0.1 0.71 perf-profile.self.cycles-pp.tcp_v4_rcv
> 0.34 ± 2% +0.1 0.41 perf-profile.self.cycles-
> pp.generic_splice_sendpage
> 0.32 ± 2% +0.1 0.39 perf-profile.self.cycles-
> pp.kernel_sendpage
> 0.33 +0.1 0.40 perf-profile.self.cycles-pp.__put_user_8
> 0.57 +0.1 0.64 perf-profile.self.cycles-
> pp.tcp_clean_rtx_queue
> 0.34 +0.1 0.42 perf-profile.self.cycles-pp.inet_sendpage
> 0.67 +0.1 0.74 perf-profile.self.cycles-pp.read_tsc
> 0.34 +0.1 0.42 perf-profile.self.cycles-
> pp.tcp_established_options
> 0.33 +0.1 0.41 perf-profile.self.cycles-pp.pipe_to_sendpage
> 0.31 +0.1 0.38 perf-profile.self.cycles-pp.tcp_send_mss
> 0.32 ± 3% +0.1 0.40 perf-profile.self.cycles-pp.current_time
> 0.33 ± 2% +0.1 0.42 ± 12% perf-profile.self.cycles-pp.ktime_get
> 0.36 ± 2% +0.1 0.45 ± 2% perf-profile.self.cycles-
> pp.release_sock
> 0.38 +0.1 0.46 ± 2% perf-profile.self.cycles-
> pp.__virt_addr_valid
> 0.55 +0.1 0.64 perf-profile.self.cycles-
> pp.entry_SYSCALL_64_after_hwframe
> 0.71 +0.1 0.80 perf-profile.self.cycles-pp.__dev_queue_xmit
> 0.41 +0.1 0.50 perf-profile.self.cycles-
> pp.entry_SYSRETQ_unsafe_stack
> 0.45 +0.1 0.54 perf-profile.self.cycles-
> pp.__local_bh_enable_ip
> 0.64 +0.1 0.74 perf-profile.self.cycles-
> pp.tcp_rcv_established
> 0.41 +0.1 0.50 perf-profile.self.cycles-
> pp.__check_object_size
> 0.46 +0.1 0.56 perf-profile.self.cycles-pp.netperf_sendfile
> 0.47 +0.1 0.57 perf-profile.self.cycles-pp.__cond_resched
> 0.39 +0.1 0.49 perf-profile.self.cycles-pp._copy_to_iter
> 0.51 +0.1 0.62 ± 2% perf-profile.self.cycles-
> pp.sendfile_tcp_stream
> 0.42 +0.1 0.53 perf-profile.self.cycles-pp.sendfile
> 0.48 ± 2% +0.1 0.58 perf-profile.self.cycles-
> pp.atime_needs_update
> 0.46 +0.1 0.57 perf-profile.self.cycles-pp.tcp_current_mss
> 0.49 ± 2% +0.1 0.60 perf-profile.self.cycles-pp.tcp_sendpage
> 0.95 +0.1 1.06 ± 2% perf-profile.self.cycles-
> pp.__tcp_transmit_skb
> 0.35 +0.1 0.48 perf-profile.self.cycles-
> pp.tcp_rate_check_app_limited
> 0.47 +0.1 0.60 perf-profile.self.cycles-pp.__fsnotify_parent
> 0.60 +0.1 0.74 perf-profile.self.cycles-pp.__fget_light
> 0.36 +0.1 0.49 perf-profile.self.cycles-
> pp.page_cache_pipe_buf_confirm
> 0.77 +0.1 0.90 ± 2% perf-profile.self.cycles-
> pp.page_cache_pipe_buf_release
> 0.53 +0.1 0.66 perf-profile.self.cycles-pp._copy_from_user
> 0.71 +0.2 0.86 perf-profile.self.cycles-
> pp.__splice_from_pipe
> 0.65 ± 2% +0.2 0.83 perf-profile.self.cycles-
> pp.apparmor_file_permission
> 0.81 +0.2 1.00 perf-profile.self.cycles-pp.tcp_write_xmit
> 0.77 +0.2 0.97 perf-profile.self.cycles-pp.do_tcp_sendpages
> 0.81 +0.2 1.05 perf-profile.self.cycles-
> pp.__skb_datagram_iter
> 1.00 +0.2 1.25 perf-profile.self.cycles-pp.skb_release_data
> 1.11 +0.3 1.38 perf-profile.self.cycles-
> pp.copy_page_to_iter_pipe
> 2.20 +0.3 2.52 perf-profile.self.cycles-pp._raw_spin_lock_bh
> 1.34 +0.4 1.69 perf-profile.self.cycles-pp.filemap_read
> 2.01 +0.4 2.40 perf-profile.self.cycles-pp.tcp_build_frag
> 4.49 +0.7 5.18 perf-profile.self.cycles-
> pp.native_queued_spin_lock_slowpath
> 2.17 ± 2% +0.8 2.99 ± 3% perf-profile.self.cycles-
> pp.check_heap_object
> 2.71 +1.0 3.73 perf-profile.self.cycles-
> pp.filemap_get_read_batch
> 6.63 +1.7 8.29 perf-profile.self.cycles-pp.copyout
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are
> provided
> for informational purposes only. Any difference in system hardware or
> software
> design or configuration may affect actual performance.
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>
>
> >
> >
> > > From 93b3b4c5f356a5090551519522cfd5740ae7e774 Mon Sep 17 00:00:00 2001
> > From: Shakeel Butt <shakeelb@...gle.com>
> > Date: Tue, 16 May 2023 20:30:26 +0000
> > Subject: [PATCH] memcg: skip stock refill in irq context
> >
> > The Linux kernel processes incoming packets in softirq on a given CPU
> > and those packets may belong to different jobs. This is very normal on
> > large systems running multiple workloads. With memcg enabled, network
> > memory for such packets is charged to the corresponding memcgs of the
> > jobs.
> >
> > Memcg charging can be a costly operation and the memcg code implements
> > a per-cpu memcg charge caching optimization to reduce the cost of
> > charging. More specifically, the kernel charges the given memcg for more
> > memory than requested and keeps the remaining charge in a local per-cpu
> > cache. The insight behind this heuristic is that there will be more
> > charge requests for that memcg in the near future. This optimization works
> > well when a specific job runs on a CPU for a long time and the majority of
> > the charging requests happen in process context. However, the kernel's
> > incoming packet processing does not work well with this optimization.
> >
> > Recently Cathy Zhang has shown [1] that memcg charge flushing within the
> > memcg charge path can become a performance bottleneck for the memcg
> > charging of network traffic.
> >
> > Perf profile:
> >
> > 8.98% mc-worker [kernel.vmlinux] [k] page_counter_cancel
> > |
> > --8.97%--page_counter_cancel
> > |
> > --8.97%--page_counter_uncharge
> > drain_stock
> > __refill_stock
> > refill_stock
> > |
> > --8.91%--try_charge_memcg
> > mem_cgroup_charge_skmem
> > |
> > --8.91%--__sk_mem_raise_allocated
> > __sk_mem_schedule
> > |
> > |--5.41%--tcp_try_rmem_schedule
> > | tcp_data_queue
> > | tcp_rcv_established
> > | tcp_v4_do_rcv
> > | tcp_v4_rcv
> >
> > The simplest way to solve this issue is to not refill the memcg charge
> > stock in the irq context. Since networking is the main source of memcg
> > charging in the irq context, other users will not be impacted. In
> > addition, this will preserve the memcg charge cache of the application
> > running on that CPU.
> >
> > There are also potential side effects. What if all the packets belong to
> > the same application and memcg? More specifically, users can use Receive
> > Flow Steering (RFS) to make sure the kernel processes the packets of the
> > application on the CPU where the application is running. This change may
> > cause the kernel to do slowpath memcg charging more often in irq
> > context.
> >
> > Link: https://lore.kernel.org/all/IA0PR11MB73557DEAB912737FD61D2873FC749@IA0PR11MB7355.namprd11.prod.outlook.com [1]
> > Signed-off-by: Shakeel Butt <shakeelb@...gle.com>
> > ---
> > mm/memcontrol.c | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 5abffe6f8389..2635aae82b3e 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2652,6 +2652,14 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >  	bool raised_max_event = false;
> >  	unsigned long pflags;
> >
> > +	/*
> > +	 * Skip the refill in irq context as it may flush the charge cache of
> > +	 * the process running on the CPUs or the kernel may have to process
> > +	 * incoming packets for different memcgs.
> > +	 */
> > +	if (!in_task())
> > +		batch = nr_pages;
> > +
> >  retry:
> >  	if (consume_stock(memcg, nr_pages))
> >  		return 0;
> > --
> > 2.40.1.606.ga4b1b128d6-goog
> >
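
To make the effect of the quoted patch easier to follow, here is a minimal
user-space sketch in C of the per-cpu charge cache it describes. It is a toy
model and not kernel code: every name ending in _sim, the in_task parameter,
and the batch size of 64 pages are assumptions made for illustration. It only
tries to show why charging exactly nr_pages from irq context leaves the cache
of the running application untouched.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical batch size for this model; not taken from the thread. */
#define CHARGE_BATCH_SIM 64

/* Stands in for a memcg's page counter. */
struct memcg_sim {
	long usage;
};

/* Stands in for one CPU's charge cache ("stock"). */
struct stock_sim {
	struct memcg_sim *cached; /* memcg the cached pages belong to */
	long nr_pages;            /* pre-charged pages still unused */
	long drains;              /* number of times the cache was flushed */
};

/* Fast path: serve the request from the cache if it holds this memcg. */
bool consume_stock_sim(struct stock_sim *s, struct memcg_sim *m, long nr)
{
	if (s->cached != m || s->nr_pages < nr)
		return false;
	s->nr_pages -= nr;
	return true;
}

/* Return unused pre-charged pages to their memcg and empty the cache. */
void drain_stock_sim(struct stock_sim *s)
{
	if (s->cached && s->nr_pages > 0) {
		s->cached->usage -= s->nr_pages;
		s->drains++;
	}
	s->cached = NULL;
	s->nr_pages = 0;
}

/* Cache surplus pages; a different memcg evicts the current owner first. */
void refill_stock_sim(struct stock_sim *s, struct memcg_sim *m, long nr)
{
	if (s->cached != m) {
		drain_stock_sim(s);
		s->cached = m;
	}
	s->nr_pages += nr;
}

/* in_task == false models a charge made from softirq packet processing. */
void try_charge_sim(struct stock_sim *s, struct memcg_sim *m, long nr,
		    bool in_task)
{
	long batch = nr > CHARGE_BATCH_SIM ? nr : CHARGE_BATCH_SIM;

	if (!in_task)
		batch = nr; /* the patch: charge exactly, leave the cache alone */

	if (consume_stock_sim(s, m, nr))
		return;

	m->usage += batch;
	if (batch > nr)
		refill_stock_sim(s, m, batch - nr);
}
```

In this model, if an application memcg charges one page in task context, a
second memcg then charges one page with in_task set to false, and the
application charges again, the cache is never drained and the third charge is
served from it. Passing true for the middle charge instead makes each change
of memcg flush the cache, which is the drain_stock/page_counter_uncharge cost
visible in the quoted profile.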