Message-ID: <CALvZod7Y+SxiopRBXOf1HoDKO=Xh8CNPfgz3Etd4XOq5BPc5Ag@mail.gmail.com>
Date: Thu, 11 May 2023 09:23:50 -0700
From: Shakeel Butt <shakeelb@...gle.com>
To: "Zhang, Cathy" <cathy.zhang@...el.com>
Cc: Eric Dumazet <edumazet@...gle.com>, Linux MM <linux-mm@...ck.org>,
Cgroups <cgroups@...r.kernel.org>, Paolo Abeni <pabeni@...hat.com>,
"davem@...emloft.net" <davem@...emloft.net>, "kuba@...nel.org" <kuba@...nel.org>,
"Brandeburg, Jesse" <jesse.brandeburg@...el.com>, "Srinivas, Suresh" <suresh.srinivas@...el.com>,
"Chen, Tim C" <tim.c.chen@...el.com>, "You, Lizhen" <lizhen.you@...el.com>,
"eric.dumazet@...il.com" <eric.dumazet@...il.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper size
On Thu, May 11, 2023 at 2:27 AM Zhang, Cathy <cathy.zhang@...el.com> wrote:
>
>
>
[...]
>
> Here is the output from the command you pasted. It is system-wide;
> I only show pieces of the memcached records, and it appears to be a
> callee -> caller stack trace:
>
> 9.02% mc-worker [kernel.vmlinux] [k] page_counter_try_charge
> |
> --9.00%--page_counter_try_charge
> |
> --9.00%--try_charge_memcg
> mem_cgroup_charge_skmem
> |
> --9.00%--__sk_mem_raise_allocated
> __sk_mem_schedule
> |
> |--5.32%--tcp_try_rmem_schedule
> | tcp_data_queue
> | tcp_rcv_established
> | tcp_v4_do_rcv
> | tcp_v4_rcv
> | ip_protocol_deliver_rcu
> | ip_local_deliver_finish
> | ip_local_deliver
> | ip_rcv
> | __netif_receive_skb_one_core
> | __netif_receive_skb
> | process_backlog
> | __napi_poll
> | net_rx_action
> | __do_softirq
> | |
> | --5.32%--do_softirq.part.0
> | __local_bh_enable_ip
> | __dev_queue_xmit
> | ip_finish_output2
> | __ip_finish_output
> | ip_finish_output
> | ip_output
> | ip_local_out
> | __ip_queue_xmit
> | ip_queue_xmit
> | __tcp_transmit_skb
> | tcp_write_xmit
> | __tcp_push_pending_frames
> | tcp_push
> | tcp_sendmsg_locked
> | tcp_sendmsg
> | inet_sendmsg
> | sock_sendmsg
> | ____sys_sendmsg
>
> 8.98% mc-worker [kernel.vmlinux] [k] page_counter_cancel
> |
> --8.97%--page_counter_cancel
> |
> --8.97%--page_counter_uncharge
> drain_stock
> __refill_stock
> refill_stock
> |
> --8.91%--try_charge_memcg
> mem_cgroup_charge_skmem
> |
> --8.91%--__sk_mem_raise_allocated
> __sk_mem_schedule
> |
> |--5.41%--tcp_try_rmem_schedule
> | tcp_data_queue
> | tcp_rcv_established
> | tcp_v4_do_rcv
> | tcp_v4_rcv
> | ip_protocol_deliver_rcu
> | ip_local_deliver_finish
> | ip_local_deliver
> | ip_rcv
> | __netif_receive_skb_one_core
> | __netif_receive_skb
> | process_backlog
> | __napi_poll
> | net_rx_action
> | __do_softirq
> | do_softirq.part.0
> | __local_bh_enable_ip
> | __dev_queue_xmit
> | ip_finish_output2
> | __ip_finish_output
> | ip_finish_output
> | ip_output
> | ip_local_out
> | __ip_queue_xmit
> | ip_queue_xmit
> | __tcp_transmit_skb
> | tcp_write_xmit
> | __tcp_push_pending_frames
> | tcp_push
> | tcp_sendmsg_locked
> | tcp_sendmsg
> | inet_sendmsg
>
> 8.78% mc-worker [kernel.vmlinux] [k] try_charge_memcg
> |
> --8.77%--try_charge_memcg
> |
> --8.76%--mem_cgroup_charge_skmem
> |
> --8.76%--__sk_mem_raise_allocated
> __sk_mem_schedule
> |
> |--5.21%--tcp_try_rmem_schedule
> | tcp_data_queue
> | tcp_rcv_established
> | tcp_v4_do_rcv
> | |
> | --5.21%--tcp_v4_rcv
> | ip_protocol_deliver_rcu
> | ip_local_deliver_finish
> | ip_local_deliver
> | ip_rcv
> | __netif_receive_skb_one_core
> | __netif_receive_skb
> | process_backlog
> | __napi_poll
> | net_rx_action
> | __do_softirq
> | |
> | --5.21%--do_softirq.part.0
> | __local_bh_enable_ip
> | __dev_queue_xmit
> | ip_finish_output2
> | __ip_finish_output
> | ip_finish_output
> | ip_output
> | ip_local_out
> | __ip_queue_xmit
> | ip_queue_xmit
> | __tcp_transmit_skb
> | tcp_write_xmit
> | __tcp_push_pending_frames
> | tcp_push
> | tcp_sendmsg_locked
> | tcp_sendmsg
> | inet_sendmsg
> | sock_sendmsg
> | ____sys_sendmsg
> | ___sys_sendmsg
> | __sys_sendmsg
>
>
> >
I suspect we are doing a lot of charging for a specific memcg on one
CPU (or a set of CPUs) and a lot of uncharging on a different CPU (or
a different set of CPUs), so both of these code paths keep hitting the
slow path, i.e. the shared page_counter, a lot.
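
To make that concrete, here is a toy, single-threaded userspace sketch
of the per-CPU charge cache idea (this is not the actual
mm/memcontrol.c code; the batch size, helper names and numbers are made
up purely for illustration). When charge and uncharge land on the same
CPU the local cache absorbs both, but when they land on different CPUs
each side has to go back to the shared counter roughly once per batch:

#include <stdio.h>
#include <stdatomic.h>

#define NR_CPUS 2
#define BATCH   64                      /* pages moved per slow-path trip */

static atomic_long usage = 0;           /* stand-in for the shared page_counter */
static long stock[NR_CPUS];             /* per-CPU pre-charged pages */
static long slow_path_hits;

static void charge(int cpu, long nr)
{
        if (stock[cpu] >= nr) {         /* fast path: local cache */
                stock[cpu] -= nr;
                return;
        }
        /* slow path: pull a whole batch from the shared counter */
        atomic_fetch_add(&usage, BATCH);
        slow_path_hits++;
        stock[cpu] += BATCH - nr;
}

static void uncharge(int cpu, long nr)
{
        stock[cpu] += nr;               /* fast path: grow the local cache */
        if (stock[cpu] > BATCH) {       /* cache full: drain to the counter */
                atomic_fetch_sub(&usage, stock[cpu]);
                slow_path_hits++;
                stock[cpu] = 0;
        }
}

int main(void)
{
        long i;

        /* charge and uncharge on the same CPU: the cache absorbs both */
        for (i = 0; i < 100000; i++) {
                charge(0, 1);
                uncharge(0, 1);
        }
        printf("same CPU : %ld slow-path hits\n", slow_path_hits);

        /* charge on CPU 0, uncharge on CPU 1: each side goes back to
         * the shared counter about once per BATCH pages */
        slow_path_hits = 0;
        for (i = 0; i < 100000; i++) {
                charge(0, 1);
                uncharge(1, 1);
        }
        printf("cross CPU: %ld slow-path hits\n", slow_path_hits);
        return 0;
}

In this toy the same-CPU loop should touch the shared counter only
once, while the split loop should touch it a few thousand times, which
is the kind of behaviour the page_counter_* hotspots above suggest.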
Eric, I remember we have an optimization in the networking stack that
tries to free memory on the same CPU where the allocation happened. Is
that optimization enabled for this code path? Or maybe we should do
something similar in the memcg code (assuming my suspicion is
correct).
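
For what it's worth, here is a very rough sketch of the shape that
"something similar in memcg code" could take, again purely
illustrative: hypothetical names, single-threaded, no locking, and the
charging CPU would have to be remembered somewhere (the way the
networking side knows which CPU did the allocation). The uncharging CPU
hands the pages back to the CPU that charged them, so the shared
counter is barely touched even when charge and uncharge run on
different CPUs:

#include <stdio.h>

#define NR_CPUS 2

static long shared_counter;             /* stand-in for the page_counter */
static long local_pool[NR_CPUS];        /* per-CPU pre-charged pages */
static long returned[NR_CPUS];          /* pages handed back by other CPUs */

static void charge(int cpu, long nr)
{
        /* fold in whatever other CPUs returned to us first */
        local_pool[cpu] += returned[cpu];
        returned[cpu] = 0;
        if (local_pool[cpu] >= nr) {
                local_pool[cpu] -= nr;  /* no shared-counter traffic */
                return;
        }
        shared_counter += nr;           /* slow path */
}

static void uncharge(int uncharging_cpu, int charging_cpu, long nr)
{
        if (uncharging_cpu == charging_cpu)
                local_pool[uncharging_cpu] += nr;
        else
                returned[charging_cpu] += nr;   /* defer back to the charger;
                                                 * in reality this would need
                                                 * a per-CPU lock-free list */
}

int main(void)
{
        long i;

        /* CPU 0 charges, CPU 1 uncharges: the pages keep circulating
         * back to CPU 0's pool, so the shared counter is hit only once */
        for (i = 0; i < 100000; i++) {
                charge(0, 1);
                uncharge(1, 0, 1);
        }
        printf("shared counter charged %ld page(s) in total\n", shared_counter);
        return 0;
}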