Message-ID: <61298b77-f1e0-9fc8-aa79-9b48f31c6941@kernel.org>
Date: Tue, 18 Jul 2023 21:50:10 -0600
From: David Ahern <dsahern@...nel.org>
To: Yan Zhai <yan@...udflare.com>, Jakub Kicinski <kuba@...nel.org>
Cc: Ivan Babrou <ivan@...udflare.com>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
kernel-team <kernel-team@...udflare.com>, Eric Dumazet
<edumazet@...gle.com>, "David S. Miller" <davem@...emloft.net>,
Paolo Abeni <pabeni@...hat.com>, Steven Rostedt <rostedt@...dmis.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>
Subject: Re: Stacks leading into skb:kfree_skb
On 7/18/23 9:10 PM, Yan Zhai wrote:
> On Tue, Jul 18, 2023 at 5:36 PM Jakub Kicinski <kuba@...nel.org> wrote:
>>
>> On Fri, 14 Jul 2023 18:54:14 -0600 David Ahern wrote:
>>>> I made some aggregations for the stacks we see leading into
>>>> skb:kfree_skb endpoint. There's a lot of data that is not easily
>>>> digestible, so I lightly massaged the data and added flamegraphs in
>>>> addition to raw stack counts. Here's the gist link:
>>>>
>>>> * https://gist.github.com/bobrik/0e57671c732d9b13ac49fed85a2b2290
>>>
>>> I see a lot of packet_rcv as the tip before kfree_skb. How many packet
>>> sockets do you have running on that box? Can you accumulate the total
>>> packet_rcv -> kfree_skb_reasons into 1 count -- regardless of remaining
>>> stacktrace?
>>
>> On a quick look we have 3 branches which can get us to kfree_skb from
>> packet_rcv:
>>
>> if (skb->pkt_type == PACKET_LOOPBACK)
>> goto drop;
>> ...
>> if (!net_eq(dev_net(dev), sock_net(sk)))
>> goto drop;
>> ...
>> res = run_filter(skb, sk, snaplen);
>> if (!res)
>> goto drop_n_restore;
>>
>> I'd guess it's the last one? Which we should mark with the
>> SOCKET_FILTER drop reason?
>
> So we have multiple packet socket consumers on our edge:
> * systemd-networkd: listens on ETH_P_LLDP, which is the role model
> that does not do anything excessive
ETH level means a raw packet socket, which means *all* packets are
duplicated.
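For reference, the protocol argument at socket(2) time is what decides
how much traffic the socket sees; a minimal userspace sketch:

    #include <sys/socket.h>
    #include <linux/if_ether.h>
    #include <arpa/inet.h>

    /* Sees only LLDP frames (ethertype 0x88CC): */
    int lldp_fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_LLDP));

    /* Sees *every* frame on the host, each delivered to packet_rcv(): */
    int all_fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));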
> * lldpd: I am not sure why we needed this one in the presence of
> systemd-networkd, but it is running atm, which contributes to constant
> packet_rcv calls. It listens on ETH_P_ALL because of
> https://github.com/lldpd/lldpd/pull/414. But its filter is doing the
> correct work, so packets hitting this one are mostly "consumed"
I am familiar with this one and its filter -- the fact that the filter
applies *after* the clone means it still contributes to the packet load.
Together these 2 sockets might explain why the filter drop shows up in
packet_rcv.
>
> Now the bad kids:
> * arping: listens on ETH_P_ALL. This one contributes all the
> skb:kfree_skb spikes, and the reason is that sk_rmem_alloc overflows
> rcvbuf. I suspect it is due to a poorly constructed filter, so too many
> packets get queued too fast.
Any packet socket is a problem because the filter is applied to the
clone: clone the packet, run the filter, kfree the packet.
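Roughly, the ordering on the transmit side looks like this
(dev_queue_xmit_nit(), heavily simplified, not verbatim kernel code):

    /* For each ETH_P_ALL packet socket bound on the device: */
    skb2 = skb_clone(skb, GFP_ATOMIC);   /* clone first, per socket */
    if (skb2)
            deliver_skb(skb2, ptype, dev);  /* ends up in packet_rcv() */

    /* ...and only inside packet_rcv() does the filter run: */
    res = run_filter(skb2, sk, snaplen);
    if (!res)
            goto drop_n_restore;         /* clone freed -> kfree_skb */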
> * conduit-watcher: a health checker, sending packets on ETH_P_IP in a
> non-init netns. The majority of packet_rcv calls on this one go to a
> direct drop due to the netns difference.
So this is the raw packet socket at L3 that shows up. This one should
not be as large a contributor to the increased packet load.
>
> So to conclude, it might be useful to set a reason for rcvbuf-related
> drops at least. On the other hand, almost all packets entering
> packet_rcv are shared, so clone failure can probably also be a thing
> under memory pressure.
>
>
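A sketch of what the rcvbuf case could look like in packet_rcv(),
assuming the existing SKB_DROP_REASON_SOCKET_RCVBUFF value (any real
patch may differ):

    /* net/packet/af_packet.c, simplified */
    if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf) {
            reason = SKB_DROP_REASON_SOCKET_RCVBUFF;
            goto drop_n_acct;
    }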