netdev - Re: Stacks leading into skb:kfree_skb

Message-ID: <CAO3-Pbqo_bfYsstH47hgqx7GC0CUg1H0xUaewq=MkUvb2BzCZA@mail.gmail.com>
Date: Tue, 18 Jul 2023 22:10:44 -0500
From: Yan Zhai <yan@...udflare.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: David Ahern <dsahern@...nel.org>, Ivan Babrou <ivan@...udflare.com>, 
	Linux Kernel Network Developers <netdev@...r.kernel.org>, kernel-team <kernel-team@...udflare.com>, 
	Eric Dumazet <edumazet@...gle.com>, "David S. Miller" <davem@...emloft.net>, 
	Paolo Abeni <pabeni@...hat.com>, Steven Rostedt <rostedt@...dmis.org>, 
	Masami Hiramatsu <mhiramat@...nel.org>, Willem de Bruijn <willemdebruijn.kernel@...il.com>
Subject: Re: Stacks leading into skb:kfree_skb

On Tue, Jul 18, 2023 at 5:36 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Fri, 14 Jul 2023 18:54:14 -0600 David Ahern wrote:
> > > I made some aggregations for the stacks we see leading into
> > > skb:kfree_skb endpoint. There's a lot of data that is not easily
> > > digestible, so I lightly massaged the data and added flamegraphs in
> > > addition to raw stack counts. Here's the gist link:
> > >
> > > * https://gist.github.com/bobrik/0e57671c732d9b13ac49fed85a2b2290
> >
> > I see a lot of packet_rcv as the tip before kfree_skb. How many packet
> > sockets do you have running on that box? Can you accumulate the total
> > packet_rcv -> kfree_skb_reasons into 1 count -- regardless of remaining
> > stacktrace?
>
> On a quick look we have 3 branches which can get us to kfree_skb from
> packet_rcv:
>
>         if (skb->pkt_type == PACKET_LOOPBACK)
>                 goto drop;
> ...
>         if (!net_eq(dev_net(dev), sock_net(sk)))
>                 goto drop;
> ...
>         res = run_filter(skb, sk, snaplen);
>         if (!res)
>                 goto drop_n_restore;
>
> I'd guess it is the last one, which we should mark with the
> SOCKET_FILTER drop reason?

So we have multiple packet socket consumers on our edge:
* systemd-networkd: listens on ETH_P_LLDP, which makes it the role model
that does not do anything excessive.
* lldpd: I am not sure why we need this one when systemd-networkd is
present, but it is running atm and contributes to constant
packet_rcv calls. It listens on ETH_P_ALL because of
https://github.com/lldpd/lldpd/pull/414, but its filter is doing the
correct work, so packets hitting this one are mostly "consumed".
Now the bad kids:
* arping: listens on ETH_P_ALL. This one contributes all the
skb:kfree_skb spikes, and the reason is that sk_rmem_alloc overflows
rcvbuf. I suspect it is due to a poorly constructed filter, so too many
packets get queued too fast.
* conduit-watcher: a health checker, sending packets on ETH_P_IP in a
non-init netns. The majority of packet_rcv calls on this one go straight
to drop due to the netns difference.

So to conclude, it might be useful to set a reason for rcvbuf-related
drops at least. On the other hand, almost all packets entering
packet_rcv are shared, so clone failure can probably also be a thing
under memory pressure.
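
The drop paths discussed in this thread can be summarized with a small
userspace model. This is only a sketch, not kernel code: the labels in
enum drop_label and the fields of struct pkt_state are hypothetical
names chosen for illustration, they are not the kernel's
enum skb_drop_reason values, and the ordering of the checks is an
assumption based on the snippets quoted above plus the rcvbuf and clone
cases.

```c
#include <stdbool.h>

/* Hypothetical labels for this sketch only. */
enum drop_label {
    DELIVERED = 0,
    DROP_LOOPBACK,
    DROP_OTHER_NETNS,
    DROP_SOCKET_FILTER,
    DROP_RCVBUF_FULL,
    DROP_CLONE_FAILED,
};

/* Simplified stand-in for the state packet_rcv looks at. */
struct pkt_state {
    bool is_loopback;        /* skb->pkt_type == PACKET_LOOPBACK */
    bool same_netns;         /* net_eq(dev_net(dev), sock_net(sk)) */
    unsigned int filter_res; /* run_filter() result, 0 means reject */
    int rmem_alloc;          /* bytes already queued on the socket */
    int rcvbuf;              /* receive buffer limit of the socket */
    bool shared;             /* skb is shared with other consumers */
    bool clone_ok;           /* would cloning the skb succeed? */
};

/* Return the first check that would send the packet to a drop path. */
enum drop_label classify(const struct pkt_state *p)
{
    if (p->is_loopback)
        return DROP_LOOPBACK;
    if (!p->same_netns)
        return DROP_OTHER_NETNS;
    if (p->filter_res == 0)
        return DROP_SOCKET_FILTER;
    if (p->rmem_alloc >= p->rcvbuf)
        return DROP_RCVBUF_FULL;
    if (p->shared && !p->clone_ok)
        return DROP_CLONE_FAILED;
    return DELIVERED;
}
```

In this model the arping case lands on DROP_RCVBUF_FULL and the
conduit-watcher case on DROP_OTHER_NETNS, which are the two paths that
show up without a specific reason today.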

-- 

Yan