Message-ID: <CALOAHbBRN=H-fFaKeZ_2iNAEhwfMUDLjo2thq+w4GCMh50+4TA@mail.gmail.com>
Date: Tue, 31 Dec 2024 21:05:50 +0800
From: Yafang Shao <laoar.shao@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: davem@...emloft.net, dsahern@...nel.org, kuba@...nel.org,
pabeni@...hat.com, horms@...nel.org, netdev@...r.kernel.org
Subject: Re: [PATCH net-next] net: tcp: Define tcp_listendrop() as noinline
On Tue, Dec 31, 2024 at 8:34 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Tue, Dec 31, 2024 at 12:49 PM Yafang Shao <laoar.shao@...il.com> wrote:
> >
> > On Tue, Dec 31, 2024 at 5:26 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > >
> > > On Tue, Dec 31, 2024 at 6:52 AM Yafang Shao <laoar.shao@...il.com> wrote:
> > > >
> > > > The LINUX_MIB_LISTENDROPS counter can be incremented for several reasons,
> > > > such as:
> > > > - A full SYN backlog queue
> > > > - SYN flood attacks
> > > > - Memory exhaustion
> > > > - Other resource constraints
> > > >
> > > > Recently, one of our services experienced frequent ListenDrops. While
> > > > attempting to trace the root cause, we discovered that tracing the function
> > > > tcp_listendrop() was not straightforward because it is currently inlined.
> > > > To overcome this, we had to create a livepatch that defined a non-inlined
> > > > version of the function, which we then traced using BPF programs.
> > > >
> > > > $ grep tcp_listendrop /proc/kallsyms
> > > > ffffffffc093fba0 t tcp_listendrop_x [livepatch_tmp]
> > > >
> > > > Through this process, we eventually determined that the ListenDrops were
> > > > caused by SYN attacks.
> > > >
> > > > Since tcp_listendrop() is not part of the critical path, defining it as
> > > > noinline would make it significantly easier to trace and diagnose issues
> > > > without requiring workarounds such as livepatching. This function can be
> > > > used by kernel modules like smc, so export it with EXPORT_SYMBOL_GPL().
> > > >
> > > > After this change, the result is as follows:
> > > >
> > > > $ grep tcp_listendrop /proc/kallsyms
> > > > ffffffffb718eaa0 T __pfx_tcp_listendrop
> > > > ffffffffb718eab0 T tcp_listendrop
> > > > ffffffffb7e636b8 r __ksymtab_tcp_listendrop
> > > >
> > > > Signed-off-by: Yafang Shao <laoar.shao@...il.com>
> > > > ---
> > >
> > > Are we going to accept one patch at a time to un-inline all
> > > TCP-related functions in the kernel?
> >
> > I don't have an in-depth understanding of the TCP/IP stack, so I can't
> > consolidate all related TCP functions in the error paths into a single
> > patch. I would greatly appreciate it if you could help ensure these
> > functions are marked as non-inlined based on your expertise. If you
> > don't have time to do it directly, could you provide some guidance?
> >
> > The inlining of TCP functions in error paths often complicates tracing
> > efforts. For instance, we recently encountered issues with the inlined
> > tcp_write_err(), although we were fortunate to have an alternative
> > tracepoint available: inet_sk_error_report.
> >
> > Unfortunately, no such alternative exists for tcp_listendrop(), which
> > is why I submitted this patch.
> >
> > >
> > > My understanding is that current tools work fine. You may need to upgrade yours.
> > >
> > > # perf probe -a tcp_listendrop
> > > Added new events:
> > > probe:tcp_listendrop (on tcp_listendrop)
> > > probe:tcp_listendrop (on tcp_listendrop)
> > > probe:tcp_listendrop (on tcp_listendrop)
> > > probe:tcp_listendrop (on tcp_listendrop)
> > > probe:tcp_listendrop (on tcp_listendrop)
> > > probe:tcp_listendrop (on tcp_listendrop)
> > > probe:tcp_listendrop (on tcp_listendrop)
> > > probe:tcp_listendrop (on tcp_listendrop)
> > >
> > > You can now use it in all perf tools, such as:
> > >
> > > perf record -e probe:tcp_listendrop -aR sleep 1
> >
> > The issue is that we can't extract the `struct sock *sk` argument
> > from tcp_listendrop() using perf. Please correct me if I'm mistaken.
> >
> > In a containerized environment, it's common to have many pods running
> > on a single host, each assigned a unique IP. Our goal is to filter for
> > a specific local_ip:local_port combination while also capturing the
> > corresponding remote IP. This task is straightforward with bpftrace or
> > other BPF-based tools, but I’m unsure how to accomplish it using perf.
>
> I suggest you ask the tracing experts: why is perf able to add a
> probe, but not bpftrace?
>
> Again, accepting thousands of patches just because of tracing tools
> does not seem like a good idea to me.
Thank you for your guidance on using perf. I now understand how to
achieve similar tracing with bpftrace.
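
For the archives, here is a sketch of the perf-only approach your probe
output suggests (assumptions: root access and kernel debuginfo are
available, and the struct member paths below are illustrative and may
differ across kernel versions):

```
# Place probes at every inline site of tcp_listendrop and fetch the
# local address/port and remote address from the sk argument:
$ perf probe -a 'tcp_listendrop sk->__sk_common.skc_rcv_saddr sk->__sk_common.skc_num sk->__sk_common.skc_daddr'
$ perf record -e probe:tcp_listendrop -aR sleep 10
$ perf script    # filter on the wanted local ip:port in post-processing
```

With debuginfo, perf probe can dereference the probed function's
arguments, so per-pod filtering can be done on the recorded fields even
though the function is inlined.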

I agree with your point that it's not ideal to send thousands of
patches solely for the sake of tracing tools. It would indeed be much
better if this could be addressed in a single patch.
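
Independent of tracing, the aggregate counter itself is cheap to watch;
a minimal sketch, assuming a Linux host where LINUX_MIB_LISTENDROPS is
exported as "ListenDrops" in the TcpExt: rows of /proc/net/netstat:

```shell
# Print the current ListenDrops count by pairing the TcpExt: header
# row with its value row in /proc/net/netstat.
awk '/^TcpExt:/ {
        if (!seen) { for (i = 1; i <= NF; i++) col[$i] = i; seen = 1 }
        else print $(col["ListenDrops"])
     }' /proc/net/netstat
```

Sampling this in a loop shows whether drops are ongoing, though it
cannot attribute them to a pod the way a probe on tcp_listendrop can.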
--
Regards
Yafang