Message-ID: <CAL+tcoCz31PPOqjOvfh0OwkJvM_CHo4vNteAWhNZNfrcs57uug@mail.gmail.com>
Date: Tue, 28 May 2024 16:52:14 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: dsahern@...nel.org, kuba@...nel.org, pabeni@...hat.com,
davem@...emloft.net, netdev@...r.kernel.org,
Jason Xing <kernelxing@...cent.com>, Yongming Liu <yomiliu@...cent.com>,
Wangzi Yong <curuwang@...cent.com>
Subject: Re: [PATCH net-next] tcp: introduce a new MIB for CLOSE-WAIT sockets
On Tue, May 28, 2024 at 4:34 PM Jason Xing <kerneljasonxing@...il.com> wrote:
>
> On Tue, May 28, 2024 at 3:36 PM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Tue, May 28, 2024 at 8:48 AM Jason Xing <kerneljasonxing@...il.com> wrote:
> > >
> > > Hello Eric,
> > >
> > > On Tue, May 28, 2024 at 1:13 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > > >
> > > > On Tue, May 28, 2024 at 4:12 AM Jason Xing <kerneljasonxing@...il.com> wrote:
> > > > >
> > > > > From: Jason Xing <kernelxing@...cent.com>
> > > > >
> > > > > CLOSE-WAIT is a relatively special state which "represents waiting for
> > > > > a connection termination request from the local user" (RFC 793). Some
> > > > > issues may happen because of unexpected/too many CLOSE-WAIT sockets,
> > > > > like user application mistakenly handling close() syscall.
> > > > >
> > > > > We want to trace this total number of CLOSE-WAIT sockets fastly and
> > > > > frequently instead of resorting to displaying them altogether by using:
> > > > >
> > > > > netstat -anlp | grep CLOSE_WAIT
> > > >
> > > > This is horribly expensive.
> > >
> > > Yes.
> > >
> > > > Why asking af_unix and program names ?
> > > > You want to count some TCP sockets in a given state, right ?
> > > > iproute2 interface (inet_diag) can do the filtering in the kernel,
> > > > saving a lot of cycles.
> > > >
> > > > ss -t state close-wait
> > >
> > > Indeed, it is much better than netstat but not that good/fast enough
> > > if we've already generated a lot of sockets. This command is suitable
> > > for debug use, but not for frequent sampling, say, every 10 seconds.
> > > More than this, RFC 1213 defines CurrEstab which should also include
> > > close-wait sockets, but we don't have this one.
> >
> > "we don't have this one."
> > You mean we do not have CurrEstab ?
> > That might be user space decision to not display it from nstat
> > command, in useless_number()
> > (Not sure why. If someone thought it was useless, then CLOSE_WAIT
> > count is even more useless...)
>
> It has nothing to do with user applications.
>
> Let me give one example, ss -s can show the value of 'estab' which is
> derived from /proc/net/snmp file.

Speaking of CurrEstab, newcomers may ask what this counter is good
for. I would like to share an interesting issue report I once handled.

One day most of our CPUs were suddenly busy (CPU usage around 80%) and
most applications got stuck, but the symptom disappeared very quickly.
After we deployed an agent collecting the SNMP counters, we noticed
that one application was mistakenly opening a great number of
connections concurrently. It was a bug in that application.

If even CurrEstab is useful, a counter for CLOSE-WAIT sockets is all
the more so.
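
For illustration only, here is a rough sketch of what such a collecting
agent could do to read CurrEstab from /proc/net/snmp. It is not the
agent we actually deployed; the function names parse_snmp() and
read_curr_estab() are made up for this example, and it assumes the usual
layout of that file, where each protocol has a header line of counter
names followed by a line of values.

```python
def parse_snmp(text):
    """Parse /proc/net/snmp-style text into {protocol: {counter: value}}.

    Assumes lines come in pairs per protocol: "Tcp: <names...>" followed
    by "Tcp: <values...>". Function name is hypothetical.
    """
    tables = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        value_proto, numbers = values.split(":", 1)
        if proto != value_proto:
            # Skip anything that does not look like a header/value pair.
            continue
        tables[proto] = dict(
            zip(names.split(), (int(n) for n in numbers.split()))
        )
    return tables


def read_curr_estab(path="/proc/net/snmp"):
    """Return the Tcp CurrEstab value from the given snmp file."""
    with open(path) as f:
        return parse_snmp(f.read())["Tcp"]["CurrEstab"]
```

Calling read_curr_estab() at a fixed interval (say, every 10 seconds)
and recording the result is cheap, since it is a single small file read
rather than a walk over every socket.
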
Thanks,
Jason