Message-ID: <CADVnQykPo35mQ1y16WD3zppENCeOi+2Ea_2m-AjUQVPc9SXm4g@mail.gmail.com>
Date: Wed, 4 Dec 2024 09:21:50 -0500
From: Neal Cardwell <ncardwell@...gle.com>
To: "Dujeong.lee" <dujeong.lee@...sung.com>
Cc: Eric Dumazet <edumazet@...gle.com>, Youngmin Nam <youngmin.nam@...sung.com>,
Jakub Kicinski <kuba@...nel.org>, davem@...emloft.net, dsahern@...nel.org,
pabeni@...hat.com, horms@...nel.org, guo88.liu@...sung.com,
yiwang.cai@...sung.com, netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
joonki.min@...sung.com, hajun.sung@...sung.com, d7271.choe@...sung.com,
sw.ju@...sung.com
Subject: Re: [PATCH] tcp: check socket state before calling WARN_ON
On Wed, Dec 4, 2024 at 2:48 AM Dujeong.lee <dujeong.lee@...sung.com> wrote:
>
> On Wed, Dec 4, 2024 at 4:14 PM Eric Dumazet wrote:
> > To: Youngmin Nam <youngmin.nam@...sung.com>
> > Cc: Jakub Kicinski <kuba@...nel.org>; Neal Cardwell <ncardwell@...gle.com>;
> > davem@...emloft.net; dsahern@...nel.org; pabeni@...hat.com;
> > horms@...nel.org; dujeong.lee@...sung.com; guo88.liu@...sung.com;
> > yiwang.cai@...sung.com; netdev@...r.kernel.org; linux-
> > kernel@...r.kernel.org; joonki.min@...sung.com; hajun.sung@...sung.com;
> > d7271.choe@...sung.com; sw.ju@...sung.com
> > Subject: Re: [PATCH] tcp: check socket state before calling WARN_ON
> >
> > On Wed, Dec 4, 2024 at 4:35 AM Youngmin Nam <youngmin.nam@...sung.com>
> > wrote:
> > >
> > > On Tue, Dec 03, 2024 at 06:18:39PM -0800, Jakub Kicinski wrote:
> > > > On Tue, 3 Dec 2024 10:34:46 -0500 Neal Cardwell wrote:
> > > > > > > I have not seen these warnings firing. Neal, have you seen this in the past?
> > > > >
> > > > > I can't recall seeing these warnings over the past 5 years or so,
> > > > > and (from checking our monitoring) they don't seem to be firing in
> > > > > our fleet recently.
> > > >
> > > > FWIW I see this at Meta on 5.12 kernels, but nothing since.
> > > > Could be that one of our workloads is pinned to 5.12.
> > > > Youngmin, what's the newest kernel you can repro this on?
> > > >
> > > Hi Jakub.
> > > Thank you for taking an interest in this issue.
> > >
> > > We've seen this issue since the 5.15 kernel.
> > > We can still see it on the 6.6 kernel, which is the newest kernel we are running.
> >
> > The fact that we are processing ACK packets after the write queue has been
> > purged would be a serious bug.
> >
> > Thus the WARN() makes sense to us.
> >
> > It would be easy to build a packetdrill test. Please do so, then we can
> > fix the root cause.
> >
> > Thank you !
>
>
> Please let me share some more details and clarifications on the issue, taken from a ramdump snapshot that we secured locally.
>
> 1) This issue was reported on the Android-T Linux kernel when we enabled panic_on_warn for the first time.
> The reproduction rate is not high, and the issue can be seen in any test case that uses a public internet connection.
>
> 2) Analysis from ramdump (which is not available at the moment).
> 2-A) From the ramdump, I was able to find the values below.
> tp->packets_out = 0
> tp->retrans_out = 1
> tp->max_packets_out = 1
> tp->max_packets_seq = 1575830358
> tp->snd_ssthresh = 5
> tp->snd_cwnd = 1
> tp->prior_cwnd = 10
> tp->write_seq = 1575830359
> tp->pushed_seq = 1575830358
> tp->lost_out = 1
> tp->sacked_out = 0

Thanks for all the details! If the ramdump becomes available again at
some point, would it be possible to pull out the following values as
well:

tp->mss_cache
inet_csk(sk)->icsk_pmtu_cookie
inet_csk(sk)->icsk_ca_state

Thanks,
neal
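
For readers following along, the counters quoted from the ramdump can be checked against each other in user space. The sketch below is only an illustration under stated assumptions: `struct dumped_counters`, `ramdump`, and `retrans_without_packets()` are hypothetical names made up here, and the excerpt above does not say which WARN_ON actually fired. The `left_out()` arithmetic follows the kernel's `tcp_left_out()` (sacked_out + lost_out), and the comparison follows the condition that `tcp_verify_left_out()` warns on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few struct tcp_sock counters quoted
 * from the ramdump; this is not the kernel's definition. */
struct dumped_counters {
	unsigned int packets_out;
	unsigned int retrans_out;
	unsigned int lost_out;
	unsigned int sacked_out;
};

/* Values exactly as reported in the quoted message. */
const struct dumped_counters ramdump = {
	.packets_out = 0,
	.retrans_out = 1,
	.lost_out = 1,
	.sacked_out = 0,
};

/* Same arithmetic as the kernel's tcp_left_out(). */
static inline unsigned int left_out(const struct dumped_counters *c)
{
	return c->sacked_out + c->lost_out;
}

/* The condition that tcp_verify_left_out() warns on. */
static inline bool left_out_exceeds_packets_out(const struct dumped_counters *c)
{
	return left_out(c) > c->packets_out;
}

/* Hypothetical helper: retransmissions are still counted although
 * no packets are accounted as outstanding. */
static inline bool retrans_without_packets(const struct dumped_counters *c)
{
	return c->packets_out == 0 && c->retrans_out != 0;
}

/* User-space analogue of tcp_verify_left_out(): assert() here,
 * where the kernel would use WARN_ON(). */
static inline void verify_left_out(const struct dumped_counters *c)
{
	assert(left_out(c) <= c->packets_out);
}
```

With the dumped values, `left_out()` is 1 while `packets_out` is 0, so both checks report an inconsistency. That fits the picture of ACK processing continuing after the write queue was purged, but it does not by itself identify the root cause.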