Message-ID: <Z1ZNTKHmCV9Jg2o8@perf>
Date: Mon, 9 Dec 2024 10:52:12 +0900
From: Youngmin Nam <youngmin.nam@...sung.com>
To: Neal Cardwell <ncardwell@...gle.com>
Cc: Eric Dumazet <edumazet@...gle.com>, Youngmin Nam
<youngmin.nam@...sung.com>, Jakub Kicinski <kuba@...nel.org>,
davem@...emloft.net, dsahern@...nel.org, pabeni@...hat.com,
horms@...nel.org, dujeong.lee@...sung.com, guo88.liu@...sung.com,
yiwang.cai@...sung.com, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, joonki.min@...sung.com,
hajun.sung@...sung.com, d7271.choe@...sung.com, sw.ju@...sung.com
Subject: Re: [PATCH] tcp: check socket state before calling WARN_ON
On Fri, Dec 06, 2024 at 10:34:16AM -0500, Neal Cardwell wrote:
> On Fri, Dec 6, 2024 at 4:08 AM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Fri, Dec 6, 2024 at 9:58 AM Youngmin Nam <youngmin.nam@...sung.com> wrote:
> > >
> > > On Fri, Dec 06, 2024 at 09:35:32AM +0100, Eric Dumazet wrote:
> > > > On Fri, Dec 6, 2024 at 6:50 AM Youngmin Nam <youngmin.nam@...sung.com> wrote:
> > > > >
> > > > > On Wed, Dec 04, 2024 at 08:13:33AM +0100, Eric Dumazet wrote:
> > > > > > On Wed, Dec 4, 2024 at 4:35 AM Youngmin Nam <youngmin.nam@...sung.com> wrote:
> > > > > > >
> > > > > > > On Tue, Dec 03, 2024 at 06:18:39PM -0800, Jakub Kicinski wrote:
> > > > > > > > On Tue, 3 Dec 2024 10:34:46 -0500 Neal Cardwell wrote:
> > > > > > > > > > I have not seen these warnings firing. Neal, have you seen this in the past ?
> > > > > > > > >
> > > > > > > > > I can't recall seeing these warnings over the past 5 years or so, and
> > > > > > > > > (from checking our monitoring) they don't seem to be firing in our
> > > > > > > > > fleet recently.
> > > > > > > >
> > > > > > > > FWIW I see this at Meta on 5.12 kernels, but nothing since.
> > > > > > > > Could be that one of our workloads is pinned to 5.12.
> > > > > > > > Youngmin, what's the newest kernel you can repro this on?
> > > > > > > >
> > > > > > > Hi Jakub.
> > > > > > > Thank you for taking an interest in this issue.
> > > > > > >
> > > > > > > We've seen this issue since 5.15 kernel.
> > > > > > > Now, we can see this on 6.6 kernel which is the newest kernel we are running.
> > > > > >
> > > > > > The fact that we are processing ACK packets after the write queue has
> > > > > > been purged would be a serious bug.
> > > > > >
> > > > > > Thus the WARN() makes sense to us.
> > > > > >
> > > > > > It would be easy to build a packetdrill test. Please do so, then we
> > > > > > can fix the root cause.
> > > > > >
> > > > > > Thank you !
> > > > > >
> > > > >
> > > > > Hi Eric.
> > > > >
> > > > > Unfortunately, we are not familiar with the Packetdrill test.
> > > > > > Referring to the official GitHub page, I tried to install it on my device.
> > > > >
> > > > > Here is what I did on my local machine.
> > > > >
> > > > > $ mkdir packetdrill
> > > > > $ cd packetdrill
> > > > > > $ git clone https://github.com/google/packetdrill.git .
> > > > > $ cd gtests/net/packetdrill/
> > > > > > $ ./configure
> > > > > $ make CC=/home/youngmin/Downloads/arm-gnu-toolchain-13.3.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc
> > > > >
> > > > > $ adb root
> > > > > $ adb push packetdrill /data/
> > > > > $ adb shell
> > > > >
> > > > > And here is what I did on my device
> > > > >
> > > > > erd9955:/data/packetdrill/gtests/net # ./packetdrill/run_all.py -S -v -L -l tcp/
> > > > > /system/bin/sh: ./packetdrill/run_all.py: No such file or directory
> > > > >
> > > > > I'm not sure if this procedure is correct.
> > > > > Could you help us run the Packetdrill on an Android device ?
>
> BTW, Youngmin, do you have a packet trace (e.g., tcpdump .pcap file)
> of the workload that causes this warning?
>
> If not, in order to construct a packetdrill test to reproduce this
> issue, you may need to:
>
> (1) add code to the warning to print the local and remote IP address
> and port number when the warning fires (see DBGUNDO() for an example)
>
> (2) take a tcpdump .pcap trace of the workload
>
> Then you can use the {local_ip:local_port, remote_ip:remote_port} info
> from (1) to find the packet trace in (2) that can be used to construct
> a packetdrill test to reproduce this issue.
>
> thanks,
> neal
>
(Neal, please ignore my previous email; I forgot to include the CC list.)
Thank you for the detailed and helpful guidance.
We are currently trying to reproduce this issue with our stability stress test so
that we can capture a tcpdump trace.
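Once both the trace and the address information printed by the warning are available, the matching step Neal describes could look roughly like the sketch below. It is only an illustration under several assumptions: the capture is a classic .pcap file (not pcapng) with Ethernet link type, the flow is IPv4, and the function name find_flow and the addresses in the usage note are hypothetical placeholders, not anything from our setup.

```python
import socket
import struct

ETHERTYPE_IPV4 = 0x0800
LINKTYPE_ETHERNET = 1


def find_flow(pcap_bytes, local, remote):
    """Return [(packet_number, direction)] for one TCP 4-tuple.

    local and remote are (ip_string, port) pairs, e.g. as printed by the
    extra debug output added to the warning. Packet numbers are 1-based.
    """
    # The magic number tells us the byte order of the file headers.
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic microsecond-resolution pcap file")

    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError("this sketch only handles Ethernet captures")

    hits = []
    offset, number = 24, 0  # 24-byte global header, then packet records
    while offset + 16 <= len(pcap_bytes):
        # Record header: ts_sec, ts_usec, incl_len, orig_len (4 bytes each).
        incl_len = struct.unpack(endian + "I", pcap_bytes[offset + 8:offset + 12])[0]
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        number += 1

        # Skip frames too short for Ethernet + minimal IPv4, or not IPv4.
        if len(frame) < 34 or struct.unpack("!H", frame[12:14])[0] != ETHERTYPE_IPV4:
            continue
        ihl = (frame[14] & 0x0F) * 4  # IPv4 header length in bytes
        if frame[23] != socket.IPPROTO_TCP or len(frame) < 14 + ihl + 4:
            continue

        src_ip = socket.inet_ntoa(frame[26:30])
        dst_ip = socket.inet_ntoa(frame[30:34])
        sport, dport = struct.unpack("!HH", frame[14 + ihl:14 + ihl + 4])
        src, dst = (src_ip, sport), (dst_ip, dport)

        if (src, dst) == (local, remote):
            hits.append((number, "out"))
        elif (src, dst) == (remote, local):
            hits.append((number, "in"))
    return hits
```

For example, find_flow(open("trace.pcap", "rb").read(), ("192.0.2.10", 40000), ("192.0.2.20", 443)) would list the packets of that one connection in both directions. A capture taken with tcpdump on the "any" interface uses a Linux cooked link type rather than Ethernet, so this sketch would reject it; capturing on a specific interface avoids that.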
Thanks.