Message-ID: <000001db4a23$746be360$5d43aa20$@samsung.com>
Date: Mon, 9 Dec 2024 19:16:47 +0900
From: "Dujeong.lee" <dujeong.lee@...sung.com>
To: "'Eric Dumazet'" <edumazet@...gle.com>, "'Youngmin Nam'"
<youngmin.nam@...sung.com>
Cc: "'Jakub Kicinski'" <kuba@...nel.org>, "'Neal Cardwell'"
<ncardwell@...gle.com>, <davem@...emloft.net>, <dsahern@...nel.org>,
<pabeni@...hat.com>, <horms@...nel.org>, <guo88.liu@...sung.com>,
<yiwang.cai@...sung.com>, <netdev@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <joonki.min@...sung.com>,
<hajun.sung@...sung.com>, <d7271.choe@...sung.com>, <sw.ju@...sung.com>,
<iamyunsu.kim@...sung.com>, <kw0619.kim@...sung.com>, <hsl.lim@...sung.com>,
<hanbum22.lee@...sung.com>, <chaemoo.lim@...sung.com>,
<seungjin1.yu@...sung.com>
Subject: RE: [PATCH] tcp: check socket state before calling WARN_ON
On Fri, Dec 06, 2024 at 10:08:17AM +0100, Eric Dumazet wrote:
> On Fri, Dec 6, 2024 at 9:58 AM Youngmin Nam <youngmin.nam@...sung.com> wrote:
> >
> > On Fri, Dec 06, 2024 at 09:35:32AM +0100, Eric Dumazet wrote:
> > > On Fri, Dec 6, 2024 at 6:50 AM Youngmin Nam <youngmin.nam@...sung.com> wrote:
> > > >
> > > > On Wed, Dec 04, 2024 at 08:13:33AM +0100, Eric Dumazet wrote:
> > > > > On Wed, Dec 4, 2024 at 4:35 AM Youngmin Nam <youngmin.nam@...sung.com> wrote:
> > > > > >
> > > > > > On Tue, Dec 03, 2024 at 06:18:39PM -0800, Jakub Kicinski wrote:
> > > > > > > On Tue, 3 Dec 2024 10:34:46 -0500 Neal Cardwell wrote:
> > > > > > > > > I have not seen these warnings firing. Neal, have you seen
> > > > > > > > > this in the past?
> > > > > > > >
> > > > > > > > I can't recall seeing these warnings over the past 5 years
> > > > > > > > or so, and (from checking our monitoring) they don't seem
> > > > > > > > to be firing in our fleet recently.
> > > > > > >
> > > > > > > FWIW I see this at Meta on 5.12 kernels, but nothing since.
> > > > > > > Could be that one of our workloads is pinned to 5.12.
> > > > > > > Youngmin, what's the newest kernel you can repro this on?
> > > > > > >
> > > > > > Hi Jakub.
> > > > > > Thank you for taking an interest in this issue.
> > > > > >
> > > > > > We've seen this issue since the 5.15 kernel.
> > > > > > Now we can see it on the 6.6 kernel, which is the newest kernel we
> > > > > > are running.
> > > > >
> > > > > The fact that we are processing ACK packets after the write
> > > > > queue has been purged would be a serious bug.
> > > > >
> > > > > Thus the WARN() makes sense to us.
> > > > >
> > > > > It would be easy to build a packetdrill test. Please do so, then
> > > > > we can fix the root cause.
> > > > >
> > > > > Thank you !
> > > > >
> > > >
> > > > Hi Eric.
> > > >
> > > > Unfortunately, we are not familiar with packetdrill tests.
> > > > Following the official repository on GitHub, I tried to install it on
> > > > my device.
> > > >
> > > > Here is what I did on my local machine.
> > > >
> > > > $ mkdir packetdrill
> > > > $ cd packetdrill
> > > > $ git clone https://protect2.fireeye.com/v1/url?k=746d28f3-15e63dd6-746ca3bc-74fe485cbff6-e405b48a4881ecfc&q=1&e=ca164227-d8ec-4d3c-bd27-af2d38964105&u=https%3A%2F%2Fgithub.com%2Fgoogle%2Fpacketdrill.git .
> > > > $ cd gtests/net/packetdrill/
> > > > $ ./configure
> > > > $ make CC=/home/youngmin/Downloads/arm-gnu-toolchain-13.3.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc
> > > >
> > > > $ adb root
> > > > $ adb push packetdrill /data/
> > > > $ adb shell
> > > >
> > > > And here is what I did on my device
> > > >
> > > > erd9955:/data/packetdrill/gtests/net # ./packetdrill/run_all.py -S -v -L -l tcp/
> > > > /system/bin/sh: ./packetdrill/run_all.py: No such file or directory
> > > >
> > > > I'm not sure if this procedure is correct.
> > > > Could you help us run packetdrill on an Android device?
> > >
> > > packetdrill can run anywhere, for instance on your laptop; there is no need
> > > to compile/install it on Android.
> > >
> > > Then you can run a single test like this:
> > >
> > > # packetdrill gtests/net/tcp/sack/sack-route-refresh-ip-tos.pkt
> > >
> >
> > You mean that to test an Android device, we need to run packetdrill on
> > a laptop, right?
> >
> > Laptop (runs packetdrill script) <--------------------------> Android device
> >
> > By the way, how can we test the Android device (DUT) from packetdrill
> > running on a laptop?
> > I hope you understand that I am asking this question because we are not
> > familiar with packetdrill.
> > Thanks.
>
> packetdrill does not need to run on a physical DUT; it uses a software
> stack: TCP and a tun device.
>
> You have a kernel tree: compile it and run it in a VM, e.g. with virtme-ng:
>
> vng -bv
>
> We use this to run kernel selftests, to which we have started adding packetdrill
> tests (in recent kernel trees):
>
> ./tools/testing/selftests/net/packetdrill/tcp_slow_start_slow-start-ack-per-4pkt.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_zerocopy_client.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_zerocopy_batch.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_slow_start_slow-start-after-win-update.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_slow_start_slow-start-fq-ack-per-2pkt.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_zerocopy_maxfrags.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_inq_server.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_zerocopy_epoll_exclusive.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_zerocopy_basic.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_zerocopy_small.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_slow_start_slow-start-app-limited-9-packets-out.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_slow_start_slow-start-ack-per-2pkt.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_zerocopy_epoll_oneshot.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_zerocopy_fastopen-server.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_inq_client.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_zerocopy_epoll_edge.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_slow_start_slow-start-app-limited.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_zerocopy_fastopen-client.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_zerocopy_closed.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_slow_start_slow-start-ack-per-1pkt.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_slow_start_slow-start-after-idle.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_slow_start_slow-start-ack-per-2pkt-send-5pkt.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_slow_start_slow-start-ack-per-2pkt-send-6pkt.pkt
> ./tools/testing/selftests/net/packetdrill/tcp_md5_md5-only-on-client-ack.pkt
> ./tools/testing/selftests/net/netfilter/packetdrill/conntrack_synack_old.pkt
> ./tools/testing/selftests/net/netfilter/packetdrill/conntrack_syn_challenge_ack.pkt
> ./tools/testing/selftests/net/netfilter/packetdrill/conntrack_inexact_rst.pkt
> ./tools/testing/selftests/net/netfilter/packetdrill/conntrack_synack_reuse.pkt
> ./tools/testing/selftests/net/netfilter/packetdrill/conntrack_rst_invalid.pkt
> ./tools/testing/selftests/net/netfilter/packetdrill/conntrack_ack_loss_stall.pkt
Thanks for all the details on packetdrill; we are also going through the USENIX 2013 material.
I have one question. The issue happens when the DUT receives a TCP ACK after a long delay in the network, e.g., 28 seconds after the last Tx. Is packetdrill able to emulate this network delay (or congestion) at the script level?
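If I read packetdrill's timing syntax correctly (a line starting with `+N` is executed N seconds after the previous event), a late ACK might be written as in the rough sketch below. It is only an illustration of the syntax, not a reproducer: the file name `late_ack_28s.pkt` is made up, the TCP option lists follow the style of common packetdrill tests and may need adjusting for our kernel, and the data is ACKed once before the 28 s gap so that nothing is in flight during the idle period. In our real scenario data is still outstanding, so I assume the script would also have to expect the RTO retransmissions sent during the gap.

```shell
#!/bin/sh
# Rough sketch (assumption): generate a packetdrill script in which the peer's
# ACK arrives 28 seconds after the previous event. File name is hypothetical.
set -eu

cat > late_ack_28s.pkt <<'EOF'
// Passive open on the device under test.
0   socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0  setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0  bind(3, ..., ...) = 0
+0  listen(3, 1) = 0

+0  < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0  > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
+.1 < . 1:1(0) ack 1 win 257
+0  accept(3, ..., ...) = 4

// Send 1000 bytes and let the peer ACK them promptly.
+0  write(4, ..., 1000) = 1000
+0  > P. 1:1001(1000) ack 1
+.1 < . 1:1(0) ack 1001 win 257

// Nothing is in flight now; inject a (duplicate) ACK 28 s later.
+28 < . 1:1(0) ack 1001 win 257
EOF

echo "wrote late_ack_28s.pkt (run as root: packetdrill late_ack_28s.pkt)"
```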