[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <04f401db5a51$1f860810$5e921830$@samsung.com>
Date: Mon, 30 Dec 2024 09:23:55 +0900
From: "Dujeong.lee" <dujeong.lee@...sung.com>
To: "'Eric Dumazet'" <edumazet@...gle.com>
Cc: "'Youngmin Nam'" <youngmin.nam@...sung.com>, "'Jakub Kicinski'"
<kuba@...nel.org>, "'Neal Cardwell'" <ncardwell@...gle.com>,
<davem@...emloft.net>, <dsahern@...nel.org>, <pabeni@...hat.com>,
<horms@...nel.org>, <guo88.liu@...sung.com>, <yiwang.cai@...sung.com>,
<netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<joonki.min@...sung.com>, <hajun.sung@...sung.com>,
<d7271.choe@...sung.com>, <sw.ju@...sung.com>, <iamyunsu.kim@...sung.com>,
<kw0619.kim@...sung.com>, <hsl.lim@...sung.com>, <hanbum22.lee@...sung.com>,
<chaemoo.lim@...sung.com>, <seungjin1.yu@...sung.com>
Subject: RE: [PATCH] tcp: check socket state before calling WARN_ON
On Wed, Dec 18, 2024 7:28 PM Eric Dumazet <edumazet@...gle.com> wrote:
> On Wed, Dec 18, 2024 at 11:18 AM Dujeong.lee <dujeong.lee@...sung.com>
> wrote:
> >
> > Tue, December 10, 2024 at 4:10 PM Dujeong Lee wrote:
> > > On Tue, Dec 10, 2024 at 12:39 PM Dujeong Lee wrote:
> > > > On Mon, Dec 9, 2024 at 7:21 PM Eric Dumazet <edumazet@...gle.com>
> wrote:
> > > > > On Mon, Dec 9, 2024 at 11:16 AM Dujeong.lee
> > > > > <dujeong.lee@...sung.com>
> > > > > wrote:
> > > > > >
> > > > >
> > > > > > Thanks for all the details on packetdrill and we are also
> > > > > > exploring
> > > > > USENIX 2013 material.
> > > > > > I have one question. The issue happens when DUT receives TCP
> > > > > > ack with
> > > > > large delay from network, e.g., 28seconds since last Tx. Is
> > > > > packetdrill able to emulate this network delay (or congestion)
> > > > > in script
> > > > level?
> > > > >
> > > > > Yes, the packetdrill scripts can wait an arbitrary amount of
> > > > > time between each event
> > > > >
> > > > > +28 <next event>
> > > > >
> > > > > 28 seconds seems okay. If the issue was triggered after 4 days,
> > > > > packetdrill would be impractical ;)
> > > >
> > > > Hi all,
> > > >
> > > > We secured new ramdump.
> > > > Please find the below values with TCP header details.
> > > >
> > > > tp->packets_out = 0
> > > > tp->sacked_out = 0
> > > > tp->lost_out = 1
> > > > tp->retrans_out = 1
> > > > tp->rx_opt.sack_ok = 5 (tcp_is_sack(tp)) mss_cache = 1400
> > > > ((struct inet_connection_sock *)sk)->icsk_ca_state = 4 ((struct
> > > > inet_connection_sock *)sk)->icsk_pmtu_cookie = 1500
> > > >
> > > > Hex from ip header:
> > > > 45 00 00 40 75 40 00 00 39 06 91 13 8E FB 2A CA C0 A8 00 F7 01 BB
> > > > A7 CC 51
> > > > F8 63 CC 52 59 6D A6 B0 10 04 04 77 76 00 00 01 01 08 0A 89 72 C8
> > > > 42
> > > > 62 F5
> > > > F5 D1 01 01 05 0A 52 59 6D A5 52 59 6D A6
> > > >
> > > > Transmission Control Protocol
> > > > Source Port: 443
> > > > Destination Port: 42956
> > > > TCP Segment Len: 0
> > > > Sequence Number (raw): 1375232972
> > > > Acknowledgment number (raw): 1381592486
> > > > 1011 .... = Header Length: 44 bytes (11)
> > > > Flags: 0x010 (ACK)
> > > > Window: 1028
> > > > Calculated window size: 1028
> > > > Urgent Pointer: 0
> > > > Options: (24 bytes), No-Operation (NOP), No-Operation (NOP),
> > > > Timestamps, No-Operation (NOP), No-Operation (NOP), SACK
> > > >
> > > > If anyone wants to check other values, please feel free to ask me
> > > >
> > > > Thanks,
> > > > Dujeong.
> > >
> > > I have a question.
> > >
> > > From the latest ramdump I could see that
> > > 1) tcp_sk(sk)->packets_out = 0
> > > 2) inet_csk(sk)->icsk_backoff = 0
> > > 3) sk_write_queue.len = 0
> > > which suggests that tcp_write_queue_purge was indeed called.
> > >
> > > Noting that:
> > > 1) tcp_write_queue_purge reset packets_out to 0 and
> > > 2) in_flight should be non-negative where in_flight = packets_out -
> > > left_out + retrans_out, what if we reset left_out and retrans_out as
> > > well in tcp_write_queue_purge?
> > >
> > > Do we see any potential issue with this?
> >
> > Hello Eric and Neal.
> >
> > It is a gentle reminder.
> > Could you please review the latest ramdump values and and question?
>
> It will have to wait next year, Neal is OOO.
>
> I asked a packetdrill reproducer, I can not spend days working on an issue
> that does not trigger in our production hosts.
>
> Something could be wrong in your trees, or perhaps some eBPF program
> changing the state of the socket...
Hi Eric
I tried to make packetdrill script for local mode, which injects delayed acks for data and FIN after close.
// Test basic connection teardown where local process closes first:
// the local process calls close() first, so we send a FIN.
// Then we receive an delayed ACK for data and FIN.
// Then we receive a FIN and ACK it.
`../common/defaults.sh`
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 // Create socket
+.01...0.011 connect(3, ..., ...) = 0 // Initiate connection
+0 > S 0:0(0) <...> // Send SYN
+0 < S. 0:0(0) ack 1 win 32768 <mss 1000,nop,wscale 6,nop,nop,sackOK> // Receive SYN-ACK with TCP options
+0 > . 1:1(0) ack 1 // Send ACK
+0 write(3, ..., 1000) = 1000 // Write 1000 bytes
+0 > P. 1:1001(1000) ack 1 // Send data with PSH flag
+0 close(3) = 0 // Local side initiates close
+0 > F. 1001:1001(0) ack 1 // Send FIN
+1 < . 1:1(0) ack 1001 win 257 // Receive ACK for data
+0 < . 1:1(0) ack 1002 win 257 // Receive ACK for FIN
+0 < F. 1:1(0) ack 1002 win 257 // Receive FIN from remote
+0 > . 1002:1002(0) ack 2 // Send ACK for FIN
But got below error when I run the script.
$ sudo ./packetdrill ../tcp/close/close-half-delayed-ack.pkt
../tcp/close/close-half-delayed-ack.pkt:22: error handling packet: live packet field tcp_fin: expected: 0 (0x0) vs actual: 1 (0x1)
script packet: 1.010997 . 1002:1002(0) ack 2
actual packet: 0.014840 F. 1001:1001(0) ack 1 win 256
From the log, looks like there is difference between script and actual results.
Could you help (provide any guide) to check why this error happens?
Thanks,
Dujeong.
Powered by blists - more mailing lists